search


Transformation of Data
Fri Sep 14 00:00:00 UTC 2018

What is functional programming? The literature espouse first class function values, mathematical purity of functions, closures, immutability, monads, functors or combinators. However, these are just secondary supporting tools/ideas. The essence of functional programming ironically is not about functions but about data and the use of functions to transform data. Every programming language has functions but not all languages are functional programming languages. What makes a language a functional programming language is that it encourages designing programs in a functional style. One of the core ideas of functional programming is the transformation of data. Contrary to the name, programming in a functional style is more about the data than the functions. Programming in Clojure, or other FP languages, is all about applying functions to the transformation of data. A program is a collection of functions that transforms data. As a dialect of LISP, Clojure takes this concept to the extreme. A program written in Clojure is itself a data structure in the form of EDN s-expressions that you manipulate and transform. This is why you need a good structural editor, like emacs with paredit, to program in Clojure

Transformation of Data Structures

When designing a program, we start with the data structures needed to model the domain problem. Next, we write functions to transform the data structures. There are two main categories of data transformation: mapping and folding

Mapping

Mapping is applying a function to every element of a collection producing a new collection. Mapping perserves the structure of the original collection. For example: (def numbers [1 2 3 4 5]) ;;map is a function that takes two arguments: first argument is a function, the second argument is a collection like a list, vector, set or map (def incremented-numbers (map inc numbers)) (def doubled-numbers (map (fn [n] (* 2 n)) numbers)) (println "numbers -> incremented-numbers=" numbers "->" incremented-numbers) (println "numbers -> doubled-numbers=" numbers "->" doubled-numbers) (println "type of numbers=" (type numbers)) (println "type of doubled-numbers=" (type doubled-numbers)) ;;mapping preserves structure of the original data ;;notice incremented-numbers and doubled-numbers contains the same number of elements as numbers ;;numbers is a vector but map returns a lazy sequence. Use mapv if you want the result to be a vector ;;instead of a lazy sequence (def doubled-numbers-as-vector (mapv (fn [n] (* 2 n)) numbers)) (println "doubled-numbers-as-vector=" doubled-numbers-as-vector) (println "type of doubled-numbers-as-vector=" (type doubled-numbers-as-vector))

List Comprehension

In non-functional languages, the key word 'for' is used as a loop, but in Clojure, 'for' is a List Comprehension . It is based on mathematical way of defining sets by defining its properties instead of enumerating the set, which can be infinite, directly. This idea is called Set Comprehension The 'for' list comprehension can be used to implement map. We've seen this used in our pair programming sessions. ;;for can be used to implement map (defn my-map [f a-collection] (for [element a-collection] (f element))) (def numbers [1 2 3 4 5]) (def incremented-numbers (my-map inc numbers)) (def doubled-numbers (my-map (fn [n] (* 2 n)) numbers)) (println "numbers=" numbers) (println "incremented-numbers=" incremented-numbers) (println "doubled-numbers=" doubled-numbers) (println "type of numbers=" (type numbers)) (println "type of doubled-numbers=" (type doubled-numbers))

Folding

Mapping preserves the structure of the original data, but folding can change the structure of the original data. Folding applies a function that combines data in the collection resulting in a single value. To understand what fold is and why it is called fold, the following diagrams will help.

The list (1 2 3 4 5) is shown as a tree and not as a flat linear structure because lists are constructed using a function called cons . In most LISP implementations, a list is a chain of cons cells with the last element ending with a nil value. Clojure departs from this tradition but let's pretend this is not the case because fold is defined using lists constructed with cons cells. The diagram above is stolen from haskell documentation and the haskell function for cons is represented by colon ':'. Folding is simply replacing the cons function with an arbitary combining function f and replacing the nil value with an arbitary value z. The function f must take two parameters. There are two ways of evaluating the combinding function f. One way is with the values of the list on the left side of the function f (right fold). The other way is with the values of the list on the right side of the function f (left fold). Don't confuse the direction of the list and the fold. The direction of the fold refers to the order of expression evaluation and not the direction of the list. Let's convert the diagram above into Clojure code to get a better understanding of right and left fold.

(def a '( 1 2 3 4 5)) ;;this is a list (def b (cons 1 (cons 2 (cons 3 (cons 4 (cons 5 nil)))))) ;;this is also a list in traditional LISP ;;In traditional LISP, a list is chain of cons cells with the last pair in the chain ending with nil ;;This is not true in Clojure but let's pretend it is because it makes it easier to explain fold. ;;folding is simply replacing the cons function with a combinding function f and the nil with an arbitary value z. ;;let's replace cons with + and the nil with z (def z 0) ;; let's give z a value of 0. ;;notice the right most expression is evaluated first ;;and folds/collapse on the right, thus the name right fold (def right-fold-b (+ 1 (+ 2 (+ 3 (+ 4 (+ 5 z)))))) ;;notice the left most expression is evaluated first ;;and folds/collapse on the left, thus the name left fold (def left-fold-b (+ (+ (+ (+ (+ z 1) 2) 3) 4) 5)) ;;folding the list b with the + function sums all the numbers in b ;;since combinding function + is associative meaning x+y is the same as y+x ;;right fold and left fold produces the same result (println "right-fold-b=" right-fold-b) (println "left-fold-b=" left-fold-b)
Reducing

Left fold is also known as "reduce" in Clojure. Other languages like SmallTalk/Groovy/Ruby call left fold "inject"

(def b [1 2 3 4 5]) ;;this is a vector and not cons cells but we can still left fold (or reduce it) (reduce + b)
Map Reduce and Big Data

Google published a paper MapReduce: Simplfied Data Processing on Large Clusters that has inspired Big Data processing systems like Hadoop and Spark. It is obvious this idea is inspired directly from LISP map and reduce functions. It is worth mentioning that the combinding function f can execute in parrallel to take advantage of multi-core processors. Clojure has a parrallel version of map called pmap that runs the mapping function f on multiple threads taking advantage of multi-core processors.



Homework: Implementing search
Mon Jun 18 00:00:00 UTC 2018

This blog has a search icon on the upper right but it is not functional. We will implement this in a series of steps assigned as homework. The contents of the blog are stored in EDN text files in public/blog-entries directory. EDN is Clojure's form of S-expressions . HTML/XML/JSON are all forms of s-expressions. When you write Clojure code, you are writing EDN s-expressions.

We can implement this search feature without dependency on the web browser. We will implement this feature at the Clojure REPL then integrated with the web interface once the feature is implemented and bug free. This is called REPL driven development

Start emacs
% cd lambdakids
 % emacs

Once emacs is started, start the repl with

M-x cider-jack-in

You will see something like this


;; Connected to nREPL server - nrepl://localhost:35562
;; CIDER 0.17.0 (AndalucĂ­a), nREPL 0.2.13
;; Clojure 1.9.0, Java 1.8.0_151
;;     Docs: (doc function-name)
;;           (find-doc part-of-name)
;;   Source: (source function-name)
;;  Javadoc: (javadoc java-object-or-class)
;;     Exit: 
;;  Results: Stored in vars *1, *2, *3, an exception in *e;
;; ======================================================================
;; If you're new to CIDER it is highly recommended to go through its
;; manual first. Type  to view it.
;; In case you're seeing any warnings you should consult the manual's
;; "Troubleshooting" section.
;;
;; Here are few tips to get you started:
;;
;; * Press  to see a list of the keybindings available (this
;;   will work in every Emacs buffer)
;; * Press <,> to quickly invoke some REPL command
;; * Press  to switch between the REPL and a Clojure file
;; * Press  to jump to the source of something (e.g. a var, a
;;   Java method)
;; * Press  to view the documentation for something (e.g.
;;   a var, a Java method)
;; * Enable `eldoc-mode' to display function & method signatures in the minibuffer.
;; * Print CIDER's refcard and keep it close to your keyboard.
;;
;; CIDER is super customizable - try  to
;; get a feel for this. If you're thirsty for knowledge you should try
;; .
;;
;; If you think you've encountered a bug (or have some suggestions for
;; improvements) use  to report it.
;;
;; Above all else - don't panic! In case of an emergency - procure
;; some (hard) cider and enjoy it responsibly!
;;
;; You can remove this message with the  command.
;; You can disable it from appearing on start by setting
;; `cider-repl-display-help-banner' to nil.
;; ======================================================================
WARNING: clj-refactor and refactor-nrepl are out of sync.
Their versions are 2.3.1 and n/a, respectively.
You can mute this warning by changing cljr-suppress-middleware-warnings.
user> 

Reading a file

Let's read the content of the file public/blog-entries/search.blog using slurp


user>  (def blog (slurp "public/blog-entries/search.blog"))
#'user/blog
user>
user> (prn blog)
"{:blog/title \"Homework: Implementing search\"\n :blog/date #inst \"2018-06-18\"\n :blog/content [:div\n                [:p \"This blog has a search icon on the upper right \
but it is not functional. As a homework assignment, implement the search feature.\"\n                 \"The contents of the blog are stored in \" [:a {:href \"https://github.\
com/edn-format/edn\"} \"EDN\"] \"text files in public/blog-entries directory.\"\n                 \"EDN is Clojure's form of \" [:a {:href \"https://en.wikipedia.org/wiki/S-e\
xpression\"} \"S-expressions\"] \". HTML/XML/JSON are all forms of\"\n                 \"s-expressions. When you write Clojure code, you are writing EDN s-expressions.\"]\n\n\
                [:p \"Before we continue, there's an incompatability withthe latest cider-0.17 and clj-refactor packages. To fix this,  update your emacs init.el as \"]\n    \
                            \n                [:pre\n                 [:code.bash.hljs\n                  \"% wget -c https://bit.ly/2LZKmkx -O ~/.emacs.d/init.el\"]]\n\n                [:p \
                            \"We can implement this search feature without anything replated to the web.  We will implement this feature at the Clojure REPL then integrated\"\n                 \"with th\
                            e web interface once the feature is implemented and bug free.\"]\n\n                [:pre\n                 [:code.bash.hljs\n                  \"% cd lambdakids\\n\"\n      \
                                        \"% emacs\"]]\n\n                [:p \"Once emacs is started, start the repl with\"]\n                [:pre\n                 [:code.bash.hljs\n                  \
                                        \"M-x cider-jack-in\"]]\n\n                [:p \"You will see something like this\"]\n                [:pre\n                 [:code.bash.hljs\n                  (slurp \"pub\
                                        lic/cider-console.txt\")\n\n                  ]]\n\n                [:p \"Perform some experiment at the REPL\"\n                 ]\n\n                [:pre\n                \
                                         [:code.clojure.hljs\n                  \"(def blog (slurp \\\"public/blog-entries/search.blog\\\"))\"]]\n                \n                ]\n \n \n\n }\n"
                                         nil

user> (type blog)
java.lang.String

Notice that the type of blog is a String

Convert a String into EDN data structures

We have the blog in the form of a Clojure EDN string but to leverage the processing power of Clojure, we will convert it into EDN data structures using read-string

user> (def search-blog (read-string blog))
 user> (type search-blog)
 clojure.lang.PersistentArrayMap

search-blog is now an EDN data structure whereas blog is an EDN string. In particular search-blog is a Clojure Map. We can now use functions that operate on Maps for example keys

user> (keys search-blog)
 (:blog/title :blog/date :blog/content)

We can access the blog title

user> (:blog/title search-blog)
 "Homework: Implementing search"
Your assignment

You need to do some research on your own to complete this assignment. The key to this is asking the right questions so that you can google it. Feel free to reach out to me to ask questions or if anything is confusing or not clear.

  • Write a predicate function
    (defn txt-exits-in? [file-name txt])
    txt-exists-in? returns true if txt is a substring in the file given by file-name and false otherwise. It should be case insensitive. For example,
    (txt-exists-in? "public/blog-entries/search.blog" "EDN") => true
     (txt-exists-in? "public/blog-entries/search.blog" "edn") => true
     (txt-exists-in? "public/blog-entries/functions.blog" "haha") => false
    
  • Write a function
    (defn search [txt])
    search should return a vector of file-names of all files in public/blog-entries where txt is a substring in that file
    (search "EDN") => ["search.blog", "chp.blog"]
    

Once you've implemented these two functions. We can connect it to the web to implement search of the blog