search


Transformation of Data
Fri Sep 14 00:00:00 UTC 2018

What is functional programming? The literature espouse first class function values, mathematical purity of functions, closures, immutability, monads, functors or combinators. However, these are just secondary supporting tools/ideas. The essence of functional programming ironically is not about functions but about data and the use of functions to transform data. Every programming language has functions but not all languages are functional programming languages. What makes a language a functional programming language is that it encourages designing programs in a functional style. One of the core ideas of functional programming is the transformation of data. Contrary to the name, programming in a functional style is more about the data than the functions. Programming in Clojure, or other FP languages, is all about applying functions to the transformation of data. A program is a collection of functions that transforms data. As a dialect of LISP, Clojure takes this concept to the extreme. A program written in Clojure is itself a data structure in the form of EDN s-expressions that you manipulate and transform. This is why you need a good structural editor, like emacs with paredit, to program in Clojure

Transformation of Data Structures

When designing a program, we start with the data structures needed to model the domain problem. Next, we write functions to transform the data structures. There are two main categories of data transformation: mapping and folding

Mapping

Mapping is applying a function to every element of a collection producing a new collection. Mapping perserves the structure of the original collection. For example: (def numbers [1 2 3 4 5]) ;;map is a function that takes two arguments: first argument is a function, the second argument is a collection like a list, vector, set or map (def incremented-numbers (map inc numbers)) (def doubled-numbers (map (fn [n] (* 2 n)) numbers)) (println "numbers -> incremented-numbers=" numbers "->" incremented-numbers) (println "numbers -> doubled-numbers=" numbers "->" doubled-numbers) (println "type of numbers=" (type numbers)) (println "type of doubled-numbers=" (type doubled-numbers)) ;;mapping preserves structure of the original data ;;notice incremented-numbers and doubled-numbers contains the same number of elements as numbers ;;numbers is a vector but map returns a lazy sequence. Use mapv if you want the result to be a vector ;;instead of a lazy sequence (def doubled-numbers-as-vector (mapv (fn [n] (* 2 n)) numbers)) (println "doubled-numbers-as-vector=" doubled-numbers-as-vector) (println "type of doubled-numbers-as-vector=" (type doubled-numbers-as-vector))

List Comprehension

In non-functional languages, the key word 'for' is used as a loop, but in Clojure, 'for' is a List Comprehension . It is based on mathematical way of defining sets by defining its properties instead of enumerating the set, which can be infinite, directly. This idea is called Set Comprehension The 'for' list comprehension can be used to implement map. We've seen this used in our pair programming sessions. ;;for can be used to implement map (defn my-map [f a-collection] (for [element a-collection] (f element))) (def numbers [1 2 3 4 5]) (def incremented-numbers (my-map inc numbers)) (def doubled-numbers (my-map (fn [n] (* 2 n)) numbers)) (println "numbers=" numbers) (println "incremented-numbers=" incremented-numbers) (println "doubled-numbers=" doubled-numbers) (println "type of numbers=" (type numbers)) (println "type of doubled-numbers=" (type doubled-numbers))

Folding

Mapping preserves the structure of the original data, but folding can change the structure of the original data. Folding applies a function that combines data in the collection resulting in a single value. To understand what fold is and why it is called fold, the following diagrams will help.

The list (1 2 3 4 5) is shown as a tree and not as a flat linear structure because lists are constructed using a function called cons . In most LISP implementations, a list is a chain of cons cells with the last element ending with a nil value. Clojure departs from this tradition but let's pretend this is not the case because fold is defined using lists constructed with cons cells. The diagram above is stolen from haskell documentation and the haskell function for cons is represented by colon ':'. Folding is simply replacing the cons function with an arbitary combining function f and replacing the nil value with an arbitary value z. The function f must take two parameters. There are two ways of evaluating the combinding function f. One way is with the values of the list on the left side of the function f (right fold). The other way is with the values of the list on the right side of the function f (left fold). Don't confuse the direction of the list and the fold. The direction of the fold refers to the order of expression evaluation and not the direction of the list. Let's convert the diagram above into Clojure code to get a better understanding of right and left fold.

(def a '( 1 2 3 4 5)) ;;this is a list (def b (cons 1 (cons 2 (cons 3 (cons 4 (cons 5 nil)))))) ;;this is also a list in traditional LISP ;;In traditional LISP, a list is chain of cons cells with the last pair in the chain ending with nil ;;This is not true in Clojure but let's pretend it is because it makes it easier to explain fold. ;;folding is simply replacing the cons function with a combinding function f and the nil with an arbitary value z. ;;let's replace cons with + and the nil with z (def z 0) ;; let's give z a value of 0. ;;notice the right most expression is evaluated first ;;and folds/collapse on the right, thus the name right fold (def right-fold-b (+ 1 (+ 2 (+ 3 (+ 4 (+ 5 z)))))) ;;notice the left most expression is evaluated first ;;and folds/collapse on the left, thus the name left fold (def left-fold-b (+ (+ (+ (+ (+ z 1) 2) 3) 4) 5)) ;;folding the list b with the + function sums all the numbers in b ;;since combinding function + is associative meaning x+y is the same as y+x ;;right fold and left fold produces the same result (println "right-fold-b=" right-fold-b) (println "left-fold-b=" left-fold-b)
Reducing

Left fold is also known as "reduce" in Clojure. Other languages like SmallTalk/Groovy/Ruby call left fold "inject"

(def b [1 2 3 4 5]) ;;this is a vector and not cons cells but we can still left fold (or reduce it) (reduce + b)
Map Reduce and Big Data

Google published a paper MapReduce: Simplfied Data Processing on Large Clusters that has inspired Big Data processing systems like Hadoop and Spark. It is obvious this idea is inspired directly from LISP map and reduce functions. It is worth mentioning that the combinding function f can execute in parrallel to take advantage of multi-core processors. Clojure has a parrallel version of map called pmap that runs the mapping function f on multiple threads taking advantage of multi-core processors.