1|
2| 2.2 HIERARCHICAL DATA
3| =====================
4|
5| Pairs can be the building block for constructing hierarchies of data,
6| such as pairs of pairs, etc.
7| A pair can be visualized as a double box:
8|
9| +---+---+ +---+
10| (cons 1 2): --->| | | --|-->| 2 |
11| +---+---+ +---+
12| |
13| +---+
14| | 1 |
15| +---+
16|
17| Complex pairs are formed by using pairs for the car and cdr of pairs,
18| as in: (cons (cons 1 2) (cons 3 (cons 4 5)) )
19|
20| NOTE: The direction of the arrows in the "box-and-pointer" diagrams is
21| irrelevant.
22| The arrow pointing to the overall diagram is ESSENTIAL--it stands
23| for the hierarchical data object as a whole.
24|
25|
26| 2.2.1 REPRESENTING SEQUENCES--USING PAIRS:
27| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
28| REPRESENTATION:
29| ~~~~~~~~~~~~~~~
30| Sequences (lists) are ordered collections of data objects (compound or
31| not). They can be REPRESENTED USING PAIRS by representing the last element
32| of the sequence as the pair: (cons nil), where nil is a special LISP
33| symbol, denoting the End-of-list element, that is, visually, denoted by
34| this box: +--+
35| | /|
36| |/ |
37| +--+
38| Each element of the sequence, apart from the last, is represented by a
39| pair whose car is that element, and its cdr is the rest of the sequence.
40|
41| The list ( ... ) is represented by:
42| (cons (cons (cons ... (cons nil ) ...)))
43| The printing of sequences is using parentheses as above.
44|
45| Using the pair representation we can directly IMPLEMENT lists in Scheme.
46|
47| LISTS OPERATIONS:
48| ~~~~~~~~~~~~~~~~~
49| The main lists constructor is "list" -- it "makes" new lists from
50| arbitrary number of elements:
51|
52| > (list 3 4 5 7 8)
53| (3 4 5 7 8)
54| > (define x (list 5 6 8 2))
55| #
56| > x
57| (5 6 8 2)
58|
59| The 'list' constructor is already implemented in Scheme/LISP as a
60| primitive procedure, using the pairs representation as above.
61| That is:
62| (list ... )
63| is defined as
64| (cons (cons (cons ... (cons nil ) ...)))
65|
66| The 'car' and 'cdr' selectors of pairs can be used directly on lists:
67|
68| > (car x)
69| 5
70| > (cdr x)
71| (6 8 2)
72| > (cadr x)
73| 6
74| > (cddr x)
75| (8 2)
76| > (caddr x)
77| 8
78| > (cdddr x)
79| (2)
80| > (cadddr x)
81| 2
82| > (cddddr x)
83| ()
84| > (cons 2 x)
85| (2 5 6 8 2)
86| > (cons x x)
87| ((5 6 8 2) 5 6 8 2)
88| >
89|
90| List selectors:
91| ~~~~~~~~~~~~~~~~
92| > (define (nth n x)
93| (if (= n 0)
94| (car x)
95| (nth (- n 1) (cdr x))))
96| #
97| > (define squares (list 1 4 9 16 25 36))
98| #
99| > (nth 4 squares)
100| 25
101| > (define (length x)
102| (if (null? x)
103| 0
104| (+ 1 (length (cdr x)))))
105| WARNING: redefining built-in length
106| #
107| > (length squares)
108| 6
109|
110| Iterative length:
111|
112| (define (length x)
113| (define (length-iter a count)
114| (if (null? a)
115| count
116| (length-iter (cdr a) (+ 1 count))))
117| (length-iter x 0))
118|
119| Another constructor--append:
120|
121| > (define (append x y)
122| (if (null? x)
123| y
124| (cons (car x) (append (cdr x) y))))
125| WARNING: redefining built-in length
126| #
127| > (append squares (list squares squares))
128| (1 4 9 16 25 36 (1 4 9 16 25 36) (1 4 9 16 25 36))
129| > (append (list squares squares) squares)
130| ((1 4 9 16 25 36) (1 4 9 16 25 36) 1 4 9 16 25 36)
131| >
132|
133| 2.2.2 REPRESENTING TREES -- USING SEQUENCES:
134| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
135|
136| Trees can be represented by sequences whose elements are sequences.
137| The elements of a sequence are the tree branches, and elements that are
138| sequences represent subtrees.
139|
140| Example of a tree operation -- selector:
141|
142| > (define (countatoms x)
143| (cond ((null? x) 0)
144| ((atom? x) 1)
145| (else (+ (countatoms (car x))
146| (countatoms (cdr x))))))
147| #
148| > (define x (cons (list 1 2) (cons 3 (list 4 5)) ))
149| #
150| > (length x)
151| 4
152| > (countatoms x)
153| 5
154| > (countatoms (list x x))
155| 10
156| > (length (list x x))
157| 2
158| > (list x x)
159| (((1 2) 3 4 5) ((1 2) 3 4 5))
160| >
161|
162| 2.2.3 SYMBOLS AND THE NEED FOR QUOTE
163| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
164|
165| In order to extend our data processing capabilities we need the
166| ability to generate compound data from symbols, not just numbers. We
167| would like to manipulate--WITHOUT EVALUATION-- lists like:
168| (a b c)
169| ((Rina 2) (Moshe 3) (Ran 0))
170| (+ (* x 3) (* x y) )
171| (lambda (x y) (* (square x) y) )
172| (define (* (square x) y) )
173| For that purpose we need a means of QUOTING, since we wish to
174| manipulate the symbols themselves, not their values.
175| In LISP (Scheme) there is a quote special form, that prevents evaluation.
176|
177| > (quote a)
178| a
179| > (list 1 (quote b))
180| (1 b)
181| NOTE: Numbers have pre-defined values; they do not have to, but can be
182| quoted.
183| > (list (quote 1) (quote b))
184| (1 b)
185|
186| The 'quote' special form can be **abbreviated** by a *quoting character*.
187| The quoting character is "'". That is: 'a evaluates to a.
188|
189| > 'a
190| a
191| > (define a 'b)
192| #
193| > a
194| b
195| > (list a 'b)
196| (b b)
197| > (car '(list a 'b))
198| list
199| > (cdr '(list a 'b))
200| (a (quote b))
201| > (cddr '(list a 'b))
202| ((quote b))
203| >
204|
205| NOTE: Do not confuse the quotation character with the string-constant
206| delimiter ". The string constant "(+ a b)" IS NOT a list, and the
207| value of "a" is NOT a.
208|
209| "eq?" is a primitive for equality test for symbols.
210|
211| Membership test for a SYMBOL in a list:
212|
213| > (define (memq item x)
214| (cond ((null? x) nil)
215| ((eq? item (car x)) x)
216| (else (memq item (cdr x)))))
217| WARNING: redefining built-in memq
218| #
219| > (memq 'a '( (1 2) a b))
220| (a b)
221| > (memq 'a '(1 2 (a b) ))
222| #f
223| > (memq '(a b) '( (a b) ))
224| #f
225| >
226|
227| 2.2.4 EXAMPLE: SYMBOLIC DIFFERENTIATION
228| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
229|
230| This problem was, historically, one of the motivating problems for the
231| development of computer languages for symbolic manipulation.
232| PROBLEM: Write a procedure for differentiation of expressions, built with
233| additions and multiplications, with at most two arguments.
234| For example:
235| the derivation of ax**2 + bx + c, by x, where a, b, c are numbers
236| should give: 2ax + b.
237| Note that the expressions are handled here as DATA, not as PROCEDURES.
238| The result of the derivation procedure cannot be APPLIED, unless it is
239| turned first into a procedure (using lambda abstraction).
240|
241| The derivation rules are:
242| dc/dx = 0, if c is a constant or a variable different from x.
243| dx/dx = 1.
244| d(u + v)/dx = du/dx + dv/dx.
245| d(u*v)/dx = u( dv/dx ) + v( du/dx )
246|
247| The derivation procedure itself is on the CONCEPTUAL LEVEL. It will be
248| defined in terms of the objects that we think are most elementary for its
249| construction. These objects will serve as the DATA OBJECTS from which the
250| CONCEPTUAL LEVEL is built. These objects are called ABSTRACT DATA TYPES:
251| They are given via there CONSTRUCTORS, SELECTORS and PREDICATES, followed
252| by rules of correctness.
253|
254| CONCEPTUAL LEVEL: The derivation procedure.
255| Data objects level: ADTs.
256|
257| Looking into the derivation task we see that we need the TYPES:
258|
259| **** variable, constant, sum, product ****
260|
261| These types will have, each, constructors and selectors. For example, the
262| SUM type needs a constructor for "making" a sum, and selecting its
263| elements.
264| The ADTs that we will use, their constructors, selectors, and predicates
265| are as follows:
266|
267| CONSTANT: (make-constant )
268| (constant? )
269| VARIABLE: (variable? )
270| (same-variable? )
271| SUM: (make-sum ) sum's constructor
272| (sum? )
273| (addend ) selects the addend of a sum
274| (augend ) selects the augend of a sum
275| PRODUCT: (make-product ) product's constructor
276| (product? )
277| (multiplier ) selects multiplier of prod. .
278| (multiplicand ) selects multiplicand of a prod. .
279|
280| CONCEPTUAL LEVEL procedure in terms of these ADTs:
281|
282| (define (deriv exp var)
283| (cond ((constant? exp) (make-constant 0))
284| ((variable? exp)
285| (if (same-variable? exp var) (make-constant 1)
286| (make-constant 0)))
287| ((sum? exp)
288| (make-sum (deriv (addend exp) var) (deriv (augend exp) var)))
289| ((product? exp)
290| (make-sum
291| (make-product (multiplier exp)
292| (deriv (multiplicand exp) var))
293| (make-product (deriv (multiplier exp) var)
294| (multiplicand exp))))))
295|
296|
297| Representing and Implementing the ADTs of algebraic expressions:
298| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
299| We choose the implement algebraic expressions in LISP as lists,
300| written in prefix notation. For example, ax+ b would be represented as the
301| list: (+ (* a x) b). The implementation of the ADTs is straightforward:
302|
303| CONSTANT:
304| (define (make-constant x) x )
305| (define (constant? x) (number? x))
306|
307| VARIABLE:
308| (define (variable? x) (symbol? x))
309| (define (same-variable? v1 v2)
310| (and (variable? v1) (variable? v2) (eq? v1 v2)))
311|
312| SUM:
313| (define (make-sum a1 a2) (list '+ a1 a2))
314| (define (make-product m1 m2) (list '* m1 m2))
315| (define (sum? x)
316| (if (not (atom? x)) (eq? (car x) '+) nil))
317| (define (addend s) (cadr s))
318| (define (augend s) (caddr s))
319|
320| PRODUCT:
321| (define (product? x)
322| (if (not (atom? x)) (eq? (car x) '*) nil))
323| (define (multiplier p) (cadr p))
324| (define (multiplicand p) (caddr p))
325|
326| > (deriv '(+ x 6) 'x)
327| (+ 1 0)
328| > (deriv '(* x (+ y 5)) 'x)
329| (+ (* x (+ 0 0)) (* 1 (+ y 5)))
330| > (deriv '(* x (+ y 5)) 'y)
331| (+ (* x (+ 1 0)) (* 0 (+ y 5)))
332| >
333|
334| The answers are not reduced, and hard to verify. We will follow the
335| same approach as with rational numbers: change the implementation of the
336| ADTs. We will not change the derivation procedure. The changes are in
337| the implementation of the sum's and product's constructors. They will
338| take care to reduce the resulting expression if *0, or *1, or +0 are
339| included. This is reduction at "construction time".
340|
341| ;;; Better versions of make-sum and make-product
342|
343| (define (make-sum a1 a2)
344| (cond ((and (number? a1) (number? a2)) (+ a1 a2))
345| ((number? a1) (if (= a1 0) a2 (list '+ a1 a2)))
346| ((number? a2) (if (= a2 0) a1 (list '+ a1 a2)))
347| (else (list '+ a1 a2))))
348|
349| (define (make-product m1 m2)
350| (cond ((and (number? m1) (number? m2)) (* m1 m2))
351| ((number? m1)
352| (cond ((= m1 0) 0)
353| ((= m1 1) m2)
354| (else (list '* m1 m2))))
355| ((number? m2)
356| (cond ((= m2 0) 0)
357| ((= m2 1) m1)
358| (else (list '* m1 m2))))
359| (else (list '* m1 m2))))
360|
361| > (deriv '(+ x 6) 'x)
362| 1
363| > (deriv '(* x (+ y 5)) 'x)
364| (+ y 5)
365| > (deriv '(* x (+ y 5)) 'y)
366| x
367| > (deriv '(* (* x y) (+ y 6)) 'y)
368| (+ (* x y) (* x (+ y 6)))
369| >
370|
371|
372| 2.2.5 EXAMPLE: REPRESENTING SETS
373| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
374|
375| Sets as ADTs:
376|
377| 1. The operators for sets are:
378| union-set( ),
379| intersection-set( ),
380| element-of-set?( ~~ ),
381| adjoin-set( ~~~~ ).
382| Special constant: empty-set
383| 2. Correctness rules:
384| (element-of-set? (adjoin ~~~~) )
385| (element-of-set? (union-set ) ) iff
386| (or (element-of-set? ) (element-of-set? ) )
387| (element-of-set? (intersection-set ) ) iff
388| (and (element-of-set? ) (element-of-set? ) )
389| (not (element-of-set? empty-set) )
390|
391| There are many ways to REPRESENT and IMPLEMENT the set ADT.
392| They, all, satisfy the correctness rules, but differ in resources used.
393|
394| Sets as unordered lists
395| ~~~~~~~~~~~~~~~~~~~~~~~
396|
397| The empty set is represented by the empty list, and a set is represented
398| by a list with no repetitions of elements.
399|
400| (define empty-set '())
401|
402| (define (element-of-set? x set)
403| (cond ((null? set) nil)
404| ((equal? x (car set)) t)
405| (else (element-of-set? x (cdr set)))))
406|
407| (define (adjoin-set x set)
408| (if (element-of-set? x set)
409| set
410| (cons x set)))
411|
412| (define (intersection-set set1 set2)
413| (cond ((or (null? set1) (null? set2)) '())
414| ((element-of-set? (car set1) set2)
415| (cons (car set1)
416| (intersection-set (cdr set1) set2)))
417| (else (intersection-set (cdr set1) set2))))
418|
419| EFFICIENCY CONSIDERATIONS: element-of-set? is most important, as all
420| other operations use it. in the worst case, element-of-set? scans all of
421| the input set, so its order of grows for time is O(n), where n is the size
422| of the set. intersection-set and union-set have time complexity O(n**2).
423|
424|
425| Sets as ordered lists
426| ~~~~~~~~~~~~~~~~~~~~~
427|
428| Representing sets as ordered lists relies on the existence of a total
429| ordering on the collection of elements from the sets are constructed.
430| For example, each element can be assigned a unique number id, and the
431| id-s can be totally ordered. The advantage of ordering is that it saves
432| search time.
433| We consider here only sets of numbers. So, ordering is available by the
434| arithmetic relations > and <.
435|
436| (define (element-of-set? x set)
437| (cond ((null? set) nil)
438| ((= x (car set)) t)
439| ((< x (car set)) nil)
440| (else (element-of-set? x (cdr set)))))
441|
442| ;;; The complexity is still O(n), but on the average it will be O(n/2).
443|
444| (define (intersection-set set1 set2)
445| (if (or (null? set1) (null? set2))
446| '()
447| (let ((x1 (car set1)) (x2 (car set2)))
448| (cond ((= x1 x2) (cons x1 (intersection-set (cdr set1) (cdr set2))))
449| ((< x1 x2) (intersection-set (cdr set1) set2))
450| ((< x2 x1) (intersection-set set1 (cdr set2)))))))
451|
452| COMPLEXITY: At each step we reduce the size of the problem, as at least
453| one argument of the recursive call shrinks. The number of recursive calls
454| is, at most, the sum of the sets' sizes. Hence, the complexity is O(n)
455| (compare with O(n**2) in the unordered lists implementation).
456|
457|
458| Sets as binary trees
459| ~~~~~~~~~~~~~~~~~~~~
460| The assumption is that the collection of elements that make the sets is
461| totally ordered. A set can be represented as a binary search tree: A
462| binary tree where each node is labeled by a set element. The set elements
463| stored in the left subtree of a node x are, all, less than the element
464| stored in x, and all set elements stored in the right subtree of x are
465| greater than the element stored in x. Sometimes, the term "entry" is used
466| for referring to the set element stored at a node: "the entry of x" is the
467| element stored at x. The major property of binary search trees is that
468| search for an element takes time order of O(logn), where n is the size of
469| the set (since at every search round the problem size is cut by half --
470| provided that the tree is balanced. Note that halving the size of the
471| problem at each step characterizes logarithmic grows, e.g., the fast
472| exponentiation procedure).
473| Representation of labeled binary trees: Three element lists, with the
474| first element (car) being the entry of the root, the second and third are
475| the representations of the left and right subtrees, respectively. The
476| empty tree is represented by the empty list. So, a tree with empty left
477| and right subtrees is represented by: (entry () ()).
478|
479| ADT operations for the labeled binary tree representation:
480|
481| (define (entry tree) (car tree))
482|
483| (define (left-branch tree) (cadr tree))
484|
485| (define (right-branch tree) (caddr tree))
486|
487| (define (make-tree entry left right)
488| (list entry left right))
489|
490| ADT operations for the sets' representation with binary search trees:
491|
492| (define (element-of-set? x set)
493| (cond ((null? set) nil)
494| ((= x (entry set)) t)
495| ((< x (entry set)) (element-of-set? x (left-branch set)))
496| ((> x (entry set)) (element-of-set? x (right-branch set)))))
497|
498| (define (adjoin-set x set)
499| (cond ((null? set) (make-tree x '() '()))
500| ((= x (entry set)) set)
501| ((< x (entry set))
502| (make-tree (entry set)
503| (adjoin-set x (left-branch set))
504| (right-branch set)))
505| ((> x (entry set))
506| (make-tree (entry set)
507| (left-branch set)
508| (adjoin-set x (right-branch set))))))
509|
510|
511| COMPLEXITY: element-of-set? and adjoin-set work at O(logn) time, for sets
512| of size n, since they perform search, and provided that the tree is,
513| indeed, balanced.
514| intersection-set and union-set can operate at O(nlogn), since they can
515| can take every element in and search it, using element-of-set?
516| in . This would amount to n times an O(logn) operation.
517| The requirement for a more or less balanced tree is essential as in the
518| extreme case of a binary tree with all left (right) subtrees being empty,
519| the tree is just an ordered list!
520|
521|
522| Sets and information retrieval
523| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
524|
525| Given a database of records, each with a unique identifying key.
526| If the database is implemented as unordered list, the search for a record
527| can be performed by:
528|
529| (define (lookup given-key set-of-records)
530| (cond ((null? set-of-records) nil)
531| ((equal? given-key (key (car set-of-records)))
532| (car set-of-records))
533| (else (lookup given-key (cdr set-of-records)))))
534|
535|
536| 2.2.6 EXAMPLE: Huffman Encoding Trees
537| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
538|
539| Huffman encoding trees are trees used for encoding and decoding of
540| messages, where the codes of symbols that make the messages are
541| constructed by following the trees' branches. The symbols are given
542| together with their frequencies in the messages.
543| For example:
544| { (A 8) (B 3) (C 1) (D 1) (E 1) (F 1) (G 1) (H 1) }
545|
546| A Huffman tree for these symbols and frequencies is:
547|
548| --------------{A B C D E F G H} 17-----------
549| | |
550| | |
551| A 8 ----------------{B C D E F G H} 9----
552| | |
553| | |
554| -----{B C D} 5----- ------{E F G H} 4---
555| | | | |
556| | | | |
557| B 3 --{C D} 2-- --{E F} 2-- --{G H} 2--
558| | | | | | |
559| | | | | | |
560| C 1 D 1 E 1 F 1 G 1 H 1
561|
562| Encoding of symbols: Given a tree, its encoding is given by the path to
563| the unique leaf labelld with that symbol, where every right branch is a
564| "1" and every keft branch is a "0". For example: A's encoding is 0, and
565| D's encoding is 1011.
566| Decoding of messages: Given a message such as 10001010, starting from
567| the root, follow left and right branching corresponding to 0 or 1 in the
568| message. Every leaf that is reached yields a symbol in the message, and
569| the process starts again with the rest of the message.
570| For example, the above message decodes to ABC, with the above tree.
571| Encoding: Given a system of symbols associated with their frequencies,
572| an Huffman tree is constructed, following the given below algorithm. The
573| codes of the symbols are obtained from the tree.
574| Algorithm for Huffman's trees:
575| Input: A set of pairs of symbols associated with frequencies.
576| Method:
577| 1. Each pair of a symbol and frequency is a leaf.
578| 2. Repeat:
579| a. Find two pairs in the set, with lowest frequency.
580| b. Merge the pairs, to produce a pair labelld with the union of the
581| symbols and has the sum of the frequencies as its frequency.
582| For example: (E 1) and (F 1) yield after the merge: ( {E F} 2 ).
583| c. Construct a node in the tree that has the two merged pairs as its
584| left and right branches, and the merged pair as its label.
585| d. In the set of pairs: Replace the two pairs that were merged by
586| the merged pair.
587| Until a singlton set of pairs is left.
588| 3. The left pair is the root of the tree.
589|
590| For example, the above Huffman tree is obtained from the algorithm for
591| construction of Huffman's trees, given the above list of symbol-frequency
592| pairs as input.
593|
594| Following is a system for encoding and decoding using Huffman's trees.
595| First, Huffman's trees are abstracted as an ADT, based on two ADTs for a
596| leaf and a code-tree:
597|
598|
599| ;;; representation for tree leaves
600|
601| (define (make-leaf symbol weight)
602| (list 'leaf symbol weight))
603|
604| (define (leaf? object)
605| (eq? (car object) 'leaf))
606|
607| (define (symbol-leaf x) (cadr x))
608|
609| (define (weight-leaf x) (caddr x))
610|
611| ;;; representation for trees
612|
613| (define (make-code-tree left right)
614| (list left
615| right
616| (append (symbols left) (symbols right))
617| (+ (weight left) (weight right))))
618|
619| (define (left-branch tree) (car tree))
620|
621| (define (right-branch tree) (cadr tree))
622|
623| (define (symbols tree)
624| (if (leaf? tree)
625| (list (symbol-leaf tree))
626| (caddr tree)))
627|
628| (define (weight tree)
629| (if (leaf? tree)
630| (weight-leaf tree)
631| (cadddr tree)))
632|
633| ;;; decoding messages
634|
635| (define (decode bits tree)
636| (decode-1 bits tree tree))
637|
638| (define (decode-1 bits tree current-branch)
639| (if (null? bits)
640| '()
641| (let ((next-branch
642| (choose-branch (car bits) current-branch)))
643| (if (leaf? next-branch)
644| (cons (symbol-leaf next-branch)
645| (decode-1 (cdr bits) tree tree))
646| (decode-1 (cdr bits) tree next-branch)))))
647|
648| (define (choose-branch bit branch)
649| (cond ((= bit 0) (left-branch branch))
650| ((= bit 1) (right-branch branch))
651| (else (error "bad bit -- CHOOSE-BRANCH" bit))))
652|
653| ;;; procedure for handling ordered sets
654|
655| (define (adjoin-set x set)
656| (cond ((null? set) (list x))
657| ((< (weight x) (weight (car set))) (cons x set))
658| (else (cons (car set)
659| (adjoin-set x (cdr set))))))
660|
661| ;;; transform a list of symbol-frequency pairs into an ordered set of
662| ;;; leaves
663|
664| (define (make-leaf-set pairs)
665| (if (null? pairs)
666| '()
667| (let ((pair (car pairs)))
668| (adjoin-set (make-leaf (car pair) ;symbol
669| (cadr pair)) ;frequency
670| (make-leaf-set (cdr pairs))))))
~~