Advanced Methods in Programming Languages (201-2-4411-01)

## Syntactic Abstraction - Lambda Calculus

### Problems for 24 Mar 2000

In question 1, make sure you derive your function from the BNF specification of lists, s-lists and binary-search-trees. That is, use the BNF as a definition of the set of legal expressions that are lists, s-lists and binary-search-trees respectively. Your code must be "syntax-directed" in the sense that it can "walk down" any legal-expression of these types, as defined by the following BNFs:
```<list>   ::= ({<expr>}*)

<s-list> ::= ({<s-expr>}*)
<s-expr> ::= <symbol> | <s-list>

<bin-search-tree> ::= () | (<number> <bin-search-tree> <bin-search-tree>)
```

### Syntax of the Lambda Calculus

```<exp>   ::= <varref>                        varref (var)
|  <number>                  lit (datum)
|  (if <exp> <exp> <exp>)    if (test-exp then-exp else-exp)
|  (lambda ({<var>}*) <exp>) lambda (formals body)
|  (<exp> {<exp>}*)          app (rator rands)
```
From the textbook:
1. ```path, car-cdr, car-cdr2 ``` (Exercise 2.2.9 - 1, 2, 3 p.55)
2. `lexical-address` (Exercise 2.3.10 p.62 and 3.4.6 p.87)
3. `un-lexical-address` (Exercise 2.3.13 p.63)
4. `rename` (Exercise 2.3.14 p.65)

#### path

`(path n bin-search-tree)` where `n` is a number and `bin-search-tree` is a binary search tree that contains the number `n`, returns a list of Ls and Rs showing how to find the node containing `n`. If `n` is found at the root, it returns the empty list.
```> (path 17 '(14 (7 () (12 () ()))
(26 (20 (17 () ())
())
(31 () ()))))
(R L L)
```

#### car-cdr

`(car-cdr s slst errvalue)` returns an expression that, when evaluated, produces the code for a procedure that takes a list with the same structure a `slst` and returns the value in the same position as the leftmost occurrence of `s` in `slst`. If `s` does not occur in `slst`, then `errvalue` is returned.
```> (car-cdr 'a '(a b c) 'fail)
(lambda (slst) (car slst))
> (car-cdr 'c '(a b c) 'fail)
(lambda (slst) (car (cdr (cdr slst))))
> (car-cdr 'dog '(cat lion (fish dog) pig) 'fail)
(lambda (slst) (car (cdr (car (cdr (cdr slst))))))
> (car-cdr 'a '(b c) 'fail)
fail
```

#### car-cdr2

`(car-cdr2 s slst errvalue)` like `car-cdr` but generates procedure compositions.
```> (car-cdr 'a '(a b c) 'fail)
car
> (car-cdr 'c '(a b c) 'fail)
(compose car (compose cdr cdr))
> (car-cdr 'dog '(cat lion (fish dog) pig) 'fail)
(compose car (compose cdr (compose car (compose cdr cdr))))
> (car-cdr 'a '(b c) 'fail)
fail
```

`(lexical-address exp)` where `pred` is a lambda-calculus expression, returns the lambda expression with all varrefs replaced by their lexical address.

`(un-lexical-address exp)` does the inverse of the previous exercise. You must generate new names for the variables using a form of "gensym" function.

#### rename

`(rename exp var1 var2)` returns exp[var1/var2] if var1 does not occur free in exp and #f otherwise.

### XML Abstract Data Structure Code Generator

XML is a formalism used to encode data structures according to a grammatical specification very similar to a BNF. The specification is encoded in files called DTD (Document Type Definition). Background on XML is available
here.

In this question, you have to write a scheme code generator that takes as input a DTD (in a string) and produces a set of Scheme functions to manipulate instances of the abstract data type (ADT) defined by the DTD.

For reference, a similar solution for Java is available under the name of DXML (check in particular the following white paper).

#### Example scenario:

Following an example provided with DXML, your code should work as follows:
```(define dtd1
"<!ELEMENT Dir (Name,File*)>
<!ELEMENT File (Name, Date, Size)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Size (#PCDATA)>")

(define xml1
"<?xml?>
<!DOCTYPE Dir>
<Dir>
<Name>/usr/local/src</Name>
<File>
<Name>Autoexec.bat</Name>
<Date>Friday, October 02, 1998 7:16:48 PM</Date>
<Size>1024</Size>
</File>
<File>
<Name>Config.sys</Name>
<Date>Friday, October 02, 1998 7:07:29 PM</Date>
<Size>108,421,120</Size>
</File>
</Dir>")

% (gen-xml dtd1)

;; Will generate code equivalent to the following:
;; THIS IS NOT FULLY CORRECT CODE -- IT JUST GIVES AN IDEA OF WHAT TO DO
(let ((dtd `((dir (name (file star)))
(file (name date size)))))
(define (parse-dir str)
(make-dir-object dtd (parse-xml str dtd)))
(define (parse-file str)
(make-file-object dtd (parse-xml str dtd)))
(define (make-dir-struct name files)
(vector name files))
(define (make-file-struct name date size)
(vector name date size))
(define (make-dir name files)
(make-dir-object dtd (make-dir-struct name files)))
(define (make-file name date size)
(make-file-object dtd (make-file-struct name date size)))

(define (make-dir-object dtd data)
(lambda (msg . args)
(cond ((eq? msg 'get-name)  (vector-ref data 0))
((eq? msg 'set-name!) (vector-set! data 0 (car args)))
((eq? msg 'get-file)  (vector-ref (vector-ref data 1) (car args)))
((eq? msg 'set-file!) (vector-set! (vector-ref data 1)
(car args)
((eq? msg 'display)   (print-xml data dtd)))))

(define (make-file-object dtd data)
(lambda (msg . args)
(cond ((eq? msg 'get-name)  (vector-ref data 0))
((eq? msg 'set-name!) (vector-set! data 0 (car args)))
((eq? msg 'get-date)  (vector-ref data 1))
((eq? msg 'set-date!) (vector-set! data 1 (car args)))
((eq? msg 'get-size)  (vector-ref data 2))
((eq? msg 'set-size!) (vector-set! data 2 (car args)))
((eq? msg 'display)   (print-xml data dtd)))))

;; The return value of gen-xml is an object that returns the
;; public objects: parse-dir and parse-file
(lambda (msg)
(cond ((eq? msg 'dir) parse-dir)
((eq? msg 'file) parse-file)
(else (error "Unknown message")))))

```
You must write the functions `parse-xml` and `display-xml` and code that can parse DTDs as well. Note that the functions in `parse-xml` can actually invoke functions of the type `parse-file`.

The system can be used as follows:

```% (define parser-dtd1 (gen-xml dtd1))
...
% (define d1 ((parser-dtd1 'dir) xml1))
...

% (d1 'display)
<?xml?>
<!DOCTYPE Dir>
<Dir>
<Name>/usr/local/src</Name>
<File>
<Name>Autoexec.bat</Name>
<Date>Friday, October 02, 1998 7:16:48 PM</Date>
<Size>1024</Size>
</File>
<File>
<Name>Config.sys</Name>
<Date>Friday, October 02, 1998 7:07:29 PM</Date>
<Size>108,421,120</Size>
</File>
</Dir>

% ((d1 'get-file 0) 'display)
<File>
<Name>Autoexec.bat</Name>
<Date>Friday, October 02, 1998 7:16:48 PM</Date>
<Size>1024</Size>
</File>

% ((d1 'get-file 1) 'set-size! "110,000")
ok

% (d1 'set-file! 0 (make-file ".emacs" "12 Feb 2000" "10,234"))
ok

```

#### Formats

The simplified formats you have to process are specified by the following BNF:
```dtd     ::= element*
element ::= <!ELEMENT symbol (eltref {,eltref}*)>
| <!ELEMENT symbol (#PCDATA)>
eltref  ::= symbol
| symbol"*"
| symbol"+"

```
Data of type #PCDATA is understood as a string in the generated Scheme code.

#### How to write a tokenizer in Scheme?

The full answer to the question requires you to implement a scanner in Scheme for the XML concrete syntax.

In order to do that (if you haven't done it in compilation already), first, look at Mayer Goldberg's input stream package. An input stream is defined as an object that knows how to work with two basic functions: one that gets the next character (get), and one that returns the current character (unget).

Next, look at this example: toy scanner for numbers, written using the input stream package. The idea is that the scanner implements an automaton where each state is implemented as a local function within the letrec. The scanner transforms a character string (an input stream) into a token stream. In our case, you would need a tokenizer for DTDs and XML. You need to define the automaton corresponding to the lexical structure of the XML subset defined below (special characters are <, >, *, + and #, parentheses, comma, white space and slash). However, you can submit a version of your solution that does not work directly with XML concrete syntax, but instead with a Lisp-like concrete syntax that is easier to parse in Scheme. The following shows an example:

```    <File>
<Name>Autoexec.bat</Name>
<Date>Friday, October 02, 1998 7:16:48 PM</Date>
<Size>1024</Size>
</File>

would be encoded as:

((file ((name "autoexec.bat")
(date "Friday, October 02, 1998 7:16:48 PM")
(size 1024))))
```
Start by doing the solution assuming the DTD and the XML are provided in a Scheme syntax (as lists and symbols as shown in the code example below). This way you do not need to invest too much time in tokenizing and parsing. If you want to feel good about dealing with XML, go ahead and do the parsing...