Advanced Methods in Programming Languages (201-2-4411-01)
HW Assignment 1 - Spring 2000 - Michael Elhadad

Syntactic Abstraction - Lambda Calculus

Problems for 13 Mar 2000

News

6 Mar 2000: Delayed from Mar 8 to Mar 13.

6 Mar 2000: How to write a tokenizer in Scheme?

First, look at Mayer Goldberg's input stream package. An input stream is defined as an object that knows how to work with two basic functions: one that gets the next character (get), and one that returns the current character (unget).

Next, look at this example: toy scanner for numbers, written using the input stream package. The idea is that the scanner implements an automaton where each state is implemented as a local function within the letrec. The scanner transforms a character string (an input stream) into a token stream. In our case, you would need a tokenizer for DTDs and XML. You need to define the automaton corresponding to the lexical structure of the XML subset defined below (special characters are <, >, *, + and #, parentheses, comma, white space and slash).

6 Mar 2000: Start by doing the solution assuming the DTD and the XML are provided in a Scheme syntax (as lists and symbols as shown in the code example below). This way you do not need to invest too much time in tokenizing and parsing.

In question 1, make sure you derive your function from the BNF specification of lists, s-lists and binary-search-trees. That is, use the BNF as a definition of the set of legal expressions that are lists, s-lists and binary-search-trees respectively. Your code must be "syntax-directed" in the sense that it can "walk down" any legal-expression of these types, as defined by the following BNFs:

<list>   ::= ({<expr>}*)

<s-list> ::= ({<s-expr>}*)
<s-expr> ::= <symbol> | <s-list>

<bin-search-tree> ::= () | (<number> <bin-search-tree> <bin-search-tree>)

Syntax of the Lambda Calculus

<exp>   ::= <varref>                        varref (var)
               |  <number>                  lit (datum)
               |  (if <exp> <exp> <exp>)    if (test-exp then-exp else-exp)
               |  (lambda ({<var>}*) <exp>) lambda (formals body)
               |  (<exp> {<exp>}*)          app (rator rands)
From the textbook:
  1. path, car-cdr, car-cdr2 (Exercise 2.2.9 - 1, 2, 3 p.55)
  2. lexical-address (Exercise 2.3.10 p.62 and 3.4.6 p.87)
  3. un-lexical-address (Exercise 2.3.13 p.63)
  4. rename (Exercise 2.3.14 p.65)

path

(path n bin-search-tree) where n is a number and bin-search-tree is a binary search tree that contains the number n, returns a list of Ls and Rs showing how to find the node containing n. If n is found at the root, it returns the empty list.
> (path 17 '(14 (7 () (12 () ()))
	              (26 (20 (17 () ())
                          ())
                (31 () ()))))
(R L L)

car-cdr

(car-cdr s slst errvalue) returns an expression that, when evaluated, produces the code for a procedure that takes a list with the same structure a slst and returns the value in the same position as the leftmost occurrence of s in slst. If s does not occur in slst, then errvalue is returned.
> (car-cdr 'a '(a b c) 'fail)
(lambda (slst) (car slst))
> (car-cdr 'c '(a b c) 'fail)
(lambda (slst) (car (cdr (cdr slst))))
> (car-cdr 'dog '(cat lion (fish dog) pig) 'fail)
(lambda (slst) (car (cdr (car (cdr (cdr slst))))))
> (car-cdr 'a '(b c) 'fail)
fail

car-cdr2

(car-cdr2 s slst errvalue) like car-cdr but generates procedure compositions.
> (car-cdr 'a '(a b c) 'fail)
car
> (car-cdr 'c '(a b c) 'fail)
(compose car (compose cdr cdr))
> (car-cdr 'dog '(cat lion (fish dog) pig) 'fail)
(compose car (compose cdr (compose car (compose cdr cdr))))
> (car-cdr 'a '(b c) 'fail)
fail

lexical-address

(lexical-address exp) where pred is a lambda-calculus expression, returns the lambda expression with all varrefs replaced by their lexical address.

un-lexical-address

(un-lexical-address exp) does the inverse of the previous exercise. You must generate new names for the variables using a form of "gensym" function.

rename

(rename exp var1 var2) returns exp[var1/var2] if var1 does not occur free in exp and #f otherwise.

XML Abstract Data Structure Code Generator

XML is a formalism used to encode data structures according to a grammatical specification very similar to a BNF. The specification is encoded in files called DTD (Document Type Definition). Background on XML is available
here.

In this question, you have to write a scheme code generator that takes as input a DTD (in a string) and produces a set of Scheme functions to manipulate instances of the abstract data type (ADT) defined by the DTD.

For reference, a similar solution for Java is available under the name of DXML (check in particular the following white paper).

Example scenario:

Following an example provided with DXML, your code should work as follows:
(define dtd1 
"<!ELEMENT Dir (Name,File*)>
 <!ELEMENT File (Name, Date, Size)>
 <!ELEMENT Name (#PCDATA)>
 <!ELEMENT Date (#PCDATA)>
 <!ELEMENT Size (#PCDATA)>")

(define xml1
 "<?xml?> 
  <!DOCTYPE Dir> 
  <Dir>
    <Name>/usr/local/src</Name> 
    <File>
      <Name>Autoexec.bat</Name> 
      <Date>Friday, October 02, 1998 7:16:48 PM</Date> 
      <Size>1024</Size> 
    </File>
    <File>
      <Name>Config.sys</Name> 
      <Date>Friday, October 02, 1998 7:07:29 PM</Date> 
      <Size>108,421,120</Size> 
    </File>
  </Dir>")
  
% (gen-xml dtd1)

;; Will generate code equivalent to the following:
(let ((dtd `((dir (name (file star)))
             (file (name date size)))))
  (define (parse-dir str)
    (make-dir-object dtd (parse-xml str dtd)))
  (define (parse-file str)
    (make-file-object dtd (parse-xml str dtd)))
  (define (make-dir-struct name files)
    (vector name files))
  (define (make-file-struct name date size)
    (vector name date size))
  (define (make-dir name files)
    (make-dir-object dtd (make-dir-struct name files)))
  (define (make-file name date size)
    (make-file-object dtd (make-file-struct name date size)))
    
  (define (make-dir-object dtd data)
    (lambda (msg . args)
      (cond ((eq? msg 'get-name)  (vector-ref data 0))
            ((eq? msg 'set-name!) (vector-set! data 0 (car args)))
            ((eq? msg 'get-file)  (vector-ref (vector-ref data 1) (car args)))
            ((eq? msg 'set-file!) (vector-set! (vector-ref data 1)
                                               (car args)
                                               (cadr args)))
            ((eq? msg 'display)   (print-xml data dtd)))))

  (define (make-file-object dtd data)
    (lambda (msg . args)
      (cond ((eq? msg 'get-name)  (vector-ref data 0))
            ((eq? msg 'set-name!) (vector-set! data 0 (car args)))
            ((eq? msg 'get-date)  (vector-ref data 1))
            ((eq? msg 'set-date!) (vector-set! data 1 (car args)))
            ((eq? msg 'get-size)  (vector-ref data 2))
            ((eq? msg 'set-size!) (vector-set! data 2 (car args)))
            ((eq? msg 'display)   (print-xml data dtd)))))

  ;; The return value of gen-xml is an object that returns the
  ;; public objects: parse-dir and parse-file
  (lambda (msg)
    (cond ((eq? msg 'dir) parse-dir)
          ((eq? msg 'file) parse-file)
          (else (error "Unknown message")))))

You must write the functions parse-xml and display-xml and code that can parse DTDs as well. Note that the functions in parse-xml can actually invoke functions of the type parse-file.

The system can be used as follows:

% (define parser-dtd1 (gen-xml dtd1))
...
% (define d1 ((parser-dtd1 'dir) xml1))
...

% (d1 'display)
  <?xml?> 
  <!DOCTYPE Dir> 
  <Dir>
    <Name>/usr/local/src</Name> 
    <File>
      <Name>Autoexec.bat</Name> 
      <Date>Friday, October 02, 1998 7:16:48 PM</Date> 
      <Size>1024</Size> 
    </File>
    <File>
      <Name>Config.sys</Name> 
      <Date>Friday, October 02, 1998 7:07:29 PM</Date> 
      <Size>108,421,120</Size> 
    </File>
  </Dir>

% ((d1 'get-file 0) 'display)
    <File>
      <Name>Autoexec.bat</Name> 
      <Date>Friday, October 02, 1998 7:16:48 PM</Date> 
      <Size>1024</Size> 
    </File>

% ((d1 'get-file 1) 'set-size! "110,000")
ok

% (d1 'set-file! 0 (make-file ".emacs" "12 Feb 2000" "10,234"))
ok

Formats

The simplified formats you have to process are specified by the following BNF:
dtd     ::= element*
element ::= <!ELEMENT symbol (eltref {,eltref}*)>
          | <!ELEMENT symbol (#PCDATA)>
eltref  ::= symbol
          | symbol"*"
          | symbol"+"

Data of type #PCDATA is understood as a string in the generated Scheme code.
Last modified 6 Mar 2000 Michael Elhadad