Interpreters

Advanced Methods in Programming Languages (201-2-4411-01)
Notes Week 3 - Michael Elhadad

Building Interpreters

Definition

We start investigating the semantics of programming languages by building interpreters to simulate their execution in a well-understood framework. The definition of an interpreter includes the following stages:

Selection of the following domains: expressed values and denoted values.
Definition of the abstract and concrete syntax of the language.
Definition of the computation rule of each construct in the abstract syntax.

Expressed values are values that can be the result of the evaluation of an expression of the language defined by the syntax.

Denoted values are values that can be associated to variables in the language. The relation between a variable and its value is called denotation (the variable name denotes a value).

Before writing an interpreter, we must distinguish between meta- and object-language. The meta-language is the language in which we write the interpreter (we use Scheme). The object-language is the language we are describing and that is evaluated by the interpreter.

In the definition of the object-language, we rely on some features of the meta-language. For example, if we allow ourselves to use a meta-language with assignment and mutable data-structures, then it is easy to simulate assignment in the object language. We can, however, restrict ourselves, and force ourselves not to use the mutation facilities of the meta-language if we want to gain a better understanding of what is involved. During the course of the investigation, we will impose on ourselves various such restrictions to understand different aspects of the object-language.

A First Language: L0

Consider the Object-language with the following syntax:

<exp>   ::= <integer-literal>
         |  <varref>
         |  <operator> <operands>

<operator> ::= <varref> | (<exp>)

<operands> ::= () | (<operand> {; <operand>})

<operand> ::= <exp>

<varref> ::= <var>

The following abstract syntax is defined:

lit(datum)
app(rator rands)
varref(var)

A first interpreter for this language is defined as follows:

(define eval-exp
  (lambda (exp)
    (cases expr exp
      (lit (datum) datum)
      (varref (var) (apply-env init-env var))
      (app (rator rands) 
        (let ((proc (eval-exp rator))
              (args (eval-rands rands)))
          (apply-proc proc args)))
      (else (error "invalid abstract syntax: " exp)))))
 
(define eval-rands
  (lambda (rands)
    (map eval-exp rands)))

Dealing with Primitives

The first point to clarify is how the object-language will deal with primitives. By definition, primitives are used as black-boxes by the object-language and they must be implemented in the meta-language.

The following schema provides one solution. Other solutions are possible. (For example, one can bind the name of a procedure to a procedure object in the object language, when, as in Scheme, first class procedural objects are available. Another option is to consider primitive application as a special syntactic form -- syntactically different from a non-primitive application.)


Define procedures:
<procedure> ::= <prim-op>
<prim-op>   ::= <addition>     +
            | <substraction>   -
            | <multiplication> *
            | <division> /

Abstract syntax: prim-proc (prim-op)

(define (apply-proc proc args)
  (cases proc
    (prim-proc (prim-op) (apply-prim-op prim-op args))
    (else (error "Invalid proc: " proc))))

(define (apply-prim-op op args)
  (case prim-op
    ((+) (+ (car args) (cadr args)))
    ((-) (- (car args) (cadr args)))
    ((*) (* (car args) (cadr args)))))

Note that in this solution, the primitive operation is an expressed value (that is, the expression "+" is a valid expression in L0, whose value is the primitive procedure (prim-proc +).

Here are some semantic questions we can already ask on L0 based on the interpreter definition:

In which order are sub-expressions evaluated? Is there a way to test it?
What happens in case a primitive raises an error (division by 0)? Can it be handled by the interpreter? Suggest ways to deal with such errors (eg, introduce a special value NaN - Not a Number - in the expressed values domain).

These questions are "semantic" (and not syntactic) because they refer to the computation process - and not to syntactic properties of the program.

Conditional Evaluation: L1

We now extend the language L0 with support for conditional evaluation. The steps are:

Introduce a new syntax for conditional expressions.
Define computational rules for this new construct.

The syntax of L0 is expanded with the following rule:

<exp> ::= if <exp> then <exp> else <exp>

And the corresponding abstract syntax is:

(if-exp (test expr?)
        (then expr?)
        (else expr?))

The computation rule we add to the interpreter is:

(define eval-expr
  (lambda (e)
    (cases expr e
      (lit (datum) datum)
      (varref (var) (apply-env init-env var))
      (if-exp (test then else)
        (let ((tv (eval-exp test)))
          (if (true-val? tv)
              (eval-exp then)
              (eval-exp else))))
      (app (rator rands) 
        (let ((proc (eval-exp rator))
              (args (eval-rands rands)))
          (apply-proc proc args))))))

;; Determine when an expressed value is considered true in L1
(define true-val?
  (lambda (e) (if (eqv? e 0) #f #t)))

;; Apply-proc and eval-rands do not change from above.

Some semantic questions we can ask about L1:

Which expressed values are considered true by the if test?
Is if a special form in the sense of Scheme?

One semantic questions we cannot answer with the current implementation of the interpreter: how does IF work? This is because we rely on the IF of the meta-language to implement IF in the object language.

Dealing with Environments and Names: L2

To bind names to denoted values, we use environments. An environment is abstractly characterized as a finite functions from names to denoted values. The finite function ADT defined in a previous lesson is used for this purpose.

The environment ADT is specified by the following interface:

An environment is a finite function from symbols to denoted values.

(empty-env): create the empty environment.
(extend-env symbols denvals env):
             create a new environment nenv such that:
             when symbols = {s1, s2, ..., sn} and
                  denvals = {val1, val2, ..., valn} then
             nenv(si) = vali for any si in symbols and
             nenv(s) = env(s) for any other symbol.
(apply-env env s): returns the denoted value of env on s.

In L2, we will work with the same set of expressed values (Numbers) and determine that the set of denoted values is the same set (Numbers).

The new syntax to introduce local variables is:

<expr> ::= let { <var> = <expr> }* in <expr>

with the associated abstract syntax:

(let-expr (vars (list-of symbol?))
          (vals (list-of expr?))
          (body expr?))

With this new syntax, the following expressions belong to L2:

let x = 1
    y = 2 in
    let x = 2 in
       +(x, y)

We now define a computation rule which will insure that L2 is a lexically scoped language - that is, the let-computation rule will be such that the link between a varref and the corresponding binding position in a let can be determined through syntactic analysis in a way similar to the free/bound analysis discussed in the lambda calculus in a previous lesson.

To this end, we must modify the signature of the interpreter. Up to L1, we used an interpreter eval-exp with the following signature:

eval-expr: Expr --> Number

We now modify it to be:

eval-expr: Expr x Env --> Number

That is, the interpreter now evaluates an expression within a given environment.

The new computation rule is shown in the following interpreter:

(define eval-expr
  (lambda (e env)
    (cases expr e
      (lit (datum) datum)
      (varref (var) (apply-env env var))
      (if-exp (test then else)
        (let ((tv (eval-exp test env)))
          (if (true-val? tv)
              (eval-exp then env)
              (eval-exp else env))))
      (app (rator rands) 
        (let ((proc (eval-exp rator env))
              (args (eval-rands rands env)))
          (apply-proc proc args)))
 
      (let-exp (vars vals body)
        (let ((expvals (eval-rands vals env)))
          (eval-exp body
                    (extend-env vars expvals env)))))))


(define (eval-rands rands env)
  (map (lambda (rand) (eval-exp rand env)) rands))

(define (apply-proc proc expvals)
  (cases proc
    (prim-proc (prim-op) (apply-prim-op prim-op expvals))
    (else (error "Invalid proc: " proc))))

(define (apply-prim-op op args)
  (case prim-op
    ((+) (+ (car args) (cadr args)))
    ((-) (- (car args) (cadr args)))
    ((*) (* (car args) (cadr args)))
    ((/) (/ (car args) (cadr args)))))

Semantic questions we can ask about L2 based on this interpreter definition: Is L2 lexically scoped? (This can be proven inductively.) In which environment are the values of the let computed? In which order are the variable bindings computed?

Dealing with Procedures: L3

We now introduce procedures in the object language. The basic design decision here is to make procedures "first class" objects - as they are in Scheme. This turns out to be easier to model than the restricted role used in other languages (C++, Java) which we will discuss later.

Procedures being first class objects, means that:

We can create anonymous procedures (that is, procedures are not necessarily named).
We can create procedures at runtime (that is, procedures can be computed).
We can bind procedures to variables (that is, procedures are denoted values).

The syntax for procedures we introduce is:

<expr> ::= proc ( { <var> }* ) <expr>

with the following abstract syntax:

(proc (vars (list-of symbol?))
      (body expr?))

Because of this decision, we must extend the expressed values domain. Up to L2, the expressed values domain was simply Numbers. Now, we must add the possibility that the result of a computation is a closure or a number. A closure captures the environment that was active at the time the procedure expression is computed. Consider the following example program in L3:

let x = 2 in
  let p = proc (y) +(y, x) in
    let x = 5 in
      p(10)

This reduces to: 12 (and not 15).

We therefore define a new domain of expressed values as follows - and take the opportunity to clean up the case of primitive operators (which were "hidden" expressed values up to now):

(define-datatype expval expval?
  (number (datum number?))
  (prim-proc (prim-op primitive-op?))
  (closure (formals (list-of symbol?))
           (body expr?)
           (c-env env?)))

(define primitive-op? (lambda (e) (memv? e '(+ * - /))))

We keep working with a domain of denoted values equal to the domain of expressed values.

The new evaluation rules for procedures and applications appear in bold in the following interpreter definition:

(define eval-expr
  (lambda (e env)
    (cases expr e
      (lit (datum) (number datum))       ;; Change to expval
      (prim-proc (prim-op) e)            ;; Proper handling of primitives
      (varref (var) (apply-env env var))
      (if-exp (test then else)
        (let ((tv (eval-exp test env)))
          (if (true-val? tv)
              (eval-exp then env)
              (eval-exp else env))))
      (app (rator rands) 
        (let ((proc (eval-exp rator env))
              (args (eval-rands rands env)))
          (apply-proc proc args)))
      (let-exp (vars vals body)
        (let ((expvals (eval-rands vals env)))
          (eval-exp body
                    (extend-env vars expvals env))))
      (proc (vars body)
        (closure vars body env)))))

(define (eval-rands rands env)
  (map (lambda (rand) (eval-exp rand env)) rands))

(define (apply-proc proc expvals)
  (cases proc
    (prim-proc (prim-op) (apply-prim-op prim-op expvals))
    (closure   (vars body c-env)
      (eval-exp body
                (extend-env vars expvals c-env)))))

(define (apply-prim-op op args)
  (case prim-op
    ((+) (+ (car args) (cadr args)))
    ((-) (- (car args) (cadr args)))
    ((*) (* (car args) (cadr args)))
    ((/) (/ (car args) (cadr args)))))

Last modified Mar 10, 2003 Michael Elhadad