Michael Elhadad - NLP Fall 2021

Semantic Model

In this lecture, we investigate the general issue of Natural Language Understanding (NLU). We start with a motivating example to illustrate the difference between linguistic domain and semantic domain.

We are interested in modeling the behavior of an intelligent device. The device interacts with its users in natural language and performs actions in reaction to this interaction.

For example, consider an "intelligent music player". It reacts to speech commands of the form "play tracks 1 to 5".

Semantic interpretation consists of translating natural language into a useful semantic representation which can serve as the basis of intelligent behavior. The requirements on the semantic representation are:

  1. It should be unambiguous
  2. It should be possible to define a single "normal" form for any content, that is, it should be possible to reduce any expression to an equivalent expression in normal form. In other words, it should be possible to eliminate variability.
  3. It should lead to easy computation and inference
  4. It should allow easy interpretation of the input linguistic form by complementing it with stored or computed commonsense background knowledge
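Requirement 2 can be made concrete on the music-player domain. As a minimal sketch (in Python rather than the Scheme used later in the lecture; the function name is illustrative), a natural normal form for a track selection is a sorted, duplicate-free list of track numbers:

```python
# Requirement 2 in miniature: reduce variable linguistic forms to one
# normal form -- here, a sorted list of distinct track numbers.
def normal_form(tracks):
    return sorted(set(tracks))

# "1 to 3", "3 and 1 and 2", and a redundant "1 and 2 and 3 and 3"
# all reduce to the same normal form:
assert normal_form(range(1, 4)) == normal_form([3, 1, 2]) == normal_form([1, 2, 3, 3]) == [1, 2, 3]
```

Because equivalent expressions share one normal form, comparing meanings reduces to comparing lists, which also keeps computation and inference cheap (requirement 3).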

One of the key computations required on a semantic representation is the incremental construction of a semantic model. The model encodes all the information known about a situation up to a given point. This information is the basis for the behavior of the device. The model does not start "empty" at the beginning of an interaction -- it includes a priori expectations based on cultural norms.

A model is logically consistent if it does not contain logical contradictions. For example, one cannot state in the same model that the music device currently contains 5 songs and also that it contains 10 songs.

As a consequence, one of the requirements on the semantic representation is that it should be capable of detecting such contradictions. The device should refuse to update its model if the new information it processes contradicts existing information.

This requires the ability to perform inference on the basis of the knowledge currently encoded in the model. Semantic interpretation is, therefore, the operation of translating a natural language utterance into data that can be merged into an existing semantic model. It will fail if the utterance cannot be understood, or if the understanding of the utterance contradicts existing data in the model. In such cases of failure, people are very good at accommodating the input message and filling the holes to make it consistent.
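Such a consistency-checked merge can be sketched as follows (in Python; the class and attribute names are illustrative, not from the lecture). Facts are attribute/value pairs, and an assertion fails when it contradicts a stored value:

```python
# A minimal model with contradiction detection: asserting a conflicting
# value for an attribute is refused and leaves the model unchanged.
class Model:
    def __init__(self):
        self.facts = {}                      # attribute -> value

    def assert_fact(self, attribute, value):
        if attribute in self.facts and self.facts[attribute] != value:
            return False                     # contradiction: refuse the update
        self.facts[attribute] = value
        return True

m = Model()
assert m.assert_fact("song-count", 5)        # merged into the model
assert not m.assert_fact("song-count", 10)   # contradicts "song-count = 5"
assert m.facts["song-count"] == 5            # model unchanged after failure
```

A real KR system would also check new facts against the inferences they license, not only against directly stored facts.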

Model and Task

The model can be selected to be more or less detailed. A detailed semantic model could describe the exact physical behavior of the device -- down to the level of electric flow in its computing circuits. A more abstract model could describe the behavior in terms of meaningful actions and state transitions in the device -- as far as they influence its behavior as it is perceived by human users. The selection of the level of abstraction of a model depends on the task we expect the device to fulfil. In our example, we have simple expectations from the device:
  1. It contains music tracks
  2. Each music track is identified by a sequential number (in more interesting devices, tracks would be characterized by a title, artist, genre, and rich metadata).
  3. The device can be idle, or in the middle of playing a playlist
  4. A playlist is made up of an ordered list of music tracks
  5. The device reacts to commands that can update the current playlist; start, pause, resume, or stop the playing of the current playlist; or forget the current playlist.

This domain analysis determines the target semantic forms we expect to obtain at the end of the semantic interpretation. That is, we say that the device understands natural language commands if it can map the incoming linguistic forms to commands that update the device state according to the expected outcome.
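The expectations listed above amount to a small state machine. The following sketch (in Python, with illustrative names; this is not code from the lecture) makes the states and transitions explicit:

```python
# Device model implied by the domain analysis: numbered tracks, a current
# playlist, and a playing state updated by commands.
class MusicPlayer:
    def __init__(self, n_tracks):
        self.n_tracks = n_tracks         # tracks identified by numbers 1..n
        self.playlist = []               # ordered list of track numbers
        self.state = "idle"              # "idle", "playing", or "paused"

    def load(self, tracks):
        """Update the current playlist, e.g. from 'play tracks 1 to 5'."""
        if all(1 <= t <= self.n_tracks for t in tracks):
            self.playlist = list(tracks)

    def play(self):
        if self.playlist:
            self.state = "playing"

    def pause(self):
        if self.state == "playing":
            self.state = "paused"

    def resume(self):
        if self.state == "paused":
            self.state = "playing"

    def stop(self):
        self.state = "idle"

    def forget(self):
        self.playlist, self.state = [], "idle"

p = MusicPlayer(10)
p.load([1, 2, 3, 4, 5])
p.play()      # state becomes "playing"
p.pause()     # state becomes "paused"
```

Understanding a command then amounts to mapping the linguistic form to one of these state updates.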

Implicit Data and Inference

One of the characteristics of natural language -- beyond the fact that it is ambiguous -- is that it leaves much information implicit -- that is, information is conveyed without being stated explicitly in words. People performing semantic interpretation complement the information extracted from the linguistic form and accommodate it so that it can be merged into their personal memory.

There are several forms of implicitness in natural language -- in general, implicitness contributes to the efficiency of natural language in human communication. Paul Grice, one of the founders of the field of Pragmatics, proposed a classification of these forms of implicit content.

See Grice, H.P., Further Notes on Logic and Conversation, in Cole, P. Syntax and Semantics, Vol.9: Pragmatics, Academic Press, NY, 1978, pp.113-127. (available online) and the Wikipedia entry on Paul Grice for a presentation of this analysis of implicit content in natural language.

Grice introduced a set of cooperative principles which help users produce and interpret implicit content, so as to make communication efficient. Example: Consider a sophisticated music device that can play and record. A command to the device might be: Play track 3 and record it as "Bob's Best No.1".

What information is not stated explicitly in this command?

One way to look at implicit content is to apply interpretive strategies to understand why a specific content was uttered at a given time and place. The result of this computation (an inference process) is to derive possible continuations of an utterance. Example: The dog is barking.
Possible continuations:

  1. The dog is barking / Someone is on the road
  2. The dog is barking / You should go check what is wrong
  3. The dog is barking / I told you this dog is upset these days
From a computational perspective, the issues are: What knowledge is involved in activating these possible continuations? What processes are applied to select relevant continuations from irrelevant ones?

For our example, at the simplest level, we assume the device has a memory containing at least: (1) what state it is in (idle, or playing a specific track); (2) the list of items that are currently loaded (how many items are in the player); (3) the current playlist; (4) the current playing state of the player. When a command is interpreted, the interpretation process must refer to this state as part of "understanding" the command. We will refer to this structured representation of what the device "knows about the situation" as its model of the world.

Knowledge Representation

A Knowledge Representation (KR) system provides a formalism and a computation system to implement services to construct and manipulate such models. The requirements on a KR system are:
  1. Decidability - it should be decidable (an algorithm exists that can complete in finite time) to query the KR system.
  2. Tractability - the complexity of the decision should be low so that querying is feasible with limited resources.
  3. Uncertainty - it should be possible to express uncertainty and variable degrees of belief in facts.
  4. Monotonicity - once facts are asserted into a model, they remain true, unless they are explicitly retracted.
  5. Consistency - when facts that are asserted contradict existing facts or one of their possible inferences, the assertion should fail.
  6. Expressiveness - it should be possible to express a wide range of facts and relations among them.
  7. Completeness - all possible inferences of the facts asserted in the model should be reachable when queried.
We will focus on logic-based knowledge representation in the following, using First-Order Logic (FOL) as a basis.

Compositional Semantic Analysis

A basic computational method to perform semantic analysis of isolated sentences highlights the importance of compositionality.

Parsing with Semantics

An important concept underlying semantic analysis is to use compositional methods: the semantic interpretation of a node in the parse tree is obtained as a function of the semantic interpretation of its daughter constituents.

The following toy semantic interpreter implements this idea: Norvig's Scheme implementation of a bottom-up parser with compositional semantics.

This interpreter includes the same bottom-up parser discussed in the lecture on syntactic parsing. The addition consists of computing a semantic interpretation based on the syntactic parse tree of a sentence.

This is illustrated in the following simple grammar to handle commands given to the music player machine of the form "Play songs 1 to 5 without 3". In this CFG, we describe not only a way to construct parse trees (a root node dominates children nodes recursively down to the terminal nodes), but also a semantic representation associated to each node in the tree. This semantic representation belongs to the semantic domain. In the grammar below, we use a simple semantic representation: we associate to any noun phrase expression a semantic value which is an ordered list of item numbers. For example, we will find that the semantic value associated to the expression "3 and 5" is the list containing the two item numbers (3 5). Similarly, the semantic value of the expression "1 to 5 without 2" will be the list (1 3 4 5).

In order to compute this semantic value for any syntactically well-formed expression, we associate with each rule (LHS → RHS) an additional element: a semantic function which computes the semantic value of the new node constructed by the rule as a function of the semantic values of the daughter nodes.

For example, the rule:

    (NP -> (NP CONJ NP) infix-funcall)
means that when an NP node is constructed by this rule, we compute its semantic value by evaluating:

(semantic-value NP-Node) = (apply infix-funcall (map semantic-value (tree-children NP-Node)))

For a lexical rule such as (CONJ -> and union*), the semantic value of the node is simply the associated value itself: the CONJ node denotes the function union*, which infix-funcall then applies to the semantic values of the two NPs it conjoins.
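This general pattern -- a node's value is its rule's semantic function applied to its children's values -- can be sketched as a short recursion (in Python for brevity; the lecture's interpreter is in Scheme, and the tree encoding here is illustrative):

```python
# Compositional evaluation over a parse tree encoded as
# (semantic-function, children); a leaf's semantic value is itself.
def semantic_value(tree):
    if not isinstance(tree, tuple):
        return tree
    fn, children = tree
    return fn(*[semantic_value(c) for c in children])

single = lambda n: [n]                                     # NP -> (N): wrap in a list
union = lambda x, y: x + y if not set(x) & set(y) else []  # 'and' semantics

# Hand-built tree for "3 and 5":
tree = (union, [(single, [3]), (single, [5])])
assert semantic_value(tree) == [3, 5]
```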


;; Each rule has the form (LHS -> RHS semantic-function): the semantic value
;; of a node built by the rule is the semantic function applied to the
;; semantic values of its children. The helper `integers` (from Norvig's
;; code) returns the list of integers between its two arguments.
(define *grammar6*
  '((NP -> (NP CONJ NP) infix-funcall)
    (NP -> (N)          list)
    (NP -> (N P N)      infix-funcall)
    (N ->  (DIGIT)      identity)
    (N ->  (N DIGIT)    10*N+D)
    (P ->  to           integers)
    (CONJ -> and        union*)
    (CONJ -> without    set-diff)
    (DIGIT -> 1 1) (DIGIT -> 2 2) (DIGIT -> 3 3)
    (DIGIT -> 4 4) (DIGIT -> 5 5) (DIGIT -> 6 6)
    (DIGIT -> 7 7) (DIGIT -> 8 8) (DIGIT -> 9 9)
    (DIGIT -> 0 0)))

;; Apply the function in the middle position to the two outer arguments,
;; as in (NP CONJ NP) and (N P N).
(define (infix-funcall arg1 function arg2)
    (function arg1 arg2))

;; Union of two disjoint lists; returns '() (a failed reading) if they overlap.
(define (union* x y)
    (if (null? (intersection x y))
        (append x y)
        '()))

(define (intersection s1 s2)
    (cond ((null? s1) '())
          ((null? s2) '())
          ((memv (car s1) s2) (cons (car s1) (intersection (cdr s1) s2)))
          (else (intersection (cdr s1) s2))))

(define (set-difference s1 s2)
    (cond  ((null? s1) '())
           ((memv (car s1) s2) (set-difference (cdr s1) s2))
           (else (cons (car s1) (set-difference (cdr s1) s2)))))

(define (subset? s1 s2)
    (cond ((null? s1) #t)
          (else (and (memv (car s1) s2)
                     (subset? (cdr s1) s2)))))

;; x minus y; returns '() (a failed reading) unless y is a subset of x.
(define (set-diff x y) (if (subset? y x) (set-difference x y) '()))

;; Accumulate a multi-digit number left to right, e.g. (10*N+D 19 8) => 198.
(define (10*N+D N D) (+ (* 10 N) D))

(define (identity x) x)

Here the semantics of an NP constituent such as "1 to 5" is a list of integers. The semantic interpretation of a complex NP is obtained by applying the function attached to its conjunction to the semantic interpretations of the NPs it conjoins. The session below first runs an earlier version of the grammar, *grammar5*, and then *grammar6*; note how *grammar6* prunes the readings in which union* or set-diff fails.

(use *grammar5*)
;;;; 17

(meanings '(1 to 5 without 3))
;;;; ((1 2 4 5))

(meanings '(1 to 4 and 7 to 9))
;;;; ((1 2 3 4 7 8 9))

(meanings '(1 to 6 without 3 and 4))
;;;; ((1 2 4 5 6) (1 2 5 6))

(use *grammar6*)
;;;; 18

(meanings '(1 to 6 without 3 and 4))
;;;; ((1 2 5 6))

(meanings '(1 and 3 to 7 and 9 without 5 and 6))
;;;; ((1 3 4 7 9))

(meanings '(1 and 3 to 7 and 9 without 5 and 2))
;;;; ((1 3 4 6 7 9 2))

(meanings '(1 9 8 to 2 0 1))
;;;; ((198 199 200 201))

(meanings '(1 2 3))
;;;; (123 (123))
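The pruning behavior shown in this session can be reproduced with a compact sketch (in Python rather than the lecture's Scheme; it ignores multi-digit numbers, and all names are illustrative):

```python
# Compose-and-prune in miniature: enumerate all readings of a token list
# as an NP, discarding readings whose semantic combination fails.
def union_star(x, y):
    """Union of disjoint lists; None signals a failed (anomalous) reading."""
    return x + y if not set(x) & set(y) else None

def set_diff(x, y):
    """x minus y, defined only when y is a subset of x."""
    return [e for e in x if e not in y] if set(y) <= set(x) else None

def meanings(tokens):
    """All semantic values of a token list read as an NP."""
    results = []
    if len(tokens) == 1 and isinstance(tokens[0], int):
        results.append([tokens[0]])                            # NP -> (N)
    if len(tokens) == 3 and tokens[1] == "to":
        results.append(list(range(tokens[0], tokens[2] + 1)))  # NP -> (N P N)
    for i, tok in enumerate(tokens):                           # NP -> (NP CONJ NP)
        if tok in ("and", "without"):
            fn = union_star if tok == "and" else set_diff
            for left in meanings(tokens[:i]):
                for right in meanings(tokens[i + 1:]):
                    val = fn(left, right)
                    if val is not None:                        # prune failed readings
                        results.append(val)
    return results

# meanings([1, "to", 6, "without", 3, "and", 4]) -> [[1, 2, 5, 6]]
```

As in *grammar6*, the reading "((1 to 6) without 3) and 4" is rejected because 4 is already in the list, leaving only "(1 to 6) without (3 and 4)".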

This example illustrates, on a simple domain (lists of numbers), the benefits of an unambiguous and normalizable semantic representation and the process of compositional semantic analysis.

We will next turn to a more general semantic representation, based on first-order logic, that fulfils many of the desirable requirements of a semantic representation.


Last modified Jan 8th, 2021