LL(1) and LR(0) parsing

1 LL(1) parsing

• First(A) = {w | A → * wB}
• Follow(A) = {w | S → * BAwC}
• Nullable(α) = {True | α →* ε}

1.2 Conflicts

• For any two productions $$A$$ → α and $$A$$ → β, if $$First(\alpha)$$ intersects with $$First(\beta)$$, this is a first-first conflict. Example: A → ab | ac
• For a non-Terminal $$A$$, if $$First(A)$$ intersects with $$Follow(A)$$ and $$Nullable(A)$$ = True, this is a first-follow conflict. Example:
B → Ac
A → c | ε

1.3 LL(1) Parsing using a pushdown automata

There are 3 components in this kind of parser:

1. The input stream (automata input of course)
2. prediction stack (the pushdown automata stack)
3. Transition table (the transition function & states of the pushdown automata)

The transition table is a function from non-Terminal $$X$$ and a token to derivation rules. It's very convenient to represent this function as a table.

There are 2 actions performed by an LL(1) parser:

1. $$Prediction(A,t)$$ - If the top of the stack is the non-terminal $$A$$, perform a prediction:
Pop $$A$$, look up cell $$[A, t]$$ in the transition table ($$t$$ is the current token in the input stream). If the cell is non empty, push the derivation onto the stack. Otherwise, there is a syntax error in the input string.
2. $$Match(a,t)$$ - If the top of the stack is a terminal $$a$$, perform a match:
If $$(t == a)$$ ($$t$$ is the current token in the input stream), consume $$t$$ (advance 1 token in the input stream). If $$t \neq a$$, there is a syntax error in the input string.

Remember LL Parsers consume the input from left-to-right.

When the prediction stack is empty the parsing terminates. If the input is completely consumed then accept it, otherwise (there are input tokens left) reject it.

1.4 Running example:

1.4.1 Grammar

S -> A, A -> aAb | c

1.5.2 Parse

input suffix Stack content Action
1 abcbb$A$ predict(A, a)
2 abcbb$aAb$ match(a, a)
3 bcbb$Ab$ predict(A, b)

As cell $$(A, b)$$ is empty, reject the input and throw a syntax error.

1.6 Constructing the transition table

1. For each rule $$A$$ → α
1. For each $$t$$ in $$first(\alpha )$$ ∪ {$$Follow(A)$$ | if α is nullable}
1. if $$Table[A, t]$$ is not empty, there is a conflict in the grammar. Return an error.
2. $$Table[A, t]$$ = $$A$$ → α

2 LR(0) parsing

2.1 Terminology

• Reduction - The opposite of derivation. For example given a production rule $$A$$ → α, if we have a sentential form (a sequance of terminals and non-terminals) like $$a\alpha b$$ then we can reduce it to $$aAb$$.
• Handle - A substring of terminals and non-terminals in the parsing process that is reduced.

When parsing a grammar $$G$$ with an initial non-terminal $$S$$, with an $$LR(k)$$ parser we will actually work on the grammar $$G'$$ where we create a new initial rule non-terminal $$S' \rightarrow S$$. This will make sure we have an end-of-input marker.

2.2 Reasoning

Until now we started with the initial nonterminal $$S$$ and tried to derive the input from it. This seems a bit un-intuitive (first thing we do when parsing an input is to completely ignore that input). LR parsers work Bottom-Up - they read the input (the bottom of the parse tree) and try to figure out what was written there (the structure of the tree). Eventually the tree is reduced to the initial non-terminal, and if at that point we reduced all the input, then the input was valid.

2.3 Properties of LR(k)

• $$LL(k) \subset LR(k)$$ (note that $$LL(k) \neq LR(k)$$). This means that any LL(k) language is also an LR(k) language.
• LR(k) have no problem with left recursion.
• LR(0) parsers are commonly used for parsing computer languages (i.e. compiling).
• Shift-reduce parsers are a very popular type of LR(0) parsers.

2.4 How LR(0) parsing works

We gather the input tokens (left-to-right) until we can reduce the suffix of the gathered tokens (remember rightmost derivation means we derive the suffix).

• Gathering a token is called "shifting".
• Reduction is the replacement of a sequence of tokens with the non-terminal that derives it.

2.5 Improper example

2.5.1 Grammar

E → E + (E) | id

Keep in mind that we are actually working on the grammar
S → E$E → E + (E) | id 2.5.2 Input a + (b + (c)) 2.5.3 Parse input suffix Stack Action 1 a + (b + (c))$   Shift
2 + (b + (c))$a Reduce 3 + (b + (c))$ E Shift
4 (b + (c))$E + Shift 5 b + (c))$ E + ( Shift
6 + (c))$E + (b Reduce 7 + (c))$ E + (E Shift
8 (c))$E + (E + Shift 9 c))$ E + (E + ( Shift
10 ))$E + (E + (c Reduce 11 ))$ E + (E + (E Shift
12 )$E + (E + (E) Reduce 13 )$ E + (E Shift
14 $E + (E) Reduce 15$ E Shift
2. E → E+(E)
3. E → id

2.7.2 The LR(0) items (simply place a dot at point in every production)

1. S → • E$2. S → E•$
3. S → E$• 4. E → • E+(E) 5. E → E• +(E) 6. E → E+• (E) 7. E → E+(• E) 8. E → E+(E• ) 9. E → E+(E)• 10. E → • id 11. E → id• 2.7.3 Creating states from Items States are composed of closures constructed from items. Initially the only closure is {S → • E$}. Next, we construct the closure like so:
Closure(I) = Closure(I) ∪ {A → • α | B → β• Aγ ∈ I}

Basically, for a non-terminal $$A$$ in $$I$$ with a • before it, add all items of the form "A → • …".

Given our example (Initial = {S → • E$}) we create the following closure: Closure({S → • E$ } = {S → • E$, E → • E+(E), E → • id} to create more closures we define a "goto" function that creates new closures. Given a closure $$I$$ and a symbol $$a$$ (terminal or non-terminal): goto(I, a) = {B → α a• β | B→α • aβ ∈ I} Basically, For every item in $$I$$ that has a • before $$a$$ we create a new closure by pushing the • one symbol forward. For instance, given our example closure and the symbol $$E$$ we get: goto({S → • E$, E → • E+(E), E → • id}, E) = {S → E• $, E → E• +(E)} Now, for each of these items we create a closure and for each of those closures we create all possible goto sets. We keep going until there are no more new states (items that are not part of a closure). Lets finish building the states: 2.7.4 Creating the transition table The table is index by state and symbol. We created the states already and the symbols are given by the grammar, now we need to create the action within the cells. The goto functions defines the transitions between the closures. Transition from state q1 to state q2 given symbol a $$\iff$$ goto(closure(q1), a) = closure(q2). • If the • is at the end of the item, this is a reduction action. • If the symbol is a non-terminal, the action for the transition is a go-to. • If the symbol is a terminal, the action is a shift Now we create the transition table: Actions go-to actions States a + ( )$ S E
0 s1           g2
1 rIII rIII rIII rIII rIII
2   s4     s3
3 acc acc acc acc acc
4     s5
5 s1           g6
6   s4   s7
7 rII rII rII rII rII

2.7.5 Running the parser

Input: a + (a + (a))

input suffix Stack Action
1 a + (a + (a))$0 s1 2 + (a + (a))$ 0a1 rIII
3 + (a + (a))$0E g2 4 + (a + (a))$ 0E2 s4
5 (a + (a))$0E2+4 s5 6 a + (a))$ 0E2+4(5 s1
7 + (a))$0E2+4(5a1 rIII 8 + (a))$ 0E2+4(5E g2
9 + (a))$0E2+4(5E2 s4 10 (a))$ 0E2+4(5E2+4 s5
11 a))$0E2+4(5E2+4(5 s1 12 ))$ 0E2+4(5E2+4(5a1 rIII
13 ))$0E2+4(5E2+4(5E g6 14 ))$ 0E2+4(5E2+4(5E6 s7
15 )$0E2+4(5E2+4(5E6)7 rII 16 )$ 0E2+4(5E g6
17 )$0E2+4(5E6 s7 18$ 0E2+4(5E6)7 rII
19 $0E g2 20$ 0E2 s3
21   0E2$3 accept It is a matter of preference whether to add an extra state q8 to the automata that represents the final reduction of E$• to S, in which case an extra couple of parsing steps would be added:

 21 0e2\$3 rI 22 0S g8 23 0S8 accept

2.8 Conflicts

There are two kinds of conflicts we encounter

1. Shift-reduce conflict - a state contains items that corespond to both reduce and shift actions
2. Reduce-reduce conflict - a state has 2 different items coresponding to different reduce actions

2.9 Indications of a conflict

Any grammar with an ε derivation cannot be LR(0). This is because there is no input to reduce, so at any point that derivation rule can be used to reduce (add the rule's LHS non-terminal to the stack)

Some of these can be resolved, but we won't cover it here.

Created: 2014-12-18 Thu 16:22

Emacs 24.3.1 (Org mode 8.2.10)

Validate