Transforming grammars to \(LL(1)\)

Table of Contents

1 \(LL(1)\) Parsers (short recap)

\(LL(1)\) parsers are top-down parsers that can avoid backtracking by looking 1 token ahead of the current token they are trying to derive. This means that LL(1) grammars contain no ambiguity.

Not all languages and grammars are LL(1), but some non-LL(1) grammars can be converted to LL(1) (when they represent languages that have a LL(1) grammar representation). The algorithm for this transformation can end in an infinite loop, but whenever it terminates it always results in an LL(1) grammar for the same language as the original grammar.

2 Transformation algorithm

The following algorithm is run iteratively until the grammar converges:

  • Eliminate ambiguous epsilon-derivations
  • Eliminate left-recursion
  • Factor out ambiguous prefixes

2.1 Epsilon-derivation elimination

  1. Foreach rule \(A\) → ε which causes ambiguity
    1. Remove the rule
    2. For every Rule \(x\) = \(B\) → α where α contains K instance of the non-terminal \(A\)
      1. Remove \(x\)
      2. Add 2k rules, one for each possible combination of replacing some \(A\) with ε (deleting \(A\))

2.1.1 Example

S → AB
A → a | ε
B → CC | A | b
C → c

In this example, the rule A → ε causes ambiguity - given the input 'a', at least two parse trees can be constructed. For example:

epsilon_opt1.png epsilon_opt2.png

To solve this, we remove the rule A → ε by adding rules that represent the epsilon derivation (we actually preform the epsilon derivation "offline"):

S → AB | B
A → a
B → CC | a | b | ε
C → c

We introduced another kind of conflict (given the input 'a', do we use the rule S → AB or the rule S → B?), which will be dealt with later on in the section on ambiguous prefixes.

2.2 Left Recursion elimination

2.2.1 Direct Left Recursion

Consider rules of the form A → Aα | β, where α and β are any mix of terminals and non-terminals in the grammar and β cannot be left recursive (does not start with non-terminal A).

  1. For every rule A → Aα | β
    1. Remove the rule
    2. Add two new rules:
      1. A → β A'
      2. A' → α A' | ε

2.2.2 Indirect Left Recursion

Consider rule sets of the form:

A1 → A2 α1 | β1
A2 → A3 α2 | β2

An → A1 αn | βn

  1. Foreach derivation rule Ai → Aj α where i > j
    1. Replace Aj on the RHS (right-hand-side) with all possible derivations of Aj in the grammar
    2. Eliminate any local left-recursion introduced by the previous step

2.2.3 Example

S → A | a
A → Bc | b
B → Ab | c

Let A1=S, A2=A, A3=B:

A1 → A2 | a
A2 → A3 c | b
A3 → A2 b | c

  1. Remove indirect recursion in A3
    A1 → A2 | a
    A2 → A3 c | b
    A3 → A3 cb | bb | c
  2. Removing Direct Left Recursion A3 → A3 cb
    A3 → bbA3’ | cA3
    A3’ → cbA3’ | ε

The resulting grammar:
S → A | a
A → Bc | b
B → bbB’ | cB’
B’ → cbB’ | ε

2.3 Factorization (a.k.a. Left Factoring)

To solve problems with ambiguity due to common prefixes, one must factor them out when they lead to multiple possible parses.

2.3.1 Example

expr → subexpr | addexpr
addexpr → num + exper | num
subexpr → num - exper | num

Note that since both addexpr and subexpr derive to an expression beginning with num, the grammar contains ambiguous rules. For example, given the input 1 - 2 + 3, the parser will be unable to decide whether to apply the addexpr rule or the subexpr rule (as it will only see the token 1).

This can be solved by factoring out the common prefix num:

expr → num afterNum
afterNum → ε | + expr | - expr

Author: Lior Zur-Lotan and Avi Hayoun

Created: 2014-12-18 Thu 16:10

Emacs 24.3.1 (Org mode 8.2.10)