Free grammars
Regular expressions can describe lists but not nesting (recursion). A generative grammar is a set of simple rewriting rules that can be applied repeatedly to generate valid strings: the grammar defines a language through rule rewriting and the repeated application of these rules. A context-free grammar, also simply known as a free grammar, a type 2 grammar, or a BNF (Backus Normal Form) grammar, is defined as a 4-tuple $G = (V, \Sigma, P, S)$ where:
- $V$ is the set of nonterminal symbols
- $\Sigma$ is the set of terminal symbols
- $P$ is the set of productions (sometimes called rules)
- $S \in V$ is the axiom, a particular nonterminal symbol
Note that the grammar is said to be “free” when the production rules apply equally to all occurrences of a nonterminal symbol in the grammar, without regard to the context in which it appears (indeed we can call them context-free). This distinguishes it from a context-sensitive grammar, where the production rules depend on the context in which the nonterminal appears.
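The 4-tuple definition can be made concrete with a minimal sketch in Python, assuming the classic example grammar $S \to aSb \mid \varepsilon$ (which generates $\{a^n b^n\}$); the generator repeatedly rewrites the leftmost nonterminal:

```python
# A context-free grammar as a 4-tuple (V, Sigma, P, S), here for S -> a S b | ε.
V = {"S"}
Sigma = {"a", "b"}
P = {"S": [["a", "S", "b"], []]}   # [] encodes the empty right part (ε)
S = "S"

def derive(max_steps):
    """Rewrite the leftmost nonterminal repeatedly; collect terminal strings."""
    results, frontier = set(), [[S]]
    for _ in range(max_steps):
        nxt = []
        for form in frontier:
            nonterminals = [j for j, sym in enumerate(form) if sym in V]
            if not nonterminals:
                results.add("".join(form))   # fully terminal: a valid string
                continue
            i = nonterminals[0]
            for rhs in P[form[i]]:           # apply every production of P
                nxt.append(form[:i] + rhs + form[i + 1:])
        frontier = nxt
    return results
```

Running `derive(5)` yields the strings reachable within five rewriting steps: the empty string, `ab`, `aabb`, and `aaabbb`.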
- Recursion is the key to generating infinite languages. The necessary and sufficient condition for a grammar to generate an infinite language is that it admits a recursive derivation. Note that a grammar has a recursive derivation iff the corresponding graph has a circuit.
- A rule can be left-recursive, right-recursive, or both.
- The concept of left or right recursion is quite important, for example, in order to generate a left-associative operator: a left-associative operator requires left-recursive rules, e.g.:
A -> A terminal | X
- A grammar is clean if it has no circular derivations ($A \Rightarrow^+ A$) and no useless nonterminals (every nonterminal is reachable from the axiom and generates at least one terminal string).
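The associativity point can be checked numerically. As a hedged sketch (the operand values and the subtraction operator are illustrative assumptions, not from the notes), grouping the same operands to the left or to the right gives different results:

```python
from functools import reduce

values = [8, 3, 2]

# Left-recursive rule E -> E '-' T | T yields left-associative grouping:
left = reduce(lambda acc, v: acc - v, values)   # ((8 - 3) - 2) = 3

# A right-recursive rule E -> T '-' E would instead group to the right:
def fold_right(xs):
    return xs[0] if len(xs) == 1 else xs[0] - fold_right(xs[1:])

right = fold_right(values)                      # (8 - (3 - 2)) = 7
```

Because `left != right`, a parser for subtraction must use the left-recursive rule to reproduce the conventional left-associative meaning.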
Classification of important rules
Given a rule $A \to \alpha$ with $A \in V$ ($V$ is the nonterminal symbol set) and $\alpha \in (V \cup \Sigma)^*$ (where $\Sigma$ is the set of terminal symbols of the grammar), we can classify the production itself according to many definitions. Some of the most important are:
- Left-recursive rule: the production is called left-recursive if its left part appears as a prefix of the right part ($A \to A\beta$).
- Right-recursive rule: the production is called right-recursive if its left part appears as a suffix of the right part ($A \to \beta A$).
There are many normal forms; one of the most important is indispensable for the top-down parsers studied later in the course. This normal form is termed non-left-recursive and is characterized by the absence of left-recursive rules or derivations.
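As a standard illustration (the symbols $\alpha$, $\beta$, and the auxiliary nonterminal $A'$ are generic, not taken from the notes), an immediately left-recursive rule can be rewritten into non-left-recursive form by introducing a new nonterminal:

$$A \to A\beta \mid \alpha \quad\Longrightarrow\quad A \to \alpha A', \qquad A' \to \beta A' \mid \varepsilon$$

Both forms generate the same language $\alpha\beta^*$, but the transformed one is suitable for top-down parsing.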
Syntax tree
Given a grammar $G$ and a string $x \in L(G)$, we can build a syntax tree of the string, where the root is the axiom $S$, the leaves are the sequence of symbols of the string, and the branches correspond to the productions used to generate the string. Two possible algorithms to build syntax trees:
- ELL: top-down
- ELR: bottom-up
Ambiguity
With the definition of syntax tree we can also specify what ambiguity means in the context of grammars. A sentence is ambiguous if it admits two or more distinct syntax trees. The number of distinct trees is also the degree of ambiguity.
Ambiguity should be avoided in the grammar design phase.
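The degree of ambiguity can be counted for a classic example. As a sketch (the grammar E -> E '+' E | 'a' is a textbook ambiguous grammar, assumed here for illustration), every way of splitting the operands between a left and a right subexpression gives a distinct syntax tree:

```python
from functools import lru_cache

# Degree of ambiguity of "a + a + ... + a" with n operands under the
# ambiguous grammar E -> E '+' E | 'a': each choice of the top-level '+'
# splits the operands into a left and a right subtree.
@lru_cache(maxsize=None)
def tree_count(n):
    if n == 1:
        return 1          # a lone 'a' has exactly one syntax tree
    return sum(tree_count(k) * tree_count(n - k) for k in range(1, n))
```

For three operands there are 2 trees, `(a+a)+a` and `a+(a+a)`, and the count grows as the Catalan numbers (5 trees for four operands), which is why ambiguity must be ruled out at grammar-design time rather than at parse time.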
Grammar classification
Chomsky proposed a classification of grammars, thus establishing an order based on their generality and defining the “types” from 0 to 3, where type 3 represents the least general grammars and type 0 is the most general case. In this course, we will only be dealing with type 2 and 3 grammars. Each type of grammar is a subset of the one which precedes it.
Type 0 grammars
- Family of recursively enumerable languages
- Associated with Turing machines and generates recursively enumerable languages.
- The languages generated are not generally decidable.
- Rules are of the most general type.
Type 1 grammars
- Family of context-sensitive languages
- Associated with linear bounded automata (Turing machines whose tape is limited to the length of the string to be recognized).
- Generates context-sensitive languages, which are always decidable.
Type 2 grammars
- Family of context-free languages
- Associated with non-deterministic pushdown automata.
- Generates context-free languages, which are always decidable and which we analyze extensively in this course.
Type 3 grammars
- Family of regular languages
- Associated with finite-state automata.
- Generates regular languages.
- They can be put in unilinear (right-linear or left-linear) form.
| Grammar | Rule type | Language family | Recognizer model |
|---|---|---|---|
| Type 2 | $A \to \alpha$ with $A$ nonterminal and $\alpha \in (V \cup \Sigma)^*$ | Context-free / BNF languages | Pushdown automaton (non-deterministic) |
| Type 3 | $A \to aB$ or $A \to Ba$ (not both) with $A$, $B$ nonterminals and $a$ terminal or $\varepsilon$ | Regular or rational languages | Finite-state automaton |
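The correspondence between type 3 grammars and finite-state automata can be sketched directly: each right-linear rule A -> aB becomes a transition A --a--> B, and A -> ε makes A an accepting state. The example grammar below (S -> aS | bA, A -> bA | ε, generating a*bb*) is a hypothetical illustration, not from the notes:

```python
# Right-linear rules encoded as transitions: A -> aB gives ("a", "B"),
# and A -> ε gives the marker ("", None) making the state accepting.
rules = {"S": [("a", "S"), ("b", "A")],
         "A": [("b", "A"), ("", None)]}

def accepts(word, start="S"):
    """Simulate the (possibly non-deterministic) automaton on `word`."""
    states = {start}
    for ch in word:
        states = {target for q in states
                  for (sym, target) in rules[q] if sym == ch}
        if not states:
            return False                 # no transition: reject early
    return any(("", None) in rules[q] for q in states)
```

The recognizer uses only a finite set of states and no stack, which is exactly why type 3 grammars need nothing more powerful than a finite-state automaton.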
EBNF
Extended Backus Normal Form (EBNF) is a variant in which the grammar contains exactly one rule per nonterminal; each rule has a different nonterminal on its left side and a regular expression over $V \cup \Sigma$ on its right side, in which derived operators such as cross, power, and optionality can also appear.
```
EXPR -> IMP [imp IMP]
IMP  -> TRM (or TRM)*
TRM  -> FCT (and FCT)*
FCT  -> [not] EQU
EQU  -> OBJ ['=' OBJ]
OBJ  -> "x" | "y" | "z" | "(" EXPR ")"
```
Dyck Language
Model of well-parenthesized languages:
$$\Sigma = \{a, a', b, b'\}$$

$$S \rightarrow a S a' S \mid b S b' S \mid \varepsilon$$

The Dyck language is free but not regular.

## Linear language equations

Linear languages can be represented as a system of linear equations: every rule corresponds to a linear equation. The system of equations can be solved using substitution and the Arden identity.
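A minimal worked example of the Arden identity (the one-rule grammar X -> aX | b is a hypothetical illustration, not from the notes): the rule corresponds to the linear equation X = aX ∪ b, and Arden's identity (X = AX ∪ B implies X = A*B, provided ε ∉ A) gives the regular expression a*b, which we can check against a few derivations:

```python
import re

# Rule X -> a X | b  <=>  equation X = aX ∪ b; Arden's identity solves it
# as X = a*b (valid because ε is not in the coefficient language {a}).
solution = re.compile(r"a*b")

# Strings obtained by unfolding X -> a X | b zero, one, two, three times:
samples = ["b", "ab", "aab", "aaab"]
```

Every sample matches the solved regular expression, while a string such as `ba` (not derivable from X) does not, illustrating that substitution plus Arden's identity recovers exactly the language of the equations.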