Regular Expressions

The family of regular languages (also called or type ) is the simplest language family.

  • Regular expressions can be ambiguous if there are multiple structurally different derivations that result in the same sentence. Sufficient conditions for ambiguity exist.
  • Regular expressions can be extended with operators like union, concatenation, and star.
  • A language on an alphabet is regular if it can be defined by a regular expression. So only if it is defined by concatenation, union and star over the elementary languages of ().
  • Regular expressions have limits, as they cannot represent certain languages, such as those with unbalanced nesting or varying numbers of elements. To represent these languages, generative grammars must be used instead.

Regex

Regex basics:

  • x the x character
  • . any character except newline
  • [xyz] means x or y or z
  • [a-z] any character between a and z
  • [^a-z] any character except those between a and z

Said R a regular expression:

  • RS concatenation of R and S
  • R|S either R or S
  • R* zero or more occurrences of R
  • R+ one or more occurrences of R
  • R? zero or one occurrence of R
  • R{m,n} a number or R occurrences ranging from n to m
  • R{n,} at least n occurrences
  • R{n} exactly n occurrences of R