Formal Languages and Automata - Basic Concepts

📢 This article was translated by gemini-2.5-flash

Introduction

Sometimes it’s the people who no one imagines anything of who do the things that no one can imagine

Recently, I learned from the comments on Bitecho’s video about the documentary about Turing, The Imitation Game. I watched it and was deeply moved by this quote from the film, using it as an epigraph for this article on Formal Languages and Automata.

If you ever feel a bit down, also check out Amy’s thoughts on this quote .

Basic Concepts

Alphabet: A non-empty, finite set of symbols (characters).

$$ {\textstyle \sum_1} = \{0, 1\} $$

String: A finite sequence of symbols from some alphabet.

$$ \text{If } \sum_1 = \{0, 1\}, \text{ then } 0, 1, 00, 101011 \text{ are strings over } \sum_1. $$

Empty String: Denoted as $\varepsilon$, a string with 0 characters. For any alphabet $\sum$, $\varepsilon \notin \sum$.

Symbol Conventions

Alphabets: $\sum, \Gamma, \cdots$

Characters: $a, b, c, \cdots$

Strings: $\cdots,w,x,y,z$

Sets: $A,B,C,\cdots$

Length of a String: The number of symbol positions in the string.

$$ \text{For } \sum_1 = \{0, 1\}, |0010|=4. $$

Concatenation of Strings $x$ and $y$: An operation that joins them end-to-end to form a new string, denoted as $x \cdot y$ or $xy$.

$$ x=01, y=ab, \text{ then } xy=01ab $$

$n$-th Power of String $x$ ($n \ge 0$), recursively defined as:

$$ x^n= \left \{ \begin{matrix} \varepsilon & n=0 \\ x^{n-1}x & n \gt 0 \end{matrix} \right . $$

For $\sum_1 = \{a, b\}$, $(ba)^2=baba, ba^2=baa$.

Concatenation of Sets $A$ and $B$, denoted as $A\cdot B$ or $AB$, defined as:

$$ A \cdot B = \{ w |w=x \cdot y, x \in A \space and \space y \in B \} $$

If $A = \{0, 11\}$ and $B = \{ab, ba\}$, then $AB = \{ 0ab, 0ba, 11ab, 11ba \}, BA= \{ ab0, ab11, ba0, ba11 \}$

$n$-th Power of Set $A$ ($n \ge 0$), recursively defined as:

$$ A^n= \left \{ \begin{matrix} \{ \varepsilon \} & n=0\\ A^{n-1}A & n \ge 1 \end{matrix} \right . $$

For $\sum_1 = \{0, 1\}$, we have $\sum^0=\{\varepsilon\}, \sum^1=\{0,1\}, \sum^2=\{00,01,10,11\}, \cdots$.

Kleene Closure: The union of all powers (including zero-th power) of an alphabet.

$$ {\textstyle \sum^*} = \bigcup_{i=0}^{\infty}{\textstyle \sum^i} $$

Positive Closure: The union of all powers (excluding zero-th power) of an alphabet.

$$ {\textstyle \sum^+} = \bigcup_{i=1}^{\infty}{\textstyle \sum^i} $$

Clearly,

$$ {\textstyle \sum^*} = {\textstyle \sum^+} \cup \{ \varepsilon \} $$

Languages

Definition: If $\sum$ is an alphabet and $\forall L \subseteq \sum^*$, then $L$ is called a language over the alphabet $\sum$.

  • Natural languages, programming languages, etc.
  • $ \{ 0^n 1^n \mid n \ge 0 \} $
  • The set of strings of 0’s and 1’s with an equal number of each
  • $\emptyset$, $\{\varepsilon\}$, and $\sum^*$ are all languages over any alphabet $\sum$, but $\emptyset \ne \{\varepsilon\}$.

The only constraint for a language is that all alphabets must be finite.