Kolmogorov complexity Part I

George Barmpalias - UESTC - Chengdu, Dec 3, 2023

Contents for Lecture 1

Program-size complexity

Aim: Formalize and quantify the complexity of (finite) data.

How many bits of information do these strings have? $$\texttt{abababababababababababababababab}$$ $$\texttt{4c1j5b2p0cv4w1x8rx2y39umgw5q85s7}$$

The choice of alphabet is not important. Binary is standard.

On Unix-based systems (including macOS), view the bits of a file with: xxd -b FILE
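
The same idea without leaving Python; a minimal sketch (the helper `to_bits` is illustrative, not a standard tool):

    # Print the binary representation of a string, in the spirit of `xxd -b`.
    def to_bits(s: str) -> str:
        return " ".join(f"{byte:08b}" for byte in s.encode())

    print(to_bits("ab"))  # 01100001 01100010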

Information theory (Shannon)

Shannon entropy is the expected (average) information per symbol, where $p_i$ is the probability of the $i$-th symbol:

$$H:=-\sum_i p_i \log_2 p_i$$

Entropy measures the amount of surprise for the next symbol.
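
A minimal sketch in Python (not from the lecture), computing the empirical entropy per symbol of the two strings above:

    from collections import Counter
    from math import log2

    # Empirical Shannon entropy in bits per symbol: H = -sum_i p_i log2(p_i),
    # where p_i is the observed frequency of the i-th symbol.
    def entropy(s: str) -> float:
        n = len(s)
        return -sum((c / n) * log2(c / n) for c in Counter(s).values())

    print(entropy("ab" * 16))                           # 1.0
    print(entropy("4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"))  # about 4.4

Note that entropy only sees symbol frequencies: the periodic structure of the first string is invisible to it.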

Pros and Cons

Drawbacks and criticism (Kolmogorov, Solomonoff)

Kolmogorov complexity

How many bits of information do these strings have? $$\texttt{abababababababababababababababab}$$ $$\texttt{4c1j5b2p0cv4w1x8rx2y39umgw5q85s7}$$

Definition. The complexity $C(\sigma)$ of a string $\sigma$ is the length of the shortest program that prints $\sigma$.

def generate_string_1():
    # A short program: the 32-character output is highly compressible.
    return "ab" * 16

def generate_string_2():
    # No apparent pattern: the literal string itself seems to be
    # its own shortest description.
    return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"

How many bits of information do these strings have? $$\texttt{59265358979323846264338327950288}$$ $$\texttt{21001798141066463695774083871573}$$

Which programming language?

Kolmogorov complexity is based on the theory of computation.

The standard abstraction of a computer is the Turing machine.

Alternatively, use any Turing-complete programming language.

Quiz. Which of the following languages are Turing-complete?

Program-size depends on the language at hand.

Extreme examples

Code golf is a recreational programming competition.

The goal is to write the shortest source code that solves a given problem.

GolfScript for the first 1000 digits of $\pi$

	;''
	6666,-2%{2+.2/@*\/10.3??2*+}*
	`1000<~\;

Whitespace code for the first 1000 digits of $\pi$: the program consists only of spaces, tabs and linefeeds, so it appears blank when printed.

Theory of computation

The length of the program depends on the language $M$.

Facts from computability: there is an effective enumeration of all Turing machines, and a universal machine $U$ which, given a description of any machine $M$ and a string $\tau$, simulates $M$ on $\tau$.

Invariance

Let $C_M(\sigma)$ be the Kolmogorov complexity with respect to $M$: $$C_M(\sigma):=\min\{|\tau| : M(\tau)=\sigma\}$$

Theorem. For every machine $M$ there exists a constant $c$ such that for all $\sigma$: $$C_U(\sigma) \leq C_M(\sigma)+c$$ Proof idea: $U$ can simulate $M$, so prefixing each $M$-program with a fixed-length description of $M$ yields a $U$-program for the same string; $c$ depends only on $M$.

Fix a universal machine $U$ and let $C(\sigma):=C_U(\sigma)$.

Kolmogorov complexity is invariant under the choice of universal machine, up to an additive constant.

Features of Kolmogorov complexity

How many bits of information do these strings have? $$\texttt{\scriptsize 1637374017199159701353736070558761985836294941139158650322685731114837878}$$ $$\texttt{\scriptsize 1415926535897932384626433832795028841971693993751058209749445923078164062}$$

Quantitative estimates

Theorem. If $f$ is computable, then $C(f(\sigma)) \leq^{+} C(\sigma)$, where $\leq^{+}$ denotes $\leq$ up to an additive constant.

Does the converse hold?

Theorem. $|\{\sigma :\ |\sigma|=n\ \wedge\ C(\sigma)\leq C(0^n)+d\}|=O(2^d)$.

Incompressibility

Definition. A string $\sigma$ is incompressible if $C(\sigma)\geq |\sigma|$.

Theorem. The set of compressible strings is computably enumerable.

Theorem. For each $n$ there exists an $n$-bit incompressible string.

Proof. There exist $2^n$ many strings of length $n$ and $$1+2+\cdots + 2^{n-1}=2^n-1$$ many strings of length $< n$.

Some $n$-bit string must fail to have a description of length $< n$.
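
A one-line sanity check of the count in Python (plain arithmetic, nothing beyond the proof):

    n = 8
    # Descriptions of length < n: 1 + 2 + ... + 2**(n-1) = 2**n - 1,
    # one fewer than the 2**n strings of length n.
    assert sum(2**k for k in range(n)) == 2**n - 1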

Incompressible strings are often called random.

Can you compute Kolmogorov complexity?

Naive attempt to compute $C(s)$: try all programs in order of length and return the length of the first one that prints $s$.
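
A minimal sketch of this attempt in Python, where `run` is a hypothetical interpreter for the universal machine $U$ (not a real function):

    from itertools import count, product

    # Naive (non-working) search for C(s). `run(p)` stands for U(p): it
    # returns the output if U halts on p, and never returns otherwise.
    def naive_C(s: str) -> int:
        for length in count(1):                      # lengths 1, 2, 3, ...
            for bits in product("01", repeat=length):
                p = "".join(bits)
                if run(p) == s:                      # may loop forever!
                    return length

The search fails precisely because of the halting problem: some candidate programs never halt, and this cannot be detected in general.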

Berry paradox

Let $m$ be the smallest positive integer not definable in under sixty letters.

But the above sentence has fewer than sixty letters and appears to define $m$.

Theorem. It is undecidable whether a string is incompressible.

Proof. For a contradiction, assume otherwise.

Then there is a program $M$ which, on input $n$, searches the $n$-bit strings in lexicographic order and outputs the least incompressible one, $\sigma_n$.

Then $C_M(\sigma_n)\leq \log n + O(1)$, since $M$ only needs the $\log n$ bits of $n$. So $C(\sigma_n)\leq \log n + O(1)$.

For large $n$ this contradicts $C(\sigma_n)\geq n$, the incompressibility of $\sigma_n$. $\blacktriangleleft$
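
For concreteness, the program $M$ of the proof, sketched in Python under the (impossible) assumption of a decider `is_incompressible`:

    from itertools import product

    # M needs only the ~log(n) bits of its input n, yet outputs a string
    # sigma_n with C(sigma_n) >= n. `is_incompressible` is the hypothetical
    # decider whose existence the theorem refutes.
    def M(n: int) -> str:
        for bits in product("01", repeat=n):     # lexicographic order
            sigma = "".join(bits)
            if is_incompressible(sigma):
                return sigma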

Berry paradox (wiki)

Chaitin on Berry paradox (article)

Conditional complexity

$C(\sigma\ |\ \tau)$ is the information in $\sigma$ given $\tau$: $$C(\sigma\ |\ \tau):=\min\{|p| : U(\tau, p)=\sigma\}$$ The length of the shortest program which, given $\tau$, prints $\sigma$.

Theorem. If $A$ is finite, then $\forall\tau\ \exists\sigma\in A:\ C(\sigma\ |\ \tau)\geq \log |A|$.

Theorem. Suppose $B$ is an infinite computably enumerable set of pairs of strings, all of whose projections $$B_{\tau}:=\{\sigma : (\sigma,\tau)\in B\}$$ are finite. Then $\forall \tau\ \forall \sigma\in B_{\tau}:\ C(\sigma\ |\ \tau)\leq^{+} \log |B_{\tau}|$.

Proof idea: given $\tau$, enumerate $B_{\tau}$ and specify $\sigma$ by its index in this enumeration, which takes at most $\log |B_{\tau}|$ bits.

Shortest program

Let $\sigma^\ast$ denote the shortest program for $\sigma$. Then $C(\sigma)=|\sigma^\ast|$.

Theorem. The following hold: $\sigma^{\ast}$ can be computed from the pair $(\sigma, C(\sigma))$, and both $\sigma$ and $C(\sigma)$ can be computed from $\sigma^{\ast}$.

Proof. From $\sigma, C(\sigma)$ we can compute $\sigma^{\ast}$: run all programs of length $C(\sigma)$ in parallel until one prints $\sigma$.

From $\sigma^{\ast}$ we know $C(\sigma)=|\sigma^{\ast}|$ and $\sigma=U(\sigma^{\ast})$. $\blacktriangleleft$

Theorem. The following hold: $C(\sigma\tau)\leq^{+} C(\sigma)+C(\tau)+2\log C(\sigma)$, but $C(\sigma\tau)\leq C(\sigma)+C(\tau)+O(1)$ fails in general.

Conservation of information?

This is weird... it should not happen.

How can $\sigma\tau$ have more information than $C(\sigma)+C(\tau)$?

Explanation: a program $\tau$ for $\sigma$ does not indicate where it ends, so in a concatenation of two programs the boundary itself carries extra information.

By restricting the underlying machines to prefix-free machines, where no halting program is a proper prefix of another (so every program signals its own end),

we obtain a refined complexity $K(\sigma)$.

Self-delimiting programs

Halting probability of the prefix-free universal machine $U$: $$\Omega:=\sum_{U(\tau)\downarrow} 2^{-|\tau|}$$

By Kraft's inequality $\Omega\leq 1$, and $U$ gives a probability distribution on the strings:

$$P(\sigma) = \sum_{U(\tau)\downarrow = \sigma} 2^{-|\tau|}$$

where $\sum_{\sigma} P(\sigma) = \Omega$.
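
$\Omega$ is approximable from below: run all programs in parallel and add $2^{-|\tau|}$ whenever one is seen to halt. A minimal sketch in Python, assuming a hypothetical `halts_within(p, t)` that simulates the prefix-free machine $U$ on $p$ for $t$ steps:

    # Lower bound for Omega: credit 2**(-|p|) for every program p of
    # length <= max_len seen to halt within max_steps steps. The value
    # converges to Omega from below as both bounds grow.
    def omega_lower_bound(max_len: int, max_steps: int) -> float:
        total = 0.0
        for length in range(1, max_len + 1):
            for i in range(2 ** length):
                p = format(i, f"0{length}b")     # all strings of this length
                if halts_within(p, max_steps):   # hypothetical bounded run of U
                    total += 2.0 ** -length
        return total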