Kolmogorov complexity

George Barmpalias - UESTC - Chengdu, Dec 3, 2023

Contents for Lecture 1

Program-size complexity

Aim: Formalize and quantify the complexity of (finite) data.

How many bits of information do these strings have? \texttt{abababababababababababababababab} \texttt{4c1j5b2p0cv4w1x8rx2y39umgw5q85s7}

The choice of alphabet is not important. Binary is standard.

On Unix-based systems (including macOS), you can inspect the bits of a file with: xxd -b FILE
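For illustration (a small Python sketch, not part of the lecture; the file name is hypothetical), the same kind of bit dump can be produced as follows:

    def show_bits(path, per_line=6):
        """Print the bytes of a file in binary, similar in spirit to `xxd -b`."""
        with open(path, "rb") as f:
            data = f.read()
        for i in range(0, len(data), per_line):
            chunk = data[i:i + per_line]
            bits = " ".join(format(byte, "08b") for byte in chunk)
            print(f"{i:08x}: {bits}")

    show_bits("FILE")  # hypothetical file name, mirroring the command above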

Information theory (Shannon)

Shannon entropy is the expected (average) information per symbol:

H:=-\sum_i p_i \log p_i

Entropy measures the amount of surprise for the next symbol.
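As a quick illustration (a small Python sketch, not part of the lecture), the empirical entropy of the two strings above:

    from collections import Counter
    from math import log2

    def entropy(s):
        """Empirical Shannon entropy of a string, in bits per symbol."""
        n = len(s)
        return -sum(c / n * log2(c / n) for c in Counter(s).values())

    print(entropy("abababababababababababababababab"))  # 1.0 bit/symbol: only 'a' and 'b'
    print(entropy("4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"))  # about 4.4 bits/symbol: nearly uniform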

Pros and Cons

Drawbacks - Criticism (Kolmogorov, Solomonoff)

Kolmogorov complexity

How many bits of information do these strings have? \texttt{abababababababababababababababab} \texttt{4c1j5b2p0cv4w1x8rx2y39umgw5q85s7}

Definition. The complexity C(\sigma) of string \sigma is the length of the shortest program that prints \sigma.

def GenerateString1():
    # 32 characters from a very short description
    return "ab" * 16

def GenerateString2():
    # 32 characters with no obvious shorter description
    return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"

Kolmogorov complexity

How many bits of information do these strings have? \texttt{59265358979323846264338327950288} \texttt{21001798141066463695774083871573}
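The first string above is a block of the decimal digits of \pi, so although it looks random it has a very short description. As an illustration (a Python sketch, not the lecture's machine model), Gibbons' unbounded spigot algorithm streams the digits of \pi in a few lines:

    from itertools import islice

    def pi_digits():
        """Gibbons' unbounded spigot algorithm: yield the decimal digits of pi."""
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n
                q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
            else:
                q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                    (q * (7 * k + 2) + r * l) // (t * l), l + 2)

    # Prints 3141592653589793238462643383279502884197
    print("".join(str(d) for d in islice(pi_digits(), 40)))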

Which programming language?

Kolmogorov complexity is based on the theory of computation.

The standard abstraction of a computer is the Turing machine.

Alternatively, use any Turing-complete programming language.

Quiz. Which of the following languages are Turing-complete?

Program-size depends on the language at hand.

Extreme examples

Code golf is a recreational programming competition.

The goal is to achieve the shortest source code that solves a given problem.

GolfScript for the first 1000 digits of \pi

    ;''
    6666,-2%{2+.2/@*\/10.3??2*+}*
    `1000<~\;

Whitespace code for the first 1000 digits of \pi*

Theory of computation

Program size depends on the language at hand: the length of the shortest program for \sigma depends on the machine (programming language) M.

Facts from computability: there is an effective enumeration of all Turing machines, and a universal machine U that can simulate any machine M on any input.

Invariance

Let C_M(\sigma) be the Kolmogorov complexity with respect to M: C_M(\sigma):=\min\{|\tau| : M(\tau)=\sigma\}

Theorem. For every machine M there exists a constant c such that C_U(\sigma) \leq C_M(\sigma)+c for all strings \sigma.
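Proof idea (a sketch of the standard simulation argument, not the lecture's wording): let \rho_M be a self-delimiting prefix that instructs U to simulate M. Every M-program \tau for \sigma then becomes a U-program \rho_M\tau, so C_U(\sigma)\leq C_M(\sigma)+|\rho_M|, and c=|\rho_M| works.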

Fix a universal machine U and let C(\sigma):=C_U(\sigma).

Kolmogorov complexity is:

invariant under the choice of universal machine, up to an additive constant.

Features of Kolmogorov complexity

How many bits of information do these strings have? \texttt{1637374017199159701353736070558761985836294941139158650322685731114837878} \texttt{1415926535897932384626433832795028841971693993751058209749445923078164062}

Quantitative estimates

Theorem. If f is computable, C(f(\sigma)) \leq^{+} C(\sigma).

Does the converse hold?

Theorem. |\{ \sigma :\ |\sigma|=n\ \wedge\ C(\sigma)\leq C(0^n)+d \}|=O(2^d).

Incompressibility

Definition. A string \sigma is incompressible if C(\sigma)\geq |\sigma|.

Theorem. The set of compressible strings is computably enumerable.
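A sketch of why this holds (Python; run_for(p, t) is a hypothetical step-bounded simulator of the universal machine, returning the output if program p halts within t steps and None otherwise): dovetail over programs and step bounds, and emit a string as soon as some strictly shorter program is seen to print it.

    from itertools import count, product

    def compressible_strings(run_for):
        """Enumerate the compressible strings by dovetailing."""
        seen = set()
        for t in count(1):                      # ever-growing step bound
            for n in range(t):                  # program lengths tried so far
                for bits in product("01", repeat=n):
                    p = "".join(bits)
                    out = run_for(p, t)
                    if out is not None and len(p) < len(out) and out not in seen:
                        seen.add(out)
                        yield out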

Theorem. For each n there exists an n-bit incompressible string.

Proof. There exist 2^n many strings of length n, but only 1+2+\cdots + 2^{n-1}=2^n-1 many programs of length < n, and each program prints at most one string.

Some n-bit string must fail to have a description of length < n.

Incompressible strings are often called random.

Can you compute Kolmogorov complexity?

Naive attempt to compute C(s)
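A sketch of the naive idea (Python; run is a hypothetical interpreter for the fixed universal machine): search the programs in order of length and return the length of the first one that prints s. The search fails because run(p) need not halt.

    from itertools import count, product

    def naive_C(s, run):
        """Naive attempt at C(s): try all programs in length order.
        Does not work: run(p) may never halt for some p."""
        for n in count(0):
            for bits in product("01", repeat=n):
                p = "".join(bits)
                if run(p) == s:                 # this call may loop forever
                    return n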

Berry paradox

The smallest positive integer not definable in under sixty letters.

But the above sentence, which has fewer than sixty letters, appears to define such an integer m.

Theorem. It is undecidable whether a string is incompressible.

Proof. For a contradiction, assume otherwise.

Program M, on input n, uses the assumed decision procedure to find and output the lexicographically least incompressible n-bit string \sigma_n.

Then C_M(\sigma_n)\leq \log n + O(1), since the input n takes about \log n bits. So C(\sigma_n)\leq \log n + O(1).

This contradicts n\leq C(\sigma_n), which incompressibility guarantees, once n is large enough. \blacktriangleleft
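The machine M from the proof, as a Python sketch (is_incompressible stands for the decision procedure assumed for contradiction; it does not actually exist):

    from itertools import product

    def M(n, is_incompressible):
        """Output the lexicographically least incompressible n-bit string,
        using only the input n, i.e. about log n bits of information."""
        for bits in product("01", repeat=n):    # lexicographic order
            s = "".join(bits)
            if is_incompressible(s):
                return s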

Berry paradox (wiki)

Chaitin on Berry paradox (article)

Conditional complexity

C(\sigma\ |\ \tau) is the information in \sigma given \tau: C(\sigma\ |\ \tau)=\min\{|p| : U(\tau, p)=\sigma\}, the length of the shortest program which, given \tau, can print \sigma.

Theorem. If A is finite, then \forall\tau\ \exists\ \sigma\in A :\ C(\sigma\ |\ \tau)\geq \log |A|.

Theorem. Suppose B is an infinite computably enumerable set of pairs of strings with all projections B_{\tau}:=\{\sigma : (\sigma,\tau)\in B\} finite. Then \forall \tau\ \forall \sigma\in B_{\tau}\ C(\sigma\ |\ \tau)\leq^{+} \log |B_{\tau}|.

Shortest program

Let \sigma^\ast denote the shortest program for \sigma. Then C(\sigma)=|\sigma^\ast|.

Theorem. The following hold: C(\sigma^{\ast}\ |\ \sigma, C(\sigma))=O(1) and C(\sigma, C(\sigma)\ |\ \sigma^{\ast})=O(1).

Proof. From \sigma and C(\sigma) we can compute \sigma^{\ast}: dovetail all programs of length C(\sigma) until one of them outputs \sigma.

From \sigma^{\ast} we know C(\sigma)=|\sigma^{\ast}| and \sigma=U(\sigma^{\ast}). \blacktriangleleft

Theorem. The following hold:

Conservation of information?

This is weird… it should not happen.

How can \sigma\tau have more information than C(\sigma)+C(\tau)?

Explanation: A program \tau for \sigma carries information not only in its bits but also in its length; when two programs are concatenated, the boundary between them is lost.

By restricting the underlying machines to prefix-free (self-delimiting) machines, where no halting program is a proper prefix of another,

we obtain a refined complexity K(\sigma).
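One standard way to make programs self-delimiting (a Python illustration using a simple doubling code, not the lecture's machine): write each bit twice and mark the end with "01", so a machine reading its input always knows where the program stops.

    def self_delimit(s):
        """Encode the bit string s so that its end is detectable."""
        return "".join(b + b for b in s) + "01"

    def read_self_delimited(stream):
        """Read a self-delimited bit string back from a stream of bits."""
        out, it = [], iter(stream)
        for a, b in zip(it, it):                # read two bits at a time
            if (a, b) == ("0", "1"):            # end marker reached
                return "".join(out)
            out.append(a)

    print(self_delimit("1011"))                                # 1100111101
    print(read_self_delimited(self_delimit("1011") + "0110"))  # 1011, trailing bits ignored

Under this code no valid program is a prefix of another; the price is that a string of length m costs 2m+2 bits, which better self-delimiting codes reduce.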

Self-delimiting programs

Halting probability: \Omega:=\sum_{U(\tau)\downarrow} 2^{-|\tau|}

The universal prefix-free machine also induces a (semi)probability distribution on the strings:

P(\sigma) = \sum_{U(\tau)\downarrow = \sigma} 2^{-|\tau|}

where \sum_{\sigma} P(\sigma) = \Omega.
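Because \Omega is a sum over halting programs, it can be approximated from below (a Python sketch; run_for(p, t) is again a hypothetical step-bounded simulator of the prefix-free universal machine):

    from itertools import product

    def omega_lower_bound(run_for, max_len, steps):
        """Sum 2^(-|p|) over programs p of length <= max_len that are seen to
        halt within the given number of steps: a lower bound on Omega."""
        total = 0.0
        for n in range(max_len + 1):
            for bits in product("01", repeat=n):
                p = "".join(bits)
                if run_for(p, steps) is not None:   # p halts within `steps` steps
                    total += 2.0 ** (-n)
        return total

These approximations increase to \Omega, but with no computable rate of convergence: \Omega is left-c.e. and not computable.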