Kolmogorov complexity Part I
George Barmpalias - UESTC - Chengdu, Dec 3, 2023
Contents for Lecture 1
- What is program size complexity
- What is Algorithmic Information theory
- Machines, Programs and Numberings
- Universal machines and Invariance
- Kolmogorov complexity
- Quantitative estimates and properties
- Berry's paradox and undecidability
- Conditional complexity
- Conservation of information
- Self-delimiting programs and complexity

Program-size complexity
Aim: Formalize and quantify the complexity of (finite) data.
How many bits of information do these strings have? $$\texttt{abababababababababababababababab}$$ $$\texttt{4c1j5b2p0cv4w1x8rx2y39umgw5q85s7}$$
The choice of alphabet is not important. Binary is standard.
On Unix-based systems (including macOS): xxd -b FILE
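A quick way to see this in code (a minimal Python sketch; the function name is ours, not a standard tool):

```python
# Show the binary encoding of a string, in the spirit of `xxd -b`.
def to_bits(s: str) -> str:
    return " ".join(format(byte, "08b") for byte in s.encode("ascii"))

print(to_bits("ab"))  # 01100001 01100010
```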

Information theory (Shannon)
- strings as outcomes of a random process
- entropy with respect to a probability distribution
- the information content of an outcome $i$ with probability $p_i$ is $-\log p_i$

Shannon entropy is the expected (average) information per symbol:
$$H:=-\sum_i p_i \log p_i$$
Entropy measures the amount of surprise for the next symbol.
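As a toy check (a Python sketch in which empirical letter frequencies stand in for the unknown true distribution), the two example strings have very different per-symbol entropy:

```python
from collections import Counter
from math import log2

def entropy(s: str) -> float:
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy("ab" * 16))                           # 1.0
print(entropy("4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"))  # ~4.51
```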
Pros and Cons
- Backbone of modern communications and digital encoding
- information compression (source-coding theorem)
- entropy is the optimal average code length per symbol of the signal
- error correction, communication over noisy channels, etc.
Drawbacks - Criticism (Kolmogorov, Solomonoff)
- depends on the assumed underlying distribution of the data
- this distribution may be unknown, or even non-existent
- measures the average amount of information per letter, not the information in a specific string
Kolmogorov complexity
How many bits of information do these strings have? $$\texttt{abababababababababababababababab}$$ $$\texttt{4c1j5b2p0cv4w1x8rx2y39umgw5q85s7}$$
Definition. The complexity $C(\sigma)$ of string $\sigma$ is the length of the smallest program that prints $\sigma$.
```
function GenerateString1()
    return "ab" × 16

function GenerateString2()
    return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"
```
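The same two programs, rendered as runnable Python (a sketch: source length in characters is only a stand-in for formal program size):

```python
import inspect

def generate_string_1() -> str:
    return "ab" * 16

def generate_string_2() -> str:
    return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"

# Both outputs have 32 characters, but the first source is noticeably shorter.
for f in (generate_string_1, generate_string_2):
    print(len(inspect.getsource(f)), f())
```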
How many bits of information do these strings have? $$\texttt{59265358979323846264338327950288}$$ $$\texttt{21001798141066463695774083871573}$$
Which programming language?
Kolmogorov complexity is based on the theory of computation.
The standard abstraction of a computer is the Turing machine.
Alternatively, use any Turing-complete programming language.
Quiz. Which of the following languages are Turing-complete?
- Python
- HTML
- Solidity (Ethereum)
- Bitcoin scripting language
- JavaScript
- HTML5
Program-size depends on the language at hand.
Extreme examples
Code golf is a recreational programming competition.
The goal is to write the shortest source code that solves a given problem.
GolfScript for the first 1000 digits of $\pi$:
```
;''
6666,-2%{2+.2/@*\/10.3??2*+}*
`1000<~\;
```
Whitespace code for the first 1000 digits of $\pi$ (the program, written entirely in spaces, tabs and linefeeds, is not shown).
Theory of computation
The length of the shortest program for $\sigma$ depends on the machine (language) $M$.
Facts from computability:
- there is a program that enumerates all Turing machines
- there exists a universal Turing machine $U$
- $U$ can simulate any Turing machine, with a constant overhead
Invariance
Let $C_M(\sigma)$ be the Kolmogorov complexity with respect to $M$: $$C_M(\sigma):=\min\{|\tau| : M(\tau)=\sigma\}$$
Theorem. For every machine $M$ there exists a constant $c$ such that $$C_U(\sigma) \leq C_M(\sigma)+c$$
Fix a universal machine $U$ and let $C(\sigma):=C_U(\sigma)$.
Kolmogorov complexity is invariant under the choice of universal machine, up to a constant.
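A toy rendering of the invariance argument in Python (the two "machines" and the one-character header are invented for illustration; a real $U$ simulates Turing machines from their codes):

```python
# Two toy "machines", each mapping a program (a string) to an output.
def machine_repeat(p: str) -> str:   # program "ab,16" outputs "ab" * 16
    w, k = p.split(",")
    return w * int(k)

def machine_literal(p: str) -> str:  # the program is the output itself
    return p

MACHINES = {"0": machine_repeat, "1": machine_literal}

def universal(p: str) -> str:
    """A 1-character header selects which machine to simulate,
    so C_U(x) <= C_M(x) + 1 for every machine M in the table."""
    return MACHINES[p[0]](p[1:])

print(universal("0ab,16"))  # abab...ab, via machine_repeat
print(universal("1hello"))  # hello, via machine_literal
```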
Features of Kolmogorov complexity
- does not require a distribution (model, hypothesis)
- gives a universal a priori distribution (Bayesian inference)
- formal framework for artificial general intelligence (Solomonoff)
- it is incomputable, but it can be approximated from above
- machine-invariant, but the additive constants can be huge in practice
How many bits of information do these strings have? $$\texttt{\scriptsize 1637374017199159701353736070558761985836294941139158650322685731114837878}$$ $$\texttt{\scriptsize 1415926535897932384626433832795028841971693993751058209749445923078164062}$$
Quantitative estimates
- $C(\sigma)\leq |\sigma| +O(1)$
- $C(0^n)\leq \log n +O(1)$
- $C(\sigma\sigma)\leq C(\sigma) +O(1)$
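Each bound is witnessed by a fixed program whose only variable part is the data. A hedged Python sketch of the witnesses (function names are ours; the $O(1)$ constants absorb the fixed code around the data):

```python
# C(s) <= |s| + O(1): a program that carries s literally.
def print_literal(s: str) -> str:
    return s

# C(0^n) <= log n + O(1): only the ~log n bits of n must be supplied.
def print_zeros(n: int) -> str:
    return "0" * n

# C(ss) <= C(s) + O(1): take any program p for s and double its output.
def print_double(p: str, machine) -> str:
    return machine(p) * 2
```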
Theorem. If $f$ is computable, $C(f(\sigma)) \leq^{+} C(\sigma)$.
Does the converse hold?
Theorem. $|\{ \sigma\ :\ |\sigma|=n\ \wedge\ C(\sigma)\leq C(0^n)+d \}|=O(2^d)$.
Incompressibility
Definition. A string $\sigma$ is incompressible if $C(\sigma)\geq |\sigma|$.
Theorem. The set of compressible strings is computably enumerable.
Theorem. For each $n$ there exists an $n$-bit incompressible string.
Proof. There exist $2^n$ many strings of length $n$ and $$1+2+\cdots + 2^{n-1}=2^n-1$$ many strings of length $< n$.
Some $n$-bit string must fail to have a description of length $< n$.
Incompressible strings are often called random.
Can you compute Kolmogorov complexity?
Naive attempt to compute $C(s)$
- order the programs by increasing length
- find the first with output $s$.
```
function KolmogorovComplexity(string s)
    for i = 1 to infinity:
        for each string p of length exactly i:
            if isValidProgram(p) and evaluate(p) == s:
                return i
```
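This search is not an algorithm: on a Turing-complete language, evaluate(p) may run forever, and by the undecidability of halting the divergent programs cannot be filtered out. The search does work on a machine whose programs all halt; a minimal Python sketch (the toy machine and its program format are invented for illustration, and its always-halting nature is exactly what a real universal machine lacks):

```python
from itertools import product

def toy_machine(p: str) -> str | None:
    """A toy (always-halting, hence not Turing-complete) stand-in for U:
    '1' + w     outputs the string w literally;
    '0' + bits  outputs int(bits, 2) many zeros."""
    if not p:
        return None
    if p[0] == "1":
        return p[1:]
    if len(p) > 1:
        return "0" * int(p[1:], 2)
    return None

def C_toy(s: str) -> int:
    """Length of the shortest toy program printing s, by exhaustive search.
    This terminates only because toy_machine always halts."""
    for length in range(1, len(s) + 2):
        for p in map("".join, product("01", repeat=length)):
            if toy_machine(p) == s:
                return length

print(C_toy("0" * 16))  # 6, via the program "010000"
print(C_toy("0101"))    # 5, via the literal program "10101"
```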
Berry paradox
The smallest positive integer not definable in under sixty letters.
- there are finitely many sentences of fewer than sixty letters
- so only finitely many numbers can be defined in this way
- hence there exists a least $m$ that cannot be defined in this way
But the above sentence appears to define $m$.
Theorem. It is undecidable whether a string is incompressible.
Proof. For a contradiction, assume otherwise. Then the following machine $M$ is computable: on input $n$ (in binary), search the $n$-bit strings in lexicographic order and output the first incompressible one, $\sigma_n$.
Then $C_M(\sigma_n)\leq \log n + O(1)$. So $C(\sigma_n)\leq \log n + O(1)$.
This contradicts $n\leq C(\sigma_n)$ for large $n$. $\blacktriangleleft$
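The machine $M$ from the proof, as a Python sketch built on the assumed decider (is_incompressible is hypothetical; the theorem says it cannot exist):

```python
from itertools import product

def M(n: int) -> str:
    """Output the lexicographically least incompressible n-bit string.
    M's input is just the ~log n bits of n, giving C(sigma_n) <= log n + O(1),
    which contradicts C(sigma_n) >= n for large n."""
    for bits in product("01", repeat=n):
        s = "".join(bits)
        if is_incompressible(s):  # hypothetical decider -- cannot exist
            return s
```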
Chaitin on Berry paradox (article)
Conditional complexity
$C(\sigma\ |\ \tau)$ is the information in $\sigma$ given $\tau$: $$C(\sigma\ |\ \tau)=\min\{|p|\ :\ U(\tau, p)=\sigma\}$$ The length of the shortest program which, given $\tau$, prints $\sigma$.
Theorem. If $A$ is finite, then $\forall\tau\ \exists\ \sigma\in A :\ C(\sigma\ |\ \tau)\geq \log |A|$.
Theorem. Suppose $B$ is an infinite computably enumerable set of pairs of strings with all projections $$B_{\tau}:=\{\sigma : (\sigma,\tau)\in B\}$$ finite. Then $\forall \tau\ \forall \sigma\in B_{\tau}:\ C(\sigma\ |\ \tau)\leq^{+} \log |B_{\tau}|$.
Shortest program
Let $\sigma^\ast$ denote the shortest program for $\sigma$. Then $C(\sigma)=|\sigma^\ast|$.
Theorem. The following hold:
- $C(\sigma, C(\sigma)) =^{+} C(\sigma^{\ast})$.
- $C(\sigma^{\ast}\ |\ \sigma) =^{+} C(C(\sigma)\ |\ \sigma)$.

Proof. From $\sigma$ and $C(\sigma)$ we can compute a shortest program $\sigma^{\ast}$: run all programs of length $C(\sigma)$ in parallel until one outputs $\sigma$.
Conversely, from $\sigma^{\ast}$ we recover $C(\sigma)=|\sigma^{\ast}|$ and $\sigma=U(\sigma^{\ast})$. $\blacktriangleleft$
Theorem. The following hold:
- $C(\sigma\tau) \leq^+ C(\sigma, \tau) \leq^+ C(\sigma) + C(\tau) + 2 \log C(\sigma)$ (see the sketch below).
- $\forall d\ \exists\ \sigma,\tau:\ \ C(\sigma\tau) > C(\sigma) + C(\tau) + d$.
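The $2\log C(\sigma)$ term pays for a self-delimiting separator between the two programs. A Python sketch of the standard trick (function names are ours): write $|p|$ in binary with each bit doubled, then the terminator 01, then $p$ and $q$; the header costs $2\log|p| + 2$ bits and makes the split point recoverable.

```python
def encode_pair(p: str, q: str) -> str:
    """Concatenate programs p and q behind a self-delimiting length header."""
    header = "".join(b + b for b in format(len(p), "b")) + "01"
    return header + p + q

def decode_pair(code: str) -> tuple[str, str]:
    """Recover p and q: read doubled bits until the '01' terminator."""
    i, length_bits = 0, ""
    while code[i] == code[i + 1]:  # doubled bits belong to the header
        length_bits += code[i]
        i += 2
    i += 2                         # skip the '01' terminator
    n = int(length_bits, 2)
    return code[i:i + n], code[i + n:]

p, q = "1101", "0011010"
assert decode_pair(encode_pair(p, q)) == (p, q)
```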
Conservation of information?
This is weird... it should not happen.
How can $\sigma\tau$ have more information than $C(\sigma)+C(\tau)$?
Explanation: a program $\tau$ for $\sigma$
- carries information in its digits, but also in its length $|\tau|$
- this can make $C(\sigma)$ smaller than it should be.

By restricting the underlying machines to:
- self-delimiting (one-way reading of the input tape)
- or, equivalently, prefix-free

we obtain a refined complexity $K(\sigma)$.
Self-delimiting programs
- $K(\sigma\tau)\leq^{+} K(\sigma)+K(\tau)$
- $C(\sigma) \leq^{+} K(\sigma)$
Halting probability: $$\Omega:=\sum_{U(\tau)\downarrow} 2^{-|\tau|}$$
gives a probability distribution on the strings:
$$P(\sigma) = \sum_{U(\tau)\downarrow = \sigma} 2^{-|\tau|}$$
where $\sum_{\sigma} P(\sigma) = \Omega$.
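For a toy prefix-free machine these sums can be evaluated exactly (a Python sketch; the three-program domain is invented for illustration, and for a real $U$ the value $\Omega$ is only approximable from below):

```python
# Toy prefix-free domain: no program is a prefix of another,
# so the Kraft sum of 2^-|program| is at most 1.
DOMAIN = {"0": "a", "10": "bb", "110": "a"}  # program -> output

P: dict[str, float] = {}
for program, output in DOMAIN.items():
    P[output] = P.get(output, 0.0) + 2.0 ** -len(program)

Omega = sum(P.values())
print(P)      # {'a': 0.625, 'bb': 0.25}
print(Omega)  # 0.875 < 1, consistent with the Kraft inequality
```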