Kolmogorov complexity

George Barmpalias

Chinese Academy of Sciences

UESTC - Chengdu, December 5, 2023

Contents of Lecture II

  • Instantaneous codes and Kraft inequality

  • Construction of prefix-free codes

  • Proof of the coding theorem

  • Counting theorems

  • Symmetry of information

  • Incompressibility method (LLL example)

Theorem. The following hold:

  • $C(\sigma\tau) \leq^{+} C(\sigma, \tau) \leq^{+} C(\sigma) + C(\tau) + 2 \log C(\sigma)$.

  • $\forall d\ \exists\ \sigma,\tau:\ \ C(\sigma\tau) > C(\sigma) + C(\tau) + d$.

Conservation of information? This is weird...

How can $\sigma\tau$ have more information than $C(\sigma)+C(\tau)$?

Explanation: A program $\tau$ for $\sigma$

  • carries information in its digits, but also in its length $|\tau|$

  • this can make $C(\sigma)$ smaller than it should be.

By restricting the underlying machines to:

  • Self-delimiting (one-way reading of the input tape)

  • or equivalently, prefix-free

we obtain a refined complexity $K(\sigma)$.
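The boundary problem can be seen concretely: plain concatenation forgets where $\sigma$ ends, while a self-delimiting header restores it at a cost of about $2\log|\sigma|$ bits. A minimal sketch; the names `self_delimit` and `read_self_delimited` and the header layout (unary length-of-length, then binary length) are assumptions chosen for illustration, not the encoding used in the lecture.

```python
def self_delimit(s: str) -> str:
    """Encode bit string s as 1^k 0 bin(|s|) s, where k = len(bin(|s|))."""
    length_bits = format(len(s), "b")
    return "1" * len(length_bits) + "0" + length_bits + s


def read_self_delimited(stream: str) -> tuple[str, str]:
    """Read one encoded string from the front of stream; return (string, rest)."""
    k = stream.index("0")                    # unary part: number of length bits
    n = int(stream[k + 1 : 2 * k + 1], 2)    # binary length of the payload
    start = 2 * k + 1
    return stream[start : start + n], stream[start + n :]


# Plain concatenation is not injective on pairs: two different pairs collide.
collision = ("01" + "1") == ("0" + "11")

# With a self-delimiting first component, both components are recovered.
sigma, tau = "01", "1"
recovered = read_self_delimited(self_delimit(sigma) + tau)

# Header overhead is 2 * len(bin(|sigma|)) + 1 bits, i.e. about 2 log |sigma|.
overhead = len(self_delimit("0" * 1000)) - 1000
```

The header is read left to right without lookahead, which is the one-way reading condition on the input tape.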

Plain versus prefix-free complexity

Theorem. $K(\sigma\tau) \leq^{+} K(\sigma,\tau) \leq^{+} K(\sigma)+K(\tau)$

Theorem.

  • $C(\sigma) =^{+} \min\{n : K(\sigma\ |\ n)\leq n\}$

  • $C(\sigma) =^{+} K(\sigma\ |\ C(\sigma))$.

Theorem. $\max_{i\leq n} K(i) =^{+} \log n + K(\log n)$

$$C(\sigma) \leq^{+} |\sigma| \ \wedge\ \ C(n) \leq^{+} \log n \ \wedge\ \ K(n) \leq^{+} 2\log n$$

$$K(|\sigma|) \leq^{+} K(\sigma) \leq^{+} |\sigma| + K(|\sigma|)$$

$$K(\log n) \leq^{+} K(n) \leq^{+} \log n + K(\log n)$$

Optimal descriptions are incompressible (same for $K$):

$$C(\sigma^{\ast}) =^{+} C(\sigma)=|\sigma^{\ast}|$$

The existence of $\sigma$ with a certain property can be shown by:

  • Explicit construction and verification of the required property

  • Probabilistic method: show that the required property occurs with high or non-zero probability

  • Incompressibility method: show that the negation of the required property allows the compression of $\sigma$.

A good example: the algorithmic Lovász Local Lemma

Fortnow (Moser's proof via Kolmogorov complexity)

Moser-Tardos and Messner-Thierauf
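Moser's resampling procedure for $k$-SAT, the algorithm analysed in the incompressibility proof, can be sketched as follows. The instance (20 clauses of width 5 on a line, each overlapping only its two neighbours), the seed, and the names `violated` and `moser_solve` are assumptions for illustration. The incompressibility argument bounds the number of `fix` calls: a long run would let the random bits consumed be reconstructed from a shorter log of the calls.

```python
import random


def violated(clause, assign):
    """A clause is a list of (variable, wanted_value); violated if no literal holds."""
    return all(assign[v] != want for v, want in clause)


def moser_solve(clauses, num_vars, rng):
    """Moser's resampling procedure; returns (satisfying assignment, resample count)."""
    assign = [rng.random() < 0.5 for _ in range(num_vars)]
    var_sets = [{v for v, _ in c} for c in clauses]
    # neighbours[i] lists the clauses sharing a variable with clause i (i included)
    neighbours = [
        [j for j in range(len(clauses)) if var_sets[i] & var_sets[j]]
        for i in range(len(clauses))
    ]
    resamples = 0

    def fix(i):
        nonlocal resamples
        for v in var_sets[i]:              # resample the variables of clause i
            assign[v] = rng.random() < 0.5
        resamples += 1
        while True:                        # repair any neighbour broken by this
            bad = next((j for j in neighbours[i] if violated(clauses[j], assign)), None)
            if bad is None:
                return
            fix(bad)

    for i in range(len(clauses)):
        if violated(clauses[i], assign):
            fix(i)
    return assign, resamples


rng = random.Random(0)
k, m = 5, 20
# clause i uses variables 3i, ..., 3i+4, so it only meets clauses i-1 and i+1
clauses = [[(3 * i + j, rng.random() < 0.5) for j in range(k)] for i in range(m)]
assign, resamples = moser_solve(clauses, 3 * (m - 1) + k, rng)
```

When `fix(i)` returns, clause `i` holds and no previously satisfied clause has been broken, so the outer loop ends with a satisfying assignment.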

Prefix-free codes

A set of strings is prefix-free if no member of it is a strict prefix of another member of it.

  • also called instantaneous codes or self-delimiting codes

  • form a subset of the uniquely decodable codes.
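A small sketch of both bullets: a prefix-freeness check and an instantaneous decoder that emits a symbol the moment its codeword is complete. The code table and the names `is_prefix_free` and `decode` are illustrative assumptions.

```python
def is_prefix_free(code):
    """True if no codeword is a proper prefix of another."""
    words = sorted(set(code))
    # After lexicographic sorting, a prefix is always followed directly
    # by one of its extensions, so checking neighbours suffices.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(stream, code):
    """Instantaneous decoding: emit a symbol as soon as a codeword is read."""
    lookup = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for bit in stream:
        buf += bit
        if buf in lookup:
            out.append(lookup[buf])
            buf = ""
    if buf:
        raise ValueError("stream ends inside a codeword")
    return out


code = {"a": "0", "b": "10", "c": "110", "d": "111"}
decoded = decode("0101100111", code)   # 0 | 10 | 110 | 0 | 111
```

The set `{"0", "01"}` is uniquely decodable (its reversal is prefix-free) but not prefix-free, so the inclusion in the second bullet is proper.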

Kraft inequality

Prefix-free sets as antichains.

Theorem (Kraft).
Any prefix code with the codeword lengths $\ell_i, i< n$ satisfies $\sum_{i < n} 2^{-\ell_i}\leq 1$.

Proof. Map $\sigma\mapsto [0.\sigma, 0.\sigma + 2^{-|\sigma|})$.

Then prefix-free sets correspond to pairwise disjoint intervals. \hfill \blacktriangleleft
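The interval map in the proof can be checked with exact arithmetic. A minimal sketch; the sample code and the helper names `interval`, `disjoint` and `kraft_sum` are assumptions for illustration.

```python
from fractions import Fraction


def interval(sigma):
    """Dyadic interval [0.sigma, 0.sigma + 2^-|sigma|) of a binary string."""
    left = Fraction(int(sigma, 2) if sigma else 0, 2 ** len(sigma))
    return left, left + Fraction(1, 2 ** len(sigma))


def disjoint(code):
    """True if the intervals of the codewords are pairwise disjoint."""
    ivs = sorted(interval(w) for w in code)
    return all(a_hi <= b_lo for (_, a_hi), (b_lo, _) in zip(ivs, ivs[1:]))


def kraft_sum(code):
    """Total length of the intervals, i.e. the sum of 2^-|w| over codewords."""
    return sum(Fraction(1, 2 ** len(w)) for w in code)


prefix_free = ["0", "10", "110", "111"]
not_prefix_free = ["0", "01"]          # [1/4, 1/2) lies inside [0, 1/2)
```

Disjoint subintervals of $[0,1)$ have total length at most 1, which is exactly the inequality.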

About Kraft inequality and Github code

Online generation of prefix-free codes

Given requests $(\ell_i)$ with $\sum_i 2^{-\ell_i}\leq 1$, monitor the space via the trace:

$$t_s := 1 - \sum_{i < s} 2^{-\ell_i}$$

The lexicographically greedy assignment of intervals produces a prefix-free code, as long as Kraft's inequality holds.

The 1s in the binary expansion of $t_s$ indicate the available intervals: $\sigma \to [0.\sigma, 0.\sigma + 2^{-|\sigma|})$.
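One standard greedy allocation (in the style of the Kraft-Chaitin construction) keeps one free string per 1 in the binary expansion of $t_s$. The sketch below is an assumption about how this can be organised, not necessarily the linked code; the class name `OnlineCoder` and its methods are hypothetical.

```python
from fractions import Fraction


class OnlineCoder:
    """Assigns prefix-free codewords to length requests arriving one at a time."""

    def __init__(self):
        # length -> free string; the lengths present are the 1s of the trace t_s
        self.free = {0: ""}

    def trace(self):
        """Remaining space t_s, as the total weight of the free strings."""
        return sum(Fraction(1, 2 ** m) for m in self.free)

    def request(self, length):
        """Return a codeword of the requested length, extending no earlier one."""
        candidates = [m for m in self.free if m <= length]
        if not candidates:
            raise ValueError("Kraft's inequality violated")
        m = max(candidates)                 # smallest free interval that fits
        rho = self.free.pop(m)
        # Split rho: keep rho 0^j 1 (lengths m+1..length) free, use rho 0^(length-m).
        for j in range(length - m):
            self.free[m + j + 1] = rho + "0" * j + "1"
        return rho + "0" * (length - m)


coder = OnlineCoder()
words = [coder.request(length) for length in [2, 1, 3, 3]]
```

Free strings always have distinct lengths, so a request of length $\ell$ with $2^{-\ell}\leq t_s$ always finds a free string of length at most $\ell$.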

Information content measures

Definition. A function $I$ from strings to integers $\geq 0$ is an information content measure if

  • it is effectively approximable from above

  • $\sum_{\sigma} 2^{-I(\sigma)}\leq 1$.

Theorem. $K$ is an $O(1)$-minimal information content measure.

Information content measures

  • are produced via online prefix-free codes

  • define a distribution (left semimeasure) on the strings.

The optimal non-decreasing information content measure is

$$K^{\ast}(n):=\max_{i\leq n} K(i)$$

The probability that the universal machine prints $\sigma$ on random input is

$$P(\sigma)\ :=\ \sum_{U(\tau)=\sigma} 2^{-|\tau|}$$

Analogue (finite, algorithmic) of Shannon's source-coding theorem:

Coding Theorem.
$P(\sigma) =^{\times} 2^{-K(\sigma)}$ and $K(\sigma) =^{+} -\log P(\sigma)$

Proof.
Clearly $2^{-K(\sigma)}=2^{-|\sigma^{\ast}|} \leq P(\sigma)$, since $\sigma^{\ast}$ is one of the programs in the sum.

Note that $\sigma\mapsto -\log P(\sigma)$ is an information content measure.

So $K(\sigma) \leq^{+} -\log P(\sigma)$ and $P(\sigma) \leq^{\times} 2^{-K(\sigma)}$. \hfill \blacktriangleleft
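The two quantities in the theorem can be computed exactly for a finite prefix-free machine. The table `machine` below is a hypothetical toy, not a universal machine, so it only illustrates the easy direction $2^{-K(\sigma)}\leq P(\sigma)$ and the size of the gap.

```python
from fractions import Fraction
from math import log2

# Hypothetical toy machine: program -> output. The domain is prefix-free.
machine = {"0": "a", "10": "a", "110": "b", "1110": "a", "1111": "c"}


def K(sigma):
    """Length of a shortest program for sigma on the toy machine."""
    return min(len(p) for p, out in machine.items() if out == sigma)


def P(sigma):
    """Probability that the toy machine prints sigma on random input."""
    return sum(Fraction(1, 2 ** len(p)) for p, out in machine.items() if out == sigma)


# Gap between K(sigma) and -log P(sigma) for each output of the toy machine.
gaps = {s: K(s) + log2(P(s)) for s in "abc"}
```

For a universal machine the theorem says this gap is bounded by a constant independent of $\sigma$.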

The probability of $n$-bit output is

$$P(n) := \sum_{|U(\tau)|=n} 2^{-|\tau|} \sim 2^{-K(n)}$$

Complexity of complexity

Sometimes Kolmogorov complexity is complex (computational depth).

Theorem.
For every $n$ there exists an $n$-bit string $\sigma$ with

$$K(K(\sigma)\ |\ \sigma) =^{+} \log n$$

Proof by allocation game:

  • Player 1 picks string of high complexity

  • Player 2 describes its current complexity

  • Player 1 compresses it and picks another random string

  • Player 2 can insist with a new description of the updated complexity or describe the new string

Player 2 runs out of descriptions: he fails to compress some $K(\sigma)$.

Kolmogorov complexity is rarely complex (random strings).

Counting theorems I

Recall $K(|\sigma|) \leq^{+} K(\sigma) \leq^{+} |\sigma| + K(|\sigma|)$

Theorem. The following hold:

  • $\max\{K(\sigma) : |\sigma|=n \} =^{+} n + K(n)$

  • $|\{ \sigma : |\sigma|=n \ \wedge\ \ K(\sigma) \leq |\sigma| + K(n)-m \}| = O(2^{n-m})$

The probability of compressing by $c$ bits decreases exponentially in $c$.

Since $n, m$ are independent, substituting $m = K(n) + c$ gives:

$$| \{ \sigma : |\sigma|=n \ \wedge\ \ K(\sigma) \leq |\sigma| - c \} | = O(2^{n-K(n)-c})$$
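The simplest version of this counting argument concerns plain descriptions: there are too few short strings to describe many long ones. The sketch below covers that plain-complexity analogue only (the prefix-free bound above gains the extra factor $2^{-K(n)}$); the name `max_compressible` is an assumption.

```python
def max_compressible(n, c):
    """Upper bound on the number of n-bit strings with a description of length <= n - c."""
    # There are 2^(n-c+1) - 1 binary strings of length at most n - c,
    # and each description describes at most one string.
    return 2 ** (n - c + 1) - 1


n = 20
# Fraction of n-bit strings that can be compressed by c bits, for c = 1..5.
fractions = {c: max_compressible(n, c) / 2 ** n for c in range(1, 6)}
```

Each extra bit of compression halves the bound, so the fraction is below $2^{1-c}$ for every $n$.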

Counting theorems II

Theorem.
Let $D_n:=\{\sigma : |\sigma|=n \ \wedge\ \ U(\sigma)\downarrow\}$ and

  • $p_n:=|\{\sigma : |\sigma|=n\ \wedge\ \ U(\sigma) \downarrow\}|$

  • $P_n:=|\{\sigma : |\sigma|\leq n\ \wedge\ \ U(\sigma) \downarrow\}|$

  • $d_n := |\{\sigma : K(\sigma)\leq n\}|$.

Then $p_n \sim P_n \sim d_n \sim 2^{n-K(n)}$ and

$$K(D_n) =^{+} n$$

Corollary. The number of shortest descriptions of any object is bounded by a universal constant.

Mutual Information and Symmetry

Definition. The mutual information of $\sigma,\tau$ is

$$I(\tau : \sigma) := K(\sigma)-K(\sigma \ \mid\ \tau^{\ast})$$

Theorem. $I(\tau : \sigma) =^{+} I(\sigma : \tau)$

Follows from:

$$K(\sigma,\tau) \ =^{+} \ K(\sigma) + K(\tau\ |\ \sigma^{\ast})\ =^{+} \ K(\tau) + K(\sigma \ |\ \tau^{\ast})$$

Note: $K(\tau\ |\ \sigma^{\ast})\ =^{+} \ K(\tau\ |\ \sigma, K(\sigma))$.

Similarly:
$C(\sigma,\tau)\ =^{+} \ C(\sigma) + C(\tau\ |\ \sigma^\ast) + O(\log (C(\sigma,\tau)))$.
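$K$ is not computable, but a real compressor gives a rough computable stand-in. The sketch below replaces $K$ by zlib-compressed length and the pair $(\sigma,\tau)$ by concatenation; both substitutions are assumptions made only for illustration, and the resulting numbers are crude upper-bound proxies rather than values of $K$.

```python
import random
import zlib


def Z(data: bytes) -> int:
    """Compressed length in bytes: a computable upper-bound stand-in for K."""
    return len(zlib.compress(data, 9))


def mutual_info(x: bytes, y: bytes) -> int:
    """Proxy for I(x : y) =+ K(x) + K(y) - K(x, y), using concatenation as the pair."""
    return Z(x) + Z(y) - Z(x + y)


rng = random.Random(0)
x = bytes(rng.getrandbits(8) for _ in range(2000))
y = bytes(rng.getrandbits(8) for _ in range(2000))

shared = mutual_info(x, x)            # x tells us everything about x
independent = mutual_info(x, y)       # independent random strings share little
asymmetry = abs(mutual_info(x, y) - mutual_info(y, x))
```

The proxy is large for identical inputs, near zero for independent ones, and nearly symmetric in its arguments, mirroring the theorem up to the compressor's overhead.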

Theorem. $K(\sigma,\tau) \ =^{+} \ K(\sigma) + K(\tau\ |\ \sigma^{\ast})$.

Proof. Clearly $K(\sigma,\tau) \leq^{+} K(\sigma) + K(\tau\ |\ \sigma^{\ast})$. It remains to show

$$K(\tau\ |\ \sigma^{\ast}) \leq^{+} K(\sigma,\tau) - K(\sigma)$$

We enumerate an online prefix-free code for the strings $\tau$, given $\sigma^{\ast}$; its weight is:

$$2^{K(\sigma)}\cdot \sum_{\tau} 2^{-K(\sigma,\tau)}$$

But $I(\sigma)\ :=\ \sum_{\tau} 2^{-K(\sigma,\tau)}$ is a lower semicomputable semimeasure, so $-\log I(\sigma)$ is, up to rounding, an information content measure.

So $I(\sigma) \leq^{\times} P(\sigma) =^{\times} 2^{-K(\sigma)}$ and

$$\sum_{\tau} 2^{-K(\sigma,\tau)}\ \leq^{\times} \ 2^{-K(\sigma)}$$

so the required code exists. \hfill \blacktriangleleft

Resource-bounded Kolmogorov complexity

Non-equivalent formalizations (non-robustness).

Many equalities and results:

  • fail to transfer, or need additive factors

  • depend on strong computational complexity hypotheses.

We did not cover

  • Solomonoff's theory of inductive inference

  • Compressibility of infinite binary sequences

  • Resource-bounded versions of Kolmogorov complexity

  • Computability-theoretic aspects of Kolmogorov complexity

Take home - Online slides and content

Remember:

  • Incompressibility vs probabilistic: counting arguments

  • Classical versus algorithmic information theory

  • Incompressibility and Undecidability

  • Coding theorem

  • Universal distribution (on strings)

  • Counting theorems

  • Symmetry and conservation of information

http://barmpalias.net/teaching/courses/UESTC_KolmLect.html
