- What is program size complexity
- What is Algorithmic Information theory
- Machines, Programs and Numberings
- Universal machines and Invariance
- Kolmogorov complexity

- Quantitative estimates and properties
- Berry’s paradox and undecidability
- Conditional complexity
- Conservation of information
- Self-delimiting programs and complexity

**Aim:** Formalize and quantify the complexity of
(finite) data.

*How many bits of information do these strings have?* \texttt{abababababababababababababababab}
\texttt{4c1j5b2p0cv4w1x8rx2y39umgw5q85s7}

The choice of alphabet is not important. Binary is standard.

In Unix-based systems (including macOS): `xxd -b FILE`
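The same idea in a small Python sketch (`to_bits` is a hypothetical helper, not part of `xxd`):

```python
def to_bits(s: str) -> str:
    """ASCII bytes of s as a space-separated binary dump (same idea as `xxd -b`)."""
    return " ".join(f"{byte:08b}" for byte in s.encode("ascii"))

print(to_bits("ab"))   # 01100001 01100010
```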

Strings as outcomes of a random process

*Entropy* is defined with respect to a probability distribution. The information content of an outcome i with probability p_i is -\log p_i

*Shannon entropy* is the expected (average) information per
symbol:

H:=-\sum_i p_i \log p_i

*Entropy* measures the amount of surprise for the next
symbol.
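The definition can be checked on the earlier strings (a minimal Python sketch; probabilities are estimated from symbol frequencies):

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy H = -sum_i p_i log2 p_i, in bits per symbol."""
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("ab" * 16))   # two equiprobable symbols -> 1.0 bit per symbol
print(shannon_entropy("aaaa"))      # a single symbol, no surprise -> 0.0 bits
```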

Backbone of modern communications and digital encoding

information-compression (source-coding theorem)

Entropy is the optimal average code length per symbol in the signal

Error-correcting codes, communication over noisy channels, etc.

**Drawbacks - Criticism (Kolmogorov, Solomonoff)**

depends on the underlying assumed distribution of the data

this may be unknown, or even non-existent

measures the **average** amount of information per letter

*How many bits of information do these strings have?* \texttt{abababababababababababababababab}
\texttt{4c1j5b2p0cv4w1x8rx2y39umgw5q85s7}

**Definition.** The complexity C(\sigma) of string \sigma is the length of the smallest program
that prints \sigma.

```
function GenerateString1()
return "ab" × 16
function GenerateString2()
return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"
```
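The same comparison in Python, keeping the two programs as source strings (a sketch; exact lengths depend on formatting conventions):

```python
# The two candidate programs, kept as source text so their lengths can be compared.
prog1 = 'return "ab" * 16'
prog2 = 'return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"'

out1 = "ab" * 16
out2 = "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"

assert len(out1) == len(out2) == 32   # both outputs are 32 characters long...
assert len(prog1) < len(prog2)        # ...but the patterned one has a much shorter program
```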


*How many bits of information do these strings have?* \texttt{59265358979323846264338327950288}
\texttt{21001798141066463695774083871573}

Kolmogorov complexity is based on the theory of computation.

The standard abstraction of a computer is the **Turing
machine**.

Alternatively, use any *Turing-complete* programming
language.

**Quiz.** Which of the following languages are
Turing-complete?

- python
- HTML

- Solidity (Ethereum)
- Bitcoin scripting language
- Javascript

- HTML5

Program-size depends on the language at hand.

**Code golf** is a recreational programming
competition.

**Goal** is to achieve the shortest source-code that
solves a given problem.

**GolfScript** code for the first 1000 digits of \pi

```
;''
6666,-2%{2+.2/@*\/10.3??2*+}*
`1000<~\;
```

**Whitespace** code for the first 1000 digits of \pi (the source consists only of spaces, tabs and linefeeds, so it is invisible here)

Program-size depends on the language at hand.

The length of the program depends on the language M.

**Facts from computability:**

there is a program that enumerates all Turing machines

there exists a universal Turing machine U

U can simulate any Turing machine, with a constant overhead

Let C_M(\sigma) be the Kolmogorov complexity with respect to M: C_M(\sigma):=\min\{|\tau| : M(\tau)=\sigma\}

**Theorem.** For every machine M there exists a constant c such that C_U(\sigma) \leq C_M(\sigma)+c
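*Proof sketch* (the standard simulation argument; \rho_M denotes a fixed program making U simulate M):

```latex
% Let \rho_M be a string such that U(\rho_M \tau) = M(\tau) for all \tau.
% If \tau is a shortest M-program for \sigma, i.e. M(\tau)=\sigma and |\tau|=C_M(\sigma), then
C_U(\sigma) \;\leq\; |\rho_M \tau| \;=\; C_M(\sigma) + |\rho_M|,
\qquad\text{so } c := |\rho_M| \text{ works.}
```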

Fix a universal machine U and let C(\sigma):=C_U(\sigma).

Kolmogorov complexity is:

*invariant of the choice of the universal machine, up to a
constant.*

does not require a distribution (model, hypothesis)

gives a universal a priori distribution (Bayesian inference)

formal framework for artificial general intelligence (Solomonoff)

it is incomputable, and can only be approximated

machine-invariant but huge constants matter in practice

*How many bits of information do these strings have?* \texttt{1637374017199159701353736070558761985836294941139158650322685731114837878}
\texttt{1415926535897932384626433832795028841971693993751058209749445923078164062}

C(\sigma)\leq |\sigma| +O(1)

C(0^n)\leq \log n +O(1)

C(\sigma\sigma)\leq C(\sigma) +O(1)

**Theorem.** If f is
computable, C(f(\sigma)) \leq^{+}
C(\sigma).

*Does the converse hold?*

**Theorem.** |\{ \sigma :\ |\sigma|=n\ \wedge\ C(\sigma)\leq C(0^n)+d \}|=O(2^d).

**Definition.** A string \sigma is *incompressible* if C(\sigma)\geq |\sigma|.

**Theorem.** The set of compressible strings is
computably enumerable.

**Theorem.** For each n
there exists an n-bit incompressible
string.

*Proof.* There are 2^n strings of length n, but only 2^0+2^1+\cdots+2^{n-1}=2^n-1 strings of
length < n.

Some n-bit string must fail to have a description of length < n.
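The counting can be verified directly for small n (a Python sketch over binary strings):

```python
from itertools import product

def strings_up_to(n: int):
    """Yield every binary string of length < n (including the empty string)."""
    for k in range(n):
        yield from ("".join(bits) for bits in product("01", repeat=k))

n = 8
shorter = sum(1 for _ in strings_up_to(n))
assert shorter == 2**n - 1   # 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 possible short descriptions
assert shorter < 2**n        # too few to describe all 2^n strings of length n
```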

Incompressible strings are often called *random*.

**Naive attempt to compute C(s)**

Order the programs in increasing length

find the first with output s.

```
function KolmogorovComplexity(string s)
    for i = 1 to infinity:
        for each string p of length exactly i
            if isValidProgram(p) and evaluate(p) == s
                return i
```
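A runnable toy version (a sketch: real programs are replaced by a tiny "repeat/literal" description language, so this is not true Kolmogorov complexity, but the search structure is the same):

```python
from itertools import product

ALPHABET = "ab0123456789R"   # toy program alphabet

def evaluate(p: str) -> str:
    """Toy language: 'kRw' (k a single digit) outputs w repeated k times;
    any other string is a literal that outputs itself."""
    if len(p) >= 2 and p[0].isdigit() and p[1] == "R":
        return p[2:] * int(p[0])
    return p

def toy_complexity(s: str) -> int:
    """Length of the shortest toy program that prints s, by brute-force search."""
    for i in range(len(s) + 1):   # s over {a,b} is itself a program for s
        for p in map("".join, product(ALPHABET, repeat=i)):
            if evaluate(p) == s:
                return i
    raise ValueError("no program found")

print(toy_complexity("abababababababab"))   # "8Rab" works: length 4
print(toy_complexity("ab"))                 # the literal "ab": length 2
```

For real programs this search fails: by the halting problem we cannot decide which candidate programs ever produce an output.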

*The smallest positive integer not definable in under sixty
letters.*

There are finitely many sentences of less than sixty letters

finitely many numbers can be defined in this way

there exists a least m that cannot be defined in this way

But the above sentence appears to define m.

**Theorem.** It is undecidable whether a string is
incompressible.

*Proof.* For a contradiction, assume otherwise.

Define a program M which, on input n, outputs the lexicographically least incompressible n-bit string \sigma_n (using the assumed decision procedure).

Then C_M(\sigma_n)\leq \log n + O(1). So C(\sigma_n)\leq \log n + O(1).

This contradicts n\leq C(\sigma_n) + O(1). \blacktriangleleft

*Chaitin on the Berry paradox (article)*

C(\sigma\ |\ \tau) is the information in \sigma given \tau: C(\sigma\ |\ \tau)=\min\{|p| : U(\tau, p)=\sigma\} The shortest program which, given \tau, can print \sigma.

**Theorem.** If A is
finite \forall\tau\ \exists\ \sigma\in A :\
C(\sigma\ |\ \tau)\geq \log |A|.

**Theorem.** Suppose B
is an infinite computably enumerable set of pairs of strings with all
projections B_{\tau}:=\{\sigma :
(\sigma,\tau)\in B\} finite. Then \forall \tau\ \forall \sigma\in B_{\tau}\ C(\sigma\
|\ \tau)\leq^{+} \log |B_{\tau}|.

Let \sigma^\ast denote the shortest program for \sigma. Then C(\sigma)=|\sigma^\ast|.

**Theorem.** The following hold:

C(\sigma, C(\sigma)) =^{+} C(\sigma^{\ast}).

C(\sigma^{\ast}\ |\ \sigma) =^{+} C(C(\sigma)\ |\ \sigma).

*Proof.* From \sigma and C(\sigma) we can compute \sigma^{\ast}: run all programs of length C(\sigma) in parallel until one outputs \sigma.

From \sigma^{\ast} we know C(\sigma)=|\sigma^{\ast}| and \sigma=U(\sigma^{\ast}). \blacktriangleleft

**Theorem.** The following hold:

C(\sigma\tau) \leq^+ C(\sigma, \tau) \leq^+ C(\sigma) + C(\tau) + 2 \log C(\sigma).

\forall d\ \exists\ \sigma,\tau:\ \ C(\sigma\tau) > C(\sigma) + C(\tau) + d.

This is weird… it should not happen.

How can \sigma\tau have more information than C(\sigma)+C(\tau) ?

**Explanation:** A program \tau for \sigma

carries information in its digits, but also in its length |\tau|

this can make C(\sigma) smaller than it should be.
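The 2\log C(\sigma) overhead in the pairing bound illustrates this: to concatenate two programs we must mark where the first ends, e.g. by doubling the bits of its length field (a Python sketch of one standard encoding):

```python
def self_delimiting(p: str) -> str:
    """Make p's end detectable: write each bit of len(p) twice, then '01', then p.
    Overhead: about 2 log |p| bits."""
    length_bits = format(len(p), "b")
    return "".join(b + b for b in length_bits) + "01" + p

sigma_star, tau_star = "10110", "0011"       # hypothetical shortest programs
pair_code = self_delimiting(sigma_star) + tau_star

# len("10110") = 5 = '101' in binary -> doubled to '110011', then the marker '01'
assert pair_code == "110011" + "01" + "10110" + "0011"
```

A decoder reads bit pairs until it sees the unequal pair "01", recovers |σ*|, reads σ*, and knows the rest is τ*.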

By restricting the underlying machines to:

Self-delimiting (one-way reading of the input tape)

or equivalently, prefix-free

we obtain a refined complexity K(\sigma).

K(\sigma\tau)\leq^{+} K(\sigma)+K(\tau)

C(\sigma) \leq^{+} K(\sigma)

Halting probability: \Omega:=\sum_{U(\tau)\downarrow} 2^{-|\tau|}

gives a **probability distribution** on the strings:

P(\sigma) = \sum_{U(\tau)\downarrow = \sigma} 2^{-|\tau|}

where \sum_{\sigma} P(\sigma) = \Omega.
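Prefix-freeness is what makes \Omega well-defined: by the Kraft inequality, the weights 2^{-|\tau|} of any prefix-free set sum to at most 1 (a Python sketch over a toy prefix-free set of "halting programs"):

```python
def is_prefix_free(codes) -> bool:
    """No string in the set is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

halting_programs = ["0", "10", "110", "1110"]   # a toy prefix-free set
assert is_prefix_free(halting_programs)

omega = sum(2 ** -len(p) for p in halting_programs)
print(omega)   # 1/2 + 1/4 + 1/8 + 1/16 = 0.9375 <= 1
assert omega <= 1
```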