Kolmogorov complexity




CS154

Chris Pollett

May 11, 2011

Outline

Information

Descriptions via TMs

Minimal Description Length

Definition. Let `x` be a binary string. The minimal description of `x`, written `d(x)`, is the lexicographically first among the shortest strings `langle M, w rangle` such that the TM `M` on input `w` halts with `x` on its tape. We define `K(x) = |d(x)|`.

Theorem. `exists c forall x[K(x) leq |x| + c]`.

Proof. Let `M` be the machine that halts as soon as it starts, so that `M` on input `x` halts with `x` still on the tape. Then `langle M, x rangle` describes the string `x`. The length of `langle M rangle` is some constant `c` that does not depend on `x`. Hence `K(x) leq |langle M, x rangle| leq |x| + c`, as desired.
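The bound can be imitated with a real compressor, keeping in mind that `K` itself is not computable and that zlib is only a stand-in for a description language. In the sketch below, the helper names `describe` and `run` are hypothetical, and the one-byte tag plays the role of `langle M rangle`: tag 0 means "the payload is the string itself" (the machine that halts immediately), tag 1 means "the payload is zlib-compressed".

```python
import os
import zlib

RAW, ZLIB = b"\x00", b"\x01"  # one-byte tags standing in for the machine <M>

def describe(x: bytes) -> bytes:
    """Return the shorter of two descriptions of x (illustrative helper)."""
    raw = RAW + x                      # "halt immediately" machine: output = input
    packed = ZLIB + zlib.compress(x)   # "decompress the input" machine
    return raw if len(raw) <= len(packed) else packed

def run(description: bytes) -> bytes:
    """Recover x from a description produced by describe()."""
    tag, payload = description[:1], description[1:]
    return payload if tag == RAW else zlib.decompress(payload)

for x in (b"a" * 1000, os.urandom(1000)):
    d = describe(x)
    assert run(d) == x
    assert len(d) <= len(x) + 1        # analogue of K(x) <= |x| + c, here with c = 1
    print(len(x), len(d))
```

Whatever the input, the raw description is always available, so the description length never exceeds `|x|` plus the constant size of the tag.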

Some Inequalities Involving Minimal Description Length

Theorem. `exists c forall x[K(xx) leq K(x) + c]`.

Proof. Let `d(x) = langle M, w rangle` be the minimal description of `x`. Then `langle N, langle M, w rangle rangle` describes `xx`, where `N` is the machine which on input `langle M, w rangle` runs `M` on `w` and then writes the output of the simulation twice. This description has length `|langle N rangle| + |d(x)| = K(x) + c`, where `c = |langle N rangle|` does not depend on `x`.
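To make the wrapping step concrete, here is a toy model, not a real TM encoding. Descriptions are assumed to be strings of the form `name:input`, and the machine names `R` and `N`, the registry `MACHINES`, and the helper `run` are all hypothetical. The description `d_x` used below is just some short description of `x`, not necessarily a minimal one.

```python
def repeat_machine(w: str) -> str:
    """Toy machine: on input 'k,s' output s repeated k times."""
    k, s = w.split(",", 1)
    return s * int(k)

def doubling_machine(w: str) -> str:
    """The machine N: treat w as a description, run it, write the output twice."""
    out = run(w)
    return out + out

MACHINES = {"R": repeat_machine, "N": doubling_machine}

def run(description: str) -> str:
    """Decode 'name:input' and run the named machine on the input."""
    name, w = description.split(":", 1)
    return MACHINES[name](w)

d_x = "R:500,01"       # a short description of x = (01)^500
x = run(d_x)
d_xx = "N:" + d_x      # plays the role of <N, <M, w>>
assert run(d_xx) == x + x
assert len(d_xx) == len(d_x) + 2   # overhead is the constant size of "N:"
```

The overhead of two characters is the same for every `x`, which is the content of the constant `c` in the theorem.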

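Using the same kind of idea, together with an encoding that doubles each bit of `d(x)` so that the end of `d(x)` can be detected, one can show: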

Theorem. `exists c forall x,y [K(xy) leq 2K(x) + K(y) + c]`.
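The factor `2` in front of `K(x)` comes from the need to tell where the description of `x` ends and the description of `y` begins, since plain concatenation of two binary strings is ambiguous. One standard way to do this, sketched below with hypothetical helper names `encode_pair` and `decode_pair`, is to double every bit of the first description and mark its end with `01`.

```python
from typing import Tuple

def encode_pair(dx: str, dy: str) -> str:
    """Double every bit of dx, append the delimiter '01', then append dy."""
    return "".join(b + b for b in dx) + "01" + dy

def decode_pair(s: str) -> Tuple[str, str]:
    """Invert encode_pair by reading two bits at a time until '01' appears."""
    dx = []
    i = 0
    while s[i:i + 2] != "01":
        assert s[i] == s[i + 1], "malformed encoding"
        dx.append(s[i])
        i += 2
    return "".join(dx), s[i + 2:]

dx, dy = "101", "0110"
code = encode_pair(dx, dy)
assert decode_pair(code) == (dx, dy)
assert len(code) == 2 * len(dx) + len(dy) + 2
```

The encoded pair has length `2|d(x)| + |d(y)| + 2`, and the machine that decodes the pair, runs both descriptions, and concatenates the outputs contributes the remaining constant.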

Optimality of the Definition

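Theorem. Let `p` be any description language (for example, a programming language), that is, any computable function from binary strings to binary strings. Let `d_p(x)` be the lexicographically first shortest string `s` with `p(s) = x`, and let `K_p(x) = |d_p(x)|`. Then `exists c forall x[K(x) leq K_p(x) + c]`, where `c` depends only on `p`.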

Proof. Consider the machine `M` which on input `w` simulates the programming language `p` on input `w` and outputs whatever `p` would output. Then `langle M, d_p(x) rangle` is a description of `x`, and its length exceeds `K_p(x)` by at most the constant `c = |langle M rangle|`.
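As a small illustration of the simulation argument, take `p` to be a run-length decoder. This choice of `p`, the tag `PREFIX` standing in for `langle M rangle`, and the helper `run` are all assumptions made for the example, not part of the theorem.

```python
import re

def p(s: str) -> str:
    """Toy description language: run-length decoding, e.g. '3a2b' -> 'aaabb'."""
    return "".join(ch * int(n) for n, ch in re.findall(r"(\d+)(\D)", s))

PREFIX = "P:"   # stands in for the fixed encoding <M> of the simulator

def run(description: str) -> str:
    """Toy universal side: a description tagged PREFIX is handed to p."""
    if description.startswith(PREFIX):
        return p(description[len(PREFIX):])
    raise ValueError("unknown machine")

d_p = "1000a"        # a p-description of x = a^1000
d = PREFIX + d_p     # plays the role of <M, d_p(x)>
assert run(d) == p(d_p) == "a" * 1000
assert len(d) == len(d_p) + len(PREFIX)   # constant overhead, independent of x
```

Any string that `p` can describe briefly can be described almost as briefly in the general setting, at the cost of the fixed prefix.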

Incompressible Strings

Definition. Let `x` be a string. Say that `x` is `c`-compressible if `K(x) leq |x| - c`. If `x` is not `c`-compressible, we say that it is incompressible by `c`. If `x` is not `1`-compressible, we say that it is incompressible.

Theorem. Incompressible strings of every length exist.

Proof. The number of strings of length `n` is `2^n`. Each description is a binary string, so the number of descriptions of length less than `n` is at most the number of binary strings of length at most `n-1`, namely `1+2+4+8 + cdots + 2^{n-1} = 2^n-1`, which is less than the number of strings of length `n`. Since each description describes at most one string, some string of length `n` has no description of length less than `n`, and that string is incompressible.
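The counting step can be checked numerically for small `n`. The function names below are hypothetical and the check is only a sanity test of the arithmetic, not a substitute for the proof.

```python
def num_strings_of_length(n: int) -> int:
    """Number of binary strings of length exactly n."""
    return 2 ** n

def num_descriptions_shorter_than(n: int) -> int:
    """Number of binary strings of length 0, 1, ..., n-1."""
    return sum(2 ** k for k in range(n))

for n in range(1, 21):
    short = num_descriptions_shorter_than(n)
    assert short == 2 ** n - 1
    # Pigeonhole: fewer short descriptions than strings of length n,
    # so at least one string of length n has no shorter description.
    assert short < num_strings_of_length(n)
```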

Almost-all Properties and Incompressible Strings

We next show that a computable property that holds for almost all strings also holds for all but finitely many incompressible strings. By "holds for almost all strings", we mean that the fraction of strings of length `n` on which the property fails to hold approaches `0` as `n` grows.

Theorem. Let `f` be a computable property that holds for almost all strings. Then for any `b > 0`, the property `f` is FALSE on only finitely many strings that are incompressible by `b`.

Proof. Consider the following `M`:

`M = `"On input `i`, a binary integer:

  1. Find the `i`th string `s` where `f(s) =` FALSE, considering the strings ordered lexicographically.
  2. Output string `s`."
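The machine `M` can be sketched in Python as follows. The property `f` used here ("the string contains at least one `1`") is only an example chosen for illustration: it fails on exactly one string of each length, so it holds for almost all strings. Note that, like `M` itself, the function does not halt if fewer than `i` strings fail to have property `f`.

```python
from itertools import count, product

def strings_in_order():
    """Yield all binary strings by length, then lexicographically: '', '0', '1', '00', ..."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def f(s: str) -> bool:
    """Example property (an assumption): s contains at least one '1'."""
    return "1" in s

def M(i_binary: str) -> str:
    """On input i (a binary integer, 1-indexed), return the i-th string where f is False."""
    i = int(i_binary, 2)
    seen = 0
    for s in strings_in_order():
        if not f(s):
            seen += 1
            if seen == i:
                return s

assert M("1") == ""
assert M("1011") == "0" * 10   # a string of length 10 named by a 4-bit index
```

With this `f`, the strings that fail the property are named by indices that are much shorter than the strings themselves, which is exactly what the argument below exploits.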

For any string `x` that fails to have property `f`, let `i_x` be the position of `x` on the list of all strings that fail to have property `f`, ordered as in step 1 and written as a binary integer. Then `langle M, i_x rangle` is a description of `x` of length `|i_x| + c`, where `c = |langle M rangle|`. Fix `b > 0`. Choose `n` such that at most a `1/2^{b+c+1}` fraction of the strings of length `n` or less fail to have property `f`. Because `f` holds for almost all strings, every sufficiently large `n` satisfies this condition. Let `x` be a string of length `n` that fails to have property `f` (if no such string exists, then property `f` holds for all strings of length `n`, in particular for those incompressible by `b`, and there is nothing to show for this `n`). There are `2^{n+1} - 1` strings of length `n` or less, so
`i_x leq (2^{n+1} - 1)/(2^{b+c+1}) leq 2^{n-b-c}`.
Therefore `|i_x| leq n - b - c`, so the length of `langle M, i_x rangle` is at most `(n-b-c) + c = n-b`, and hence `K(x) leq n - b`. So `x` is `b`-compressible and cannot be incompressible by `b`. As this holds for every sufficiently large `n`, only finitely many strings that fail to have property `f` are incompressible by `b`.