Computational Mechanics
Notes from "Computational Mechanics: Pattern and Prediction, Structure and Simplicity" (2000) [arXiv][pdf]
Computational mechanics is a mathematical framework to detect, quantify and describe the patterns and structure inherent in natural processes. Think of a process as something that generates a sequence of events or observables (or, in LLM speak, tokens).
Our objective is to find the optimal way to model a stochastic process. What is modeling? Modeling a process means building something (the model) that can predict the future behaviour of the process. Let's start with some concepts:
1. Causal States
To predict the future of a process, we have to look at its past. When a group of different past histories all yield the same probabilities for future events, that group is called a causal state.
More succinctly, a causal state is an equivalence class of histories that all yield the same probability distribution over observable futures.
These states record "every difference (about the past) that makes a difference (to the future)".
- Section V of the paper
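One way to see this concretely is to estimate causal states empirically. The sketch below uses the Golden Mean Process (binary sequences with no two 1s in a row: after a 1 the next symbol must be 0, after a 0 it is a fair coin). This example process is my own illustration, not one worked through in these notes; the code groups short histories by their empirical next-symbol distribution, and the groups that emerge are the (approximate) causal states.

```python
from collections import defaultdict
import random

random.seed(0)

# Sample the Golden Mean Process: no two consecutive 1s.
seq = [0]
for _ in range(200_000):
    seq.append(0 if seq[-1] == 1 else random.randint(0, 1))

# Empirical P(next symbol | last k symbols) for short histories.
k = 3
counts = defaultdict(lambda: [0, 0])
for i in range(k, len(seq)):
    past = tuple(seq[i - k:i])
    counts[past][seq[i]] += 1

# Group histories whose conditional future distributions (rounded) agree:
# each group is an approximate causal state.
states = defaultdict(list)
for past, (n0, n1) in counts.items():
    p1 = round(n1 / (n0 + n1), 1)
    states[p1].append(past)

for p1, pasts in sorted(states.items()):
    print(f"P(next=1) ~ {p1}: {sorted(pasts)}")
```

Despite there being five distinct length-3 histories, only two groups appear: histories ending in 1 (next symbol is surely 0) and histories ending in 0 (next symbol is a fair coin). That is exactly "every difference that makes a difference".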
2. ϵ-Machines
It is the model we get when we combine a process's causal states with the transition probabilities between those causal states. The ε-machine is the optimal way to model a process, where optimality is defined by the following results:
- Maximally Prescient: ε-machines are as good at predicting the future as having access to the entire infinite past, i.e. if we know the current causal state, we don't need to know the entire past. This is the Optimal Prediction Theorem.
- Minimally Complex: compared to any rival model that can predict the future equally well, this model requires the least amount of memory or resources.
- Unique: The ε-machine is the only representation of a process that is both maximally prescient and minimally complex.
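An ε-machine is just a set of states with probabilistic, symbol-emitting transitions, so it is easy to write down and run. Below is a sketch of the two-state ε-machine for the Golden Mean Process (my illustrative example, not from these notes). Note that the generator only ever tracks the current state, never the full history, which is the "maximally prescient" point made above.

```python
import random

random.seed(1)

# ε-machine for the Golden Mean Process (no two 1s in a row):
# state -> list of (probability, emitted symbol, next state).
machine = {
    "A": [(0.5, 0, "A"), (0.5, 1, "B")],  # after a 0: fair coin
    "B": [(1.0, 0, "A")],                 # after a 1: must emit 0
}

def run(machine, state, n):
    """Generate n symbols; the current state is all the memory needed."""
    out = []
    for _ in range(n):
        r, cum = random.random(), 0.0
        for p, sym, nxt in machine[state]:
            cum += p
            if r < cum:
                out.append(sym)
                state = nxt
                break
    return out

seq = run(machine, "A", 100_000)
# The process's defining constraint holds: "11" never occurs.
print(all(not (a == 1 and b == 1) for a, b in zip(seq, seq[1:])))
```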
3. Statistical Complexity
Now we have a way to optimally model a process. But for a real process we don't know the ε-machine beforehand; the objective is to learn it from observations. How to do that? Let's leave that for later.
From the lens of computational mechanics, we can now look at randomness and complexity. Some theories assign maximum complexity to completely random noise; after all, such noise is the hardest to predict. But in computational mechanics, complexity measures the "degree of pattern" or structure, not randomness.
Thus complexity measures the non-random part that can be modelled, and there is an amount of irreducible randomness, the entropy rate \(H[X_0 \mid X_{-\infty:0}]\), that we cannot model. Formally, complexity can now be measured by two metrics:
- Statistical Complexity \(C_\mu\): the average amount of historical memory (in bits) the process needs to retain to predict the future. It is calculated as the Shannon entropy of the causal states, i.e. the entropy of the probability distribution over the causal states.
- Excess Entropy (E): the amount of information about the past that can be deduced from the future, i.e. the mutual information between past and future. Since mutual information is symmetric, excess entropy is also the amount of information about the future that can be deduced from the observed past behaviour. This differs from statistical complexity because excess entropy is based on the "observed" behaviour, not on the underlying causal states, which are unobserved.
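For the Golden Mean Process (my running illustrative example, not worked in these notes), both \(C_\mu\) and the entropy rate can be computed by hand from the two causal states. The sketch below assumes the standard stationary distribution of 2/3 on the "after a 0" state and 1/3 on the "after a 1" state.

```python
from math import log2

# Stationary distribution over the two causal states of the
# Golden Mean Process: A = "last symbol was 0", B = "last symbol was 1".
pi = {"A": 2 / 3, "B": 1 / 3}

# Statistical complexity: Shannon entropy of the causal-state distribution.
C_mu = -sum(p * log2(p) for p in pi.values())

# Entropy rate: per-symbol irreducible randomness. In state A the next
# symbol is a fair coin (1 bit); in state B it is deterministic (0 bits).
h_mu = pi["A"] * 1.0 + pi["B"] * 0.0

print(f"C_mu = {C_mu:.3f} bits")  # ~ 0.918
print(f"h_mu = {h_mu:.3f} bits")  # ~ 0.667
```

So the process carries about 0.918 bits of structure (memory the model must keep) alongside 2/3 bit per symbol of irreducible randomness that no model can remove.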
The theory of computational mechanics presented above is currently limited to stationary processes, i.e. processes that have the same properties in the past and the future.
4. Information Theory & Complexity
Computational mechanics is an extension of Shannon's information theory. Traditional information theory deals with the communication of information, while computational mechanics deals with how a system stores, transforms and transmits information over time.
Computational mechanics views a physical process (like a dripping faucet or weather patterns) as a sequence of symbols. To understand the "mechanics" of that system, you have to decode the information embedded in that sequence. It deals with:
- Memory: How much of the past does the system need to "remember" to produce its next move? This is measured by statistical complexity.
- Compression: Computational mechanics seeks the minimal model that can perfectly predict a system's behaviour, i.e. capture all predictable structure. This is the ε-machine.
Computational mechanics is so named because it treats physical processes as performing computation. Instead of forces governing motion, it deals with how information governs a system's evolution.