Introduction
We formalize the notion of a compression ecology—a system designed to produce compressible descriptions of emergent behavior while maintaining adaptive capacity. The goal is to characterize when a complex system admits useful low-dimensional representations without sacrificing its ability to respond to novel inputs.
The central tension is that high-entropy regions enable adaptation and exploration, while low-entropy foundations enable communication and reasoning. We seek formal conditions under which the two can coexist stably.
Preliminaries
Entropy and Compression
Let $\mathcal{X}$ be a state space and $p: \mathcal{X} \to [0,1]$ a probability distribution over states. The Shannon entropy is:
$$H(p) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$
For a dynamical system with state trajectory $(x_t)_{t \geq 0}$, we consider the predictive information:
$$I_{\text{pred}}(T) = I(x_{0:T}; x_{T:\infty})$$
which measures how much the past tells us about the future.
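As a quick numerical illustration (not part of the original text), Shannon entropy can be computed directly from a distribution; estimating $I_{\text{pred}}$ additionally requires the joint distribution of past and future trajectories, which this sketch omits:

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

# A uniform distribution over 4 states attains the maximum, 2 bits;
# a sharply peaked distribution is close to 0 bits.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # → 2.0
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # ≈ 0.24
```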
Definition 1 (Compressibility)
A system is $(\epsilon, k)$-compressible if there exists an encoder $\phi: \mathcal{X}^* \to \mathbb{R}^k$ mapping state histories to codes such that, for all trajectories,
$$\mathbb{E}\left[\|x_t - \psi(\phi(x_{0:t-1}))\|^2\right] \leq \epsilon$$
for some decoder $\psi: \mathbb{R}^k \to \mathcal{X}$.
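A minimal sketch of Definition 1, assuming a toy dynamical system whose states stay near the line $y = 2x$: a one-dimensional code of the last state suffices to predict the next state up to the noise level. The dynamics, encoder, and decoder here are all illustrative, not from the original.

```python
import random

random.seed(0)

def step(x, noise=0.01):
    # Toy dynamics: contraction toward the origin along y = 2x, plus noise.
    a = 0.9 * x[0] + random.gauss(0, noise)
    return (a, 2.0 * a)

def phi(history):
    # Encoder: summarize the entire history by the last state's first coordinate (k = 1).
    return history[-1][0]

def psi(z):
    # Decoder: predict the next state from the 1-D code.
    a = 0.9 * z
    return (a, 2.0 * a)

# Empirically estimate E[||x_t - psi(phi(x_{0:t-1}))||^2].
x = (1.0, 2.0)
history, errors = [x], []
for _ in range(1000):
    pred = psi(phi(history))
    x = step(x)
    errors.append((x[0] - pred[0]) ** 2 + (x[1] - pred[1]) ** 2)
    history.append(x)
mse = sum(errors) / len(errors)
print(mse)  # small: the system is (eps, 1)-compressible for eps near 5 * noise^2
```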
Layered Systems
Consider a system partitioned into layers $\mathcal{L} = \{L_1, \ldots, L_n\}$ with interfaces $I_{i,i+1}$ between adjacent layers.
Definition 2 (Entropy Stratification)
A layered system is entropy-stratified if:
$$H(L_1) \leq H(L_2) \leq \cdots \leq H(L_n)$$
with the property that changes in $L_i$ propagate to $L_j$ for $j > i$ but not for $j < i$.
This captures the intuition that foundational layers (low index) should be stable, while higher layers can vary freely.
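Checking the ordering condition of Definition 2 is straightforward; note that the directional-propagation requirement is a separate dynamical property that a static entropy profile cannot certify, so this sketch checks only the inequality chain:

```python
def is_entropy_stratified(layer_entropies):
    """Check the ordering H(L_1) <= H(L_2) <= ... <= H(L_n)."""
    return all(h1 <= h2 for h1, h2 in zip(layer_entropies, layer_entropies[1:]))

print(is_entropy_stratified([0.5, 1.2, 3.0, 3.0]))  # → True
print(is_entropy_stratified([2.0, 1.0, 3.0]))       # → False
```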
Main Results
Theorem 1 (Stability-Adaptivity Tradeoff)
For an entropy-stratified system with $n$ layers, let $\alpha_i = H(L_i)/H(L_n)$ denote the relative entropy of layer $i$. If the system is $(\epsilon, k)$-compressible, then:
$$k \geq \sum_{i=2}^{n} \alpha_i \cdot \dim(I_{i-1,i})$$
with equality when the interfaces are informationally optimal.
Proof.
By the data processing inequality, information about $L_n$ must pass through all interfaces. The entropy stratification ensures each interface $I_{i-1,i}$ must carry at least $H(L_i) - H(L_{i-1})$ bits. Summing over interfaces and applying the compression bound yields the result. □

Proposition 1 (Interface Minimality)
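The bound in Theorem 1 is easy to evaluate numerically. The sketch below (with made-up entropies and interface dimensions) computes the right-hand side, summing over the $n-1$ interfaces $I_{i-1,i}$ for $i = 2, \ldots, n$:

```python
def min_code_dimension(entropies, interface_dims):
    """Lower bound on the code dimension k from Theorem 1:
    k >= sum over interfaces of alpha_i * dim(I_{i-1,i}),
    where alpha_i = H(L_i) / H(L_n).

    entropies: [H(L_1), ..., H(L_n)]
    interface_dims: [dim(I_{1,2}), ..., dim(I_{n-1,n})]
    """
    h_n = entropies[-1]
    return sum((h / h_n) * d for h, d in zip(entropies[1:], interface_dims))

# Made-up four-layer profile: entropies 1, 3, 7, 10 and interface dims 2, 4, 8.
# Bound: 0.3*2 + 0.7*4 + 1.0*8, about 11.4.
print(min_code_dimension([1.0, 3.0, 7.0, 10.0], [2, 4, 8]))
```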
An interface $I_{i,i+1}$ is minimal if removing any component increases the prediction error on $L_{i+1}$ by more than $\delta$. For a compression ecology:
$$|I_{i,i+1}| = O\left(\frac{H(L_{i+1}) - H(L_i)}{\log(1/\delta)}\right)$$

The Logic-Belief Boundary
Following Ryan's formulation [3], we distinguish two compression regimes:
- Logic: Lossless compression within the known region $\mathcal{K} \subset \mathcal{X}$
- Belief: Lossy compression in the unknown region $\mathcal{X} \setminus \mathcal{K}$
Theorem 2 (Boundary Characterization)
The optimal boundary $\partial \mathcal{K}$ between logic and belief regions satisfies:
$$\partial \mathcal{K} = \{x : I(x; \mathcal{K}) = \tau \cdot H(x | \mathcal{K})\}$$
where $\tau$ is a threshold depending on the cost ratio between Type I and Type II errors.
Note
This formalizes the intuition that we should use deductive reasoning where we have sufficient information, and fall back to belief/narrative where we don't. The boundary is not fixed—it expands as $\mathcal{K}$ grows through learning.
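One way to operationalize Theorem 2 (a sketch, not the original's method): given estimates of $I(x; \mathcal{K})$ and $H(x \mid \mathcal{K})$ for a candidate state, compare them against the threshold $\tau$ to decide which regime applies. The estimates passed in here are assumed to come from elsewhere.

```python
def region(mutual_info, cond_entropy, tau):
    """Classify a state by the Theorem 2 criterion.
    mutual_info: estimate of I(x; K); cond_entropy: estimate of H(x | K).
    States on the boundary satisfy I(x; K) = tau * H(x | K); above it,
    deduction ('logic') applies, below it we fall back to 'belief'."""
    score = mutual_info - tau * cond_entropy
    if abs(score) < 1e-9:
        return "boundary"
    return "logic" if score > 0 else "belief"

print(region(3.0, 1.0, tau=2.0))  # → logic   (3.0 > 2.0 * 1.0)
print(region(0.5, 1.0, tau=2.0))  # → belief
```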
Applications
Research Infrastructure
Consider a personal knowledge system with layers:
| Layer | Contents | Relative Entropy |
|---|---|---|
| $L_1$ | Core values, axioms | $\alpha_1 \approx 0.1$ |
| $L_2$ | Methods, interfaces | $\alpha_2 \approx 0.3$ |
| $L_3$ | Active hypotheses | $\alpha_3 \approx 0.7$ |
| $L_4$ | Raw observations | $\alpha_4 = 1.0$ |
Theorem 1 then tells us that the minimal code dimension $k$ for the system scales with the entropy-weighted sum of interface dimensions.
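Plugging the table's $\alpha$ values into Theorem 1 gives a concrete lower bound; the interface dimensions below are hypothetical, chosen only to make the arithmetic concrete:

```python
# Relative entropies from the table; interface dimensions are illustrative.
alphas = [0.1, 0.3, 0.7, 1.0]   # alpha_1 .. alpha_4
interface_dims = [4, 16, 64]    # dim(I_{1,2}), dim(I_{2,3}), dim(I_{3,4})

# Theorem 1 lower bound: sum of alpha_i * dim(I_{i-1,i}) over the interfaces,
# i.e. 0.3*4 + 0.7*16 + 1.0*64, about 76.4.
k_min = sum(a * d for a, d in zip(alphas[1:], interface_dims))
print(k_min)
```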
Motivation Vectors
The "inspiration vectors" from the original treatment [4] can be formalized as directions in a belief space:
$$v_{\text{motivation}} \in T_x(\mathcal{X} \setminus \mathcal{K})$$
where $T_x$ denotes the tangent space at the current belief state $x$. The coordinates (pragmatic vs. ideal success) define a basis for this tangent space.
References
[1] Peters, O. (2019). "The ergodicity problem in economics." Nature Physics.
[2] Glushko, R. "Stories replace probability judgements." The Discipline of Organizing.
[3] Ryan, P. (2019). "Logic creates lossless compressions..." Twitter.
[4] Mabie, M. (2019). "A Compression Ecology Between Motivation and Inspiration." COGS 160.