Matrix Model

We model the observed matrix as:

\[\mathbf{X} = \mathbf{M} + \mathbf{E}\]

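$\mathbf{M}$ = mean (coarse structure)
$\mathbf{E}$ = noise, with entries $\varepsilon_{ij}$ that each have their own variance $\sigma_{ij}^2$

In the simplest toy version, the entries are independent Gaussians: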

\[\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_{ij}^2)\]

PCA Interpretation

Step 1: Fix a direction

Let $v \in \mathbb{R}^p$ be a unit vector (candidate principal direction).

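Write $x_i$ for row $i$ of $\mathbf{X}$, $\mu_i$ for row $i$ of $\mathbf{M}$, and $\Sigma_i = \text{diag}(\sigma_{i1}^2, \dots, \sigma_{ip}^2)$ for the noise covariance of that row. Then for each row $x_i$:

\[v^\top x_i \sim \mathcal{N}(v^\top \mu_i,\; v^\top \Sigma_i v)\]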

Step 2: Squared projections

Consider:

\[\frac{(v^\top x_i)^2}{v^\top \Sigma_i v}\]

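Dividing by the variance standardizes the projection to unit variance, so this follows a non-central chi-squared distribution with one degree of freedom: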

\[\chi^2_1(\lambda_i), \quad \lambda_i = \frac{(v^\top \mu_i)^2}{v^\top \Sigma_i v}\]
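
Steps 1 and 2 can be checked numerically. The sketch below is a sanity check rather than part of the model: the values of `mu_i`, `sigma_i`, and `v` are illustrative assumptions, and the noise covariance is taken to be diagonal. It compares the simulated scaled squared projection against the known mean $1 + \lambda_i$ and variance $2(1 + 2\lambda_i)$ of a non-central chi-squared variable with one degree of freedom, and against NumPy's own sampler.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (assumed) row mean, per-coordinate noise levels, and direction.
mu_i = np.array([3.0, -1.0, 2.0])
sigma_i = np.array([0.5, 1.0, 2.0])       # Sigma_i = diag(sigma_i**2)
v = np.array([1.0, 2.0, 2.0]) / 3.0       # unit vector: (1 + 4 + 4) / 9 = 1

proj_mean = v @ mu_i                      # v^T mu_i
proj_var = np.sum(v**2 * sigma_i**2)      # v^T Sigma_i v for diagonal Sigma_i
lam = proj_mean**2 / proj_var             # noncentrality lambda_i

# Simulate many independent copies of the row and scale the squared projection.
n_draws = 200_000
x = mu_i + rng.normal(size=(n_draws, 3)) * sigma_i
q = (x @ v) ** 2 / proj_var

# Reference draws from NumPy's noncentral chi-squared sampler (df=1, nonc=lam).
ref = rng.noncentral_chisquare(df=1, nonc=lam, size=n_draws)

print("lambda_i:", lam)
print("mean of q vs 1 + lambda:", q.mean(), 1 + lam)
print("var of q vs 2(1 + 2 lambda):", q.var(), 2 * (1 + 2 * lam))
print("reference sampler mean, var:", ref.mean(), ref.var())
```

The simulated and reference moments should agree up to Monte Carlo error, which shrinks as `n_draws` grows.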

Step 3: Summing over observations

\[\sum_i \frac{(v^\top x_i)^2}{v^\top \Sigma_i v}\]

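is a sum of independent non-central chi-squared variables. Assuming the rows are independent, the sum is itself non-central chi-squared with $n$ degrees of freedom (one per row) and non-centrality $\sum_i \lambda_i$, so its mean is $n + \sum_i \lambda_i$.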

Interpretation

This gives a clean decomposition:

Mean structure → drives noncentrality
Noise / precision → scales variance
“Important directions” → maximize signal-to-noise ratio

So PCA can be reinterpreted as:

Finding directions $v$ that maximize non-centrality relative to uncertainty.
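
In one special case this maximization has a closed form. If every row shares the same diagonal noise covariance $\Sigma$ (a simplifying assumption, not the general row-specific $\Sigma_i$ used above), the total non-centrality becomes the generalized Rayleigh quotient $v^\top \mathbf{M}^\top \mathbf{M} v / v^\top \Sigma v$, which is maximized by a whitened eigenvector. The sketch below uses hypothetical helper names and made-up values for `M` and `sigma_cols`.

```python
import numpy as np

def max_noncentrality_direction(M, sigma_cols):
    """Direction maximizing v^T M^T M v / v^T diag(sigma_cols**2) v.

    Assumes every row shares the same diagonal noise covariance
    Sigma = diag(sigma_cols**2); this simplifies the row-specific case.
    """
    G = M.T @ M
    # Whitening: with u = Sigma^{1/2} v the ratio becomes an ordinary
    # Rayleigh quotient of Sigma^{-1/2} G Sigma^{-1/2}.
    G_white = G / np.outer(sigma_cols, sigma_cols)
    evals, evecs = np.linalg.eigh(G_white)   # eigenvalues in ascending order
    u = evecs[:, -1]
    v = u / sigma_cols                       # map back: v = Sigma^{-1/2} u
    return v / np.linalg.norm(v), evals[-1]

def total_noncentrality(M, sigma_cols, v):
    """Sum over rows of (v^T mu_i)^2 / (v^T Sigma v) for shared diagonal Sigma."""
    return np.sum((M @ v) ** 2) / np.sum(v**2 * sigma_cols**2)

# Illustrative values: the first column is much noisier than the others.
M = np.array([[2.0, 1.0, 0.5],
              [1.5, 1.2, 0.4],
              [2.2, 0.9, 0.6],
              [1.8, 1.1, 0.5]])
sigma_cols = np.array([1.0, 0.2, 0.1])

v_snr, best = max_noncentrality_direction(M, sigma_cols)
v_pca = np.linalg.eigh(M.T @ M)[1][:, -1]    # top eigenvector ignoring noise

print("max-noncentrality direction:", v_snr)
print("plain top eigenvector:", v_pca)
print("noncentrality at each:",
      total_noncentrality(M, sigma_cols, v_snr),
      total_noncentrality(M, sigma_cols, v_pca))
```

The ratio does not depend on the length of $v$, so the unit-norm constraint only fixes the scale of the returned direction.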

Key Insight

Precision, attention, and dimension are the same object in different languages:

Small eigenvalues → false precision
Large eigenvalues → real structure
Attention → learned importance weighting
$\sigma_{ij}$ → local precision allocation

Toy Simulation (Python)

import numpy as np
from numpy.linalg import eigh

np.random.seed(7)

# Mean matrix (coarse structure)
M = np.array([
    [200000.0, 190000.0],
    [210000.0, 205000.0],
    [195000.0, 198000.0],
    [205000.0, 202000.0],
    [198000.0, 200000.0],
    [202000.0, 201000.0],
])

# Scale-aware uncertainty (toy)
S = np.full_like(M, 10000.0)

# Top direction from mean
G = M.T @ M
evals, evecs = eigh(G)
v = evecs[:, np.argmax(evals)]
v = v / np.linalg.norm(v)

# Row-level Gaussian structure
row_means = M @ v
row_vars = np.sum((S**2) * (v[None, :]**2), axis=1)

# Noncentrality parameters
lambdas = row_means**2 / row_vars

# Monte Carlo
B = 1000
vals = []

for _ in range(B):
    X = M + np.random.normal(size=M.shape) * S
    scores = X @ v
    vals.append(np.sum(scores**2 / row_vars))

print("Empirical mean:", np.mean(vals))
print("Theoretical mean:", np.sum(1 + lambdas))

What This Is (Conceptually)

This is not standard PCA. It is closer to:

Probabilistic PCA (PPCA)
Factor analysis
Heteroscedastic noise models
Bayesian matrix factorization

But with a key twist:

Precision is scale-aware and possibly learned, not fixed.

What This Is Not

Not kernel smoothing
Not simple digit-level attention
Not just compression

It is:

A probabilistic reinterpretation of numerical precision inside linear algebra.

Future Directions / Questions

1. Scale-aware variance modeling

Replace ad hoc $\sigma_{ij}$ with:

\[\sigma_{ij} = c \cdot 10^{m_{ij}-k}\]

or learned:

\[\sigma_{ij} = f_\theta(x_{ij}, \text{context})\]
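
A minimal sketch of the first formula follows. The text does not define $m_{ij}$ or $k$, so the code assumes that $m_{ij} = \lfloor \log_{10} |x_{ij}| \rfloor$ is the decimal order of magnitude of the entry and that $k$ counts the digits after the leading one that are treated as reliable. The function name and the default values of `c` and `k` are hypothetical.

```python
import numpy as np

def scale_aware_sigma(x, c=0.5, k=2):
    """sigma_ij = c * 10**(m_ij - k).

    Assumption (not stated in the text): m_ij = floor(log10(|x_ij|)) is the
    decimal order of magnitude of the entry, and k counts the digits after
    the leading one that are treated as reliable.
    """
    x = np.asarray(x, dtype=float)
    # Zeros have no order of magnitude; treat them as magnitude 10**0 here.
    safe = np.where(x == 0.0, 1.0, np.abs(x))
    m = np.floor(np.log10(safe))
    return c * 10.0 ** (m - k)

x = np.array([[200000.0, 1900.0],
              [3.2, 0.045]])
print(scale_aware_sigma(x))
```

Under this reading, the noise level scales with the magnitude of each entry, so large and small numbers carry the same relative uncertainty.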

2. Weighted / heteroscedastic PCA

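Standard PCA implicitly treats the noise as isotropic (probabilistic PCA makes this assumption explicit):

\[\Sigma = \sigma^2 I\]

Instead, allow a row-specific diagonal covariance: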

\[\Sigma_i = \text{diag}(\sigma_{i1}^2, \dots, \sigma_{ip}^2)\]

Investigate:

weighted PCA
generalized eigenvalue problems
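
One common way to realize weighted PCA is to fit a low-rank factorization by weighted least squares. The sketch below handles a single, uncentered component with alternating closed-form updates and inverse-variance weights $W_{ij} = 1/\sigma_{ij}^2$. It is one possible formulation under those assumptions, not the method this page commits to, and the function names and data are hypothetical.

```python
import numpy as np

def weighted_rank1(X, W, n_iter=200):
    """Rank-1 fit minimizing sum_ij W_ij * (X_ij - a_i * b_j)**2.

    Alternating weighted least squares, uncentered, one component only.
    W_ij = 1 / sigma_ij**2 is the usual inverse-variance choice.
    """
    # Initialize from the unweighted SVD.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    a, b = U[:, 0] * s[0], Vt[0].copy()
    for _ in range(n_iter):
        # Each update is the closed-form weighted least-squares solution
        # for one factor with the other held fixed.
        a = (W * X) @ b / (W @ b**2)
        b = (W * X).T @ a / (W.T @ a**2)
    scale = np.linalg.norm(b)
    return a * scale, b / scale      # b is returned as a unit-norm direction

def weighted_loss(X, W, a, b):
    return np.sum(W * (X - np.outer(a, b)) ** 2)

rng = np.random.default_rng(2)
n = 200
b_true = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
scores = rng.normal(loc=5.0, scale=1.0, size=n)
sigma = np.tile([3.0, 0.3, 0.3], (n, 1))      # assumed: first column is noisy
X = np.outer(scores, b_true) + rng.normal(size=(n, 3)) * sigma

a_w, b_w = weighted_rank1(X, 1.0 / sigma**2)
b_svd = np.linalg.svd(X, full_matrices=False)[2][0]

print("weighted direction:", b_w * np.sign(b_w[0]))
print("unweighted direction:", b_svd * np.sign(b_svd[0]))
print("true direction:", b_true)
```

With uniform weights the updates leave the SVD solution unchanged, so any difference between the two printed directions comes from the weighting alone.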

3. Attention as precision allocator

Replace hand-designed $\sigma_{ij}$ with:

neural network
attention mechanism

Goal:

learn which digits / scales matter
discard false precision automatically

4. Local intrinsic dimension

Estimate:

\[\text{dim}(x) \approx \text{rank}(\mathrm{Cov}(X \mid x))\]

Connect to:

manifold learning
adaptive rank models
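
A rough way to make the rank estimate concrete is local PCA. In floating point the exact rank of a local covariance is almost always full, so the sketch below substitutes a numerical proxy: the number of local principal components needed to explain a chosen share of the variance among the `k` nearest neighbours. The function name, `k`, and `var_explained` are illustrative choices, not values from the text.

```python
import numpy as np

def local_dimension(X, x, k=20, var_explained=0.95):
    """Estimate intrinsic dimension near x from its k nearest neighbours.

    Counts how many local principal components are needed to reach
    `var_explained` of the neighbourhood variance, as a numerical
    stand-in for rank(Cov(X | x)).
    """
    d = np.linalg.norm(X - x, axis=1)
    nbrs = X[np.argsort(d)[:k]]
    cov = np.cov(nbrs, rowvar=False)
    evals = np.linalg.eigvalsh(cov)[::-1]          # descending order
    frac = np.cumsum(evals) / np.sum(evals)
    return int(np.searchsorted(frac, var_explained) + 1)

# Points on a slightly noisy circle embedded in 3-D (a 1-D manifold).
rng = np.random.default_rng(4)
theta = rng.uniform(0.0, 2.0 * np.pi, size=500)
circle = np.column_stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)])
circle += rng.normal(scale=1e-3, size=circle.shape)

print(local_dimension(circle, circle[0]))
```

The estimate depends on the neighbourhood size: too small and noise dominates, too large and curvature inflates the count.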

5. Spectral stability under quantization

Test:

log-scale transforms
coarse binning
low-bit representations

Measure:

eigenvector deviation
spectral distortion
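
As a starting point for such a test, the sketch below rounds a matrix to a given number of significant digits and reports how far the top eigenvector of $X^\top X$ rotates and how much the top eigenvalue changes. The helper names are hypothetical, the two reported quantities are just one possible definition of eigenvector deviation and spectral distortion, and the input reuses the mean matrix from the toy simulation.

```python
import numpy as np

def round_sig(x, digits):
    """Round each entry to `digits` significant decimal digits."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0.0, 1.0, np.abs(x))
    mag = 10.0 ** (np.floor(np.log10(safe)) - digits + 1)
    return np.round(x / mag) * mag

def top_eig(X):
    evals, evecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
    return evals[-1], evecs[:, -1]

def spectral_change(X, Xq):
    """Angle (radians) between top eigenvectors and relative top-eigenvalue change."""
    l0, v0 = top_eig(X)
    l1, v1 = top_eig(Xq)
    cos = min(1.0, abs(v0 @ v1))             # abs() removes the sign ambiguity
    return np.arccos(cos), abs(l1 - l0) / l0

# The mean matrix from the toy simulation.
X = np.array([[200000.0, 190000.0],
              [210000.0, 205000.0],
              [195000.0, 198000.0],
              [205000.0, 202000.0],
              [198000.0, 200000.0],
              [202000.0, 201000.0]])

for digits in (1, 2, 3):
    angle, rel = spectral_change(X, round_sig(X, digits))
    print(digits, angle, rel)
```

The same `spectral_change` function can be applied to log-transformed or binned versions of the data by swapping out `round_sig`.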

6. Link to information theory

Interpret:

$\sigma_{ij}$ ↔ noise level ↔ bits required
PCA ↔ rate–distortion tradeoff
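
One standard instance of this link is reverse water-filling: for independent Gaussian components under squared-error distortion, components whose variance falls below a water level $\theta$ get zero bits, and each of the rest costs $\tfrac{1}{2}\log_2(\lambda_i/\theta)$ bits. The Gaussian and squared-error assumptions are not stated in the text, and the function name and eigenvalues below are illustrative.

```python
import numpy as np

def reverse_water_filling(eigenvalues, theta):
    """Rate (bits) and total distortion for independent Gaussian components.

    Component i has variance eigenvalues[i] > 0. Components with variance
    below the water level theta > 0 are dropped (their distortion equals
    their variance); the rest are coded down to distortion theta.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    d = np.minimum(lam, theta)                 # per-component distortion
    rate = 0.5 * np.sum(np.log2(lam / d))      # dropped components contribute 0
    return rate, np.sum(d)

rate, distortion = reverse_water_filling([4.0, 1.0, 0.25], theta=1.0)
print(rate, distortion)
```

Read this way, discarding small-eigenvalue directions in PCA corresponds to spending no bits on components that sit below the chosen noise level.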

7. Random matrix perspective

If:

\[X_{ij} \sim \mathcal{N}(\mu_{ij}, \sigma_{ij}^2)\]

then:

$X^\top X$ is (non-central) Wishart-like; it is exactly non-central Wishart only when all rows share the same covariance
its eigenvalues follow nontrivial distributions

Study:

stability of top eigenspace
effect of heteroscedastic noise
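
A first step toward studying these effects is the expected Gram matrix. With independent entries, $\mathbb{E}[X^\top X] = \mathbf{M}^\top \mathbf{M} + \text{diag}\big(\sum_i \sigma_{ij}^2\big)$, so unequal column noise adds unequal amounts to the diagonal and can rotate the top eigenvector away from that of $\mathbf{M}^\top \mathbf{M}$. The sketch below checks the identity by Monte Carlo; the values of `M` and `sigma` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

M = np.array([[2.0, 1.0],
              [1.5, 1.2],
              [2.2, 0.9],
              [1.8, 1.1]])
sigma = np.array([[0.2, 1.5],
                  [0.2, 1.5],
                  [0.2, 1.5],
                  [0.2, 1.5]])   # assumed: second column is much noisier

# With independent entries, E[X^T X] = M^T M + diag(sum_i sigma_ij^2).
expected_gram = M.T @ M + np.diag(np.sum(sigma**2, axis=0))

B = 20_000
X = M + rng.normal(size=(B,) + M.shape) * sigma      # B independent draws
mc_gram = np.einsum("bij,bik->jk", X, X) / B          # average of X^T X

def top_vec(G):
    v = np.linalg.eigh(G)[1][:, -1]
    return v if v[0] >= 0 else -v                     # fix the sign for comparison

print("Monte Carlo Gram:\n", mc_gram)
print("Expected Gram:\n", expected_gram)
print("top eigenvector of M^T M:", top_vec(M.T @ M))
print("top eigenvector of E[X^T X]:", top_vec(expected_gram))
```

Comparing the two printed eigenvectors shows how much of the shift comes from the noise term alone, before any sampling variability is considered.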

Working Hypothesis

Real-world numerical data has lower intrinsic dimension than its floating-point representation, and this can be exposed by modeling precision explicitly rather than treating all digits equally.

Next Step

Implement heteroscedastic PCA (weighted)
Compare:
- raw PCA
- PCA on mean
- uncertainty-aware PCA
Evaluate:
- eigenvector stability
- compression vs accuracy tradeoff
