STT 996 – High Dimensional Probability

Lecture 05 (Reconstructed Notes)

Source: Handwritten notes


Chapter 3 — Subgaussian Norms and Concentration

Setup

Let

\[X = (X_k)_{1 \le k \le n}\]

be a random vector with:

  • independent coordinates
  • subgaussian entries
  • normalized:
\[\mathbb{E}[X_k^2] = 1\]

Define the subgaussian norm:

\[K = \max_{1 \le k \le n} |X_k|_{\psi_2}\]

Norm Concentration

We are interested in:

\[|X|*2 = \left( \sum*{k=1}^n X_k^2 \right)^{1/2}\]

First moment

\[\mathbb{E}[|X|*2^2] = \mathbb{E}\left[\sum*{k=1}^n X_k^2\right] = \sum_{k=1}^n \mathbb{E}[X_k^2] = n\]

So heuristically:

\[|X|_2 \approx \sqrt{n}\]

Concentration Inequality

The key result:

\[\mathbb{P}\left( \left| |X|_2 - \sqrt{n} \right| > t \right) \le 2 \exp\left(-c \frac{t^2}{K^4}\right), \quad t > 0\]

Interpretation

  • The Euclidean norm concentrates sharply around $\sqrt{n}$
  • This is a high-dimensional phenomenon
  • Variance scales like $O(1)$ despite dimension $n$

Bernstein Inequality (Subexponential Case)

Let $(X_i)_{i=1}^n$ be independent, mean-zero, subexponential:

\[\mathbb{E}[X_i] = 0\]

Then:

\[\mathbb{P}\left( \left| \frac{1}{n}\sum_{i=1}^n X_i \right| > t \right) \le 2 \exp\left(-c n \min\left(\frac{t^2}{K^2}, \frac{t}{L}\right)\right)\]

where:

\[L = \max_i |X_i|_{\psi_1}\]

Key Reduction Trick

We apply Bernstein to:

\[Y_k = X_k^2 - 1\]

Why?

  • $\mathbb{E}[X_k^2] = 1$
  • So $Y_k$ is mean-zero

Important fact (missing in lecture, but crucial):

If $X$ is subgaussian, then:

\[X^2 \text{ is subexponential}\]

and:

\[|X^2|*{\psi_1} \le C |X|*{\psi_2}^2\]

So:

\[|Y_k|_{\psi_1} \lesssim K^2\]

Apply Bernstein

We get:

\[\mathbb{P}\left( \left| \frac{1}{n} \sum_{k=1}^n (X_k^2 - 1) \right| > t \right) \le 2 \exp\left(-c n \min\left(\frac{t^2}{K^4}, \frac{t}{K^2}\right)\right)\]

Translate Back to Norm

Since:

\[|X|*2^2 = \sum*{k=1}^n X_k^2\]

we obtain:

\[\frac{1}{n}|X|_2^2 \approx 1\]

From squared norm to norm

Use:

\[|y^2 - 1| \ge \delta \quad \Rightarrow \quad |y - 1| \gtrsim \delta\]

More precisely:

\[|y - 1| \ge \delta \Rightarrow |y^2 - 1| \ge \delta\]

This step was handwavy in lecture, but it’s converting:

  • concentration of $ X _2^2$
  • into concentration of $ X _2$

Final Result

Let $t = \delta \sqrt{n}$, then:

\[\mathbb{P}\left( \left| |X|_2 - \sqrt{n} \right| > t \right) \le 2 \exp\left(-c \frac{t^2}{K^4}\right)\]

Geometry of High Dimensions

Unit Sphere

\[S^{n-1} = { x \in \mathbb{R}^n : |x|_2 = 1 }\]

Scaled sphere:

\[\sqrt{n} S^{n-1} = { x : |x|_2 = \sqrt{n} }\]

Gaussian Example

Let:

\[Z \sim \mathcal{N}(0, I_n)\]

Then:

\[Z = (Z_1, \dots, Z_n), \quad Z_k \sim \mathcal{N}(0,1)\]

and:

\[|Z|*2^2 = \sum*{k=1}^n Z_k^2 \sim \chi^2_n\]

Thus:

\[|Z|_2 \approx \sqrt{n}\]

Density

Gaussian density:

\[f(z) = (2\pi)^{-n/2} \exp\left(-\frac{|z|_2^2}{2}\right)\]

Key High-Dimensional Insight

As $n \to \infty$:

  • Volume of unit ball $\to 0$
  • Mass concentrates near sphere of radius $\sqrt{n}$

This is the thin-shell phenomenon.


Covariance and PCA

Second Moment Matrix

For $X \in \mathbb{R}^n$:

\[\Sigma = \mathbb{E}[XX^T]\]

Covariance Matrix

Let:

\[\mu = \mathbb{E}[X]\]

Then:

\[\operatorname{Cov}(X) = \mathbb{E}[(X-\mu)(X-\mu)^T] = \mathbb{E}[XX^T] - \mu \mu^T\]

Properties

1. Symmetry

\[\Sigma = \Sigma^T\]

2. Positive semidefinite

\[v^T \Sigma v \ge 0 \quad \forall v \in \mathbb{R}^n\]

Proof:

\[v^T \Sigma v = \mathbb{E}[(v^T X)^2] \ge 0\]

3. Trace identity

\[\mathbb{E}[|X|_2^2] = \operatorname{tr}(\Sigma)\]

4. Independent copy identity

If $X, Y$ i.i.d.:

\[\mathbb{E}[\langle X, Y \rangle^2] = |\Sigma|_F^2\]

(This step was rushed in lecture, but it’s using independence + conditioning.)


Spectral Decomposition (PCA)

Since $\Sigma$ is symmetric PSD:

  • eigenvalues:
\[\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0\]
  • eigenvectors:
\[\Sigma u_k = \lambda_k u_k\]

with orthonormal basis:

\[\langle u_i, u_j \rangle = \delta_{ij}\]

Matrix form

Let:

\[U = [u_1 \cdots u_n], \quad D = \operatorname{diag}(\lambda_1,\dots,\lambda_n)\]

Then:

\[\Sigma = U D U^T\]

and:

\[U^T U = I\]

Expansion

For any vector:

\[x = \sum_{k=1}^n \langle x, u_k \rangle u_k\]

Thus:

\[\Sigma = \sum_{k=1}^n \lambda_k u_k u_k^T\]

Higher powers

\[\Sigma^2 = \sum_{k=1}^n \lambda_k^2 u_k u_k^T\]

Big Picture (What Your Professor Skipped)

1. Why subgaussian?

Because:

  • controls tails
  • enables concentration inequalities
  • stable under sums

2. Why square trick?

We convert:

\[|X|_2^2 = \sum X_k^2\]

into sum of independent variables, enabling Bernstein.


3. Geometry insight

High-dimensional vectors:

  • live near a sphere
  • not near the origin
  • not uniformly spread

4. PCA connection

Covariance:

  • defines geometry of data
  • eigenvectors = directions of variation
  • eigenvalues = magnitude of variation

Mental Model

Think:

  • Subgaussian → “Gaussian-like tails”
  • $ X _2 \approx \sqrt{n}$ → “mass on sphere”
  • $\Sigma$ → “shape of cloud”
  • Eigenvectors → “principal axes”

Comments