Edit this page on GitHub

STT 996 – High Dimensional Probability

Lecture 05 (Reconstructed Notes)

Source: Handwritten notes

Chapter 3 — Subgaussian Norms and Concentration

Setup

Let

\[X = (X_k)_{1 \le k \le n}\]

be a random vector with:

independent coordinates
subgaussian entries
normalized:

\[\mathbb{E}[X_k^2] = 1\]

Define the subgaussian norm:

\[K = \max_{1 \le k \le n} |X_k|_{\psi_2}\]

Norm Concentration

We are interested in:

\[|X|*2 = \left( \sum*{k=1}^n X_k^2 \right)^{1/2}\]

First moment

\[\mathbb{E}[|X|*2^2] = \mathbb{E}\left[\sum*{k=1}^n X_k^2\right] = \sum_{k=1}^n \mathbb{E}[X_k^2] = n\]

So heuristically:

\[|X|_2 \approx \sqrt{n}\]

Concentration Inequality

The key result:

\[\mathbb{P}\left( \left| |X|_2 - \sqrt{n} \right| > t \right) \le 2 \exp\left(-c \frac{t^2}{K^4}\right), \quad t > 0\]

Interpretation

The Euclidean norm concentrates sharply around $\sqrt{n}$
This is a high-dimensional phenomenon
Variance scales like $O(1)$ despite dimension $n$

Bernstein Inequality (Subexponential Case)

Let $(X_i)_{i=1}^n$ be independent, mean-zero, subexponential:

\[\mathbb{E}[X_i] = 0\]

Then:

\[\mathbb{P}\left( \left| \frac{1}{n}\sum_{i=1}^n X_i \right| > t \right) \le 2 \exp\left(-c n \min\left(\frac{t^2}{K^2}, \frac{t}{L}\right)\right)\]

where:

\[L = \max_i |X_i|_{\psi_1}\]

Key Reduction Trick

We apply Bernstein to:

\[Y_k = X_k^2 - 1\]

Why?

$\mathbb{E}[X_k^2] = 1$
So $Y_k$ is mean-zero

Important fact (missing in lecture, but crucial):

If $X$ is subgaussian, then:

\[X^2 \text{ is subexponential}\]

and:

\[|X^2|*{\psi_1} \le C |X|*{\psi_2}^2\]

So:

\[|Y_k|_{\psi_1} \lesssim K^2\]

Apply Bernstein

We get:

\[\mathbb{P}\left( \left| \frac{1}{n} \sum_{k=1}^n (X_k^2 - 1) \right| > t \right) \le 2 \exp\left(-c n \min\left(\frac{t^2}{K^4}, \frac{t}{K^2}\right)\right)\]

Translate Back to Norm

Since:

\[|X|*2^2 = \sum*{k=1}^n X_k^2\]

we obtain:

\[\frac{1}{n}|X|_2^2 \approx 1\]

From squared norm to norm

Use:

\[|y^2 - 1| \ge \delta \quad \Rightarrow \quad |y - 1| \gtrsim \delta\]

More precisely:

\[|y - 1| \ge \delta \Rightarrow |y^2 - 1| \ge \delta\]

This step was handwavy in lecture, but it’s converting:

concentration of $ X _2^2$
into concentration of $ X _2$

Final Result

Let $t = \delta \sqrt{n}$, then:

\[\mathbb{P}\left( \left| |X|_2 - \sqrt{n} \right| > t \right) \le 2 \exp\left(-c \frac{t^2}{K^4}\right)\]

Geometry of High Dimensions

Unit Sphere

\[S^{n-1} = { x \in \mathbb{R}^n : |x|_2 = 1 }\]

Scaled sphere:

\[\sqrt{n} S^{n-1} = { x : |x|_2 = \sqrt{n} }\]

Gaussian Example

Let:

\[Z \sim \mathcal{N}(0, I_n)\]

Then:

\[Z = (Z_1, \dots, Z_n), \quad Z_k \sim \mathcal{N}(0,1)\]

and:

\[|Z|*2^2 = \sum*{k=1}^n Z_k^2 \sim \chi^2_n\]

Thus:

\[|Z|_2 \approx \sqrt{n}\]

Density

Gaussian density:

\[f(z) = (2\pi)^{-n/2} \exp\left(-\frac{|z|_2^2}{2}\right)\]

Key High-Dimensional Insight

As $n \to \infty$:

Volume of unit ball $\to 0$
Mass concentrates near sphere of radius $\sqrt{n}$

This is the thin-shell phenomenon.

Covariance and PCA

Second Moment Matrix

For $X \in \mathbb{R}^n$:

\[\Sigma = \mathbb{E}[XX^T]\]

Covariance Matrix

Let:

\[\mu = \mathbb{E}[X]\]

Then:

\[\operatorname{Cov}(X) = \mathbb{E}[(X-\mu)(X-\mu)^T] = \mathbb{E}[XX^T] - \mu \mu^T\]

Properties

1. Symmetry

\[\Sigma = \Sigma^T\]

2. Positive semidefinite

\[v^T \Sigma v \ge 0 \quad \forall v \in \mathbb{R}^n\]

Proof:

\[v^T \Sigma v = \mathbb{E}[(v^T X)^2] \ge 0\]

3. Trace identity

\[\mathbb{E}[|X|_2^2] = \operatorname{tr}(\Sigma)\]

4. Independent copy identity

If $X, Y$ i.i.d.:

\[\mathbb{E}[\langle X, Y \rangle^2] = |\Sigma|_F^2\]

(This step was rushed in lecture, but it’s using independence + conditioning.)

Spectral Decomposition (PCA)

Since $\Sigma$ is symmetric PSD:

eigenvalues:

\[\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0\]

eigenvectors:

\[\Sigma u_k = \lambda_k u_k\]

with orthonormal basis:

\[\langle u_i, u_j \rangle = \delta_{ij}\]

Matrix form

Let:

\[U = [u_1 \cdots u_n], \quad D = \operatorname{diag}(\lambda_1,\dots,\lambda_n)\]

Then:

\[\Sigma = U D U^T\]

and:

\[U^T U = I\]

Expansion

For any vector:

\[x = \sum_{k=1}^n \langle x, u_k \rangle u_k\]

Thus:

\[\Sigma = \sum_{k=1}^n \lambda_k u_k u_k^T\]

Higher powers

\[\Sigma^2 = \sum_{k=1}^n \lambda_k^2 u_k u_k^T\]

Big Picture (What Your Professor Skipped)

1. Why subgaussian?

Because:

controls tails
enables concentration inequalities
stable under sums

2. Why square trick?

We convert:

\[|X|_2^2 = \sum X_k^2\]

into sum of independent variables, enabling Bernstein.

3. Geometry insight

High-dimensional vectors:

live near a sphere
not near the origin
not uniformly spread

4. PCA connection

Covariance:

defines geometry of data
eigenvectors = directions of variation
eigenvalues = magnitude of variation

Mental Model

Think:

Subgaussian → “Gaussian-like tails”
$ X _2 \approx \sqrt{n}$ → “mass on sphere”
$\Sigma$ → “shape of cloud”
Eigenvectors → “principal axes”

Edit this page on GitHub

STT 996 – High Dimensional Probability

Lecture 05 (Reconstructed Notes)

Chapter 3 — Subgaussian Norms and Concentration

Setup

Norm Concentration

First moment

Concentration Inequality

Interpretation

Bernstein Inequality (Subexponential Case)

Key Reduction Trick

Important fact (missing in lecture, but crucial):

Apply Bernstein

Translate Back to Norm

From squared norm to norm

Final Result

Geometry of High Dimensions

Unit Sphere

Gaussian Example

Density

Key High-Dimensional Insight

Covariance and PCA

Second Moment Matrix

Covariance Matrix

Properties

1. Symmetry

2. Positive semidefinite

3. Trace identity

4. Independent copy identity

Spectral Decomposition (PCA)

Matrix form

Expansion

Higher powers

Big Picture (What Your Professor Skipped)

1. Why subgaussian?

2. Why square trick?

3. Geometry insight

4. PCA connection

Mental Model

Comments