STT 996 – High Dimensional Probability
Lecture 05 (Reconstructed Notes)
Source: Handwritten notes
Chapter 3 — Subgaussian Norms and Concentration
Setup
Let
\[X = (X_k)_{1 \le k \le n}\]be a random vector with:
- independent coordinates
- subgaussian entries
- normalized:
Define the subgaussian norm:
\[K = \max_{1 \le k \le n} |X_k|_{\psi_2}\]Norm Concentration
We are interested in:
\[|X|*2 = \left( \sum*{k=1}^n X_k^2 \right)^{1/2}\]First moment
\[\mathbb{E}[|X|*2^2] = \mathbb{E}\left[\sum*{k=1}^n X_k^2\right] = \sum_{k=1}^n \mathbb{E}[X_k^2] = n\]So heuristically:
\[|X|_2 \approx \sqrt{n}\]Concentration Inequality
The key result:
\[\mathbb{P}\left( \left| |X|_2 - \sqrt{n} \right| > t \right) \le 2 \exp\left(-c \frac{t^2}{K^4}\right), \quad t > 0\]Interpretation
- The Euclidean norm concentrates sharply around $\sqrt{n}$
- This is a high-dimensional phenomenon
- Variance scales like $O(1)$ despite dimension $n$
Bernstein Inequality (Subexponential Case)
Let $(X_i)_{i=1}^n$ be independent, mean-zero, subexponential:
\[\mathbb{E}[X_i] = 0\]Then:
\[\mathbb{P}\left( \left| \frac{1}{n}\sum_{i=1}^n X_i \right| > t \right) \le 2 \exp\left(-c n \min\left(\frac{t^2}{K^2}, \frac{t}{L}\right)\right)\]where:
\[L = \max_i |X_i|_{\psi_1}\]Key Reduction Trick
We apply Bernstein to:
\[Y_k = X_k^2 - 1\]Why?
- $\mathbb{E}[X_k^2] = 1$
- So $Y_k$ is mean-zero
Important fact (missing in lecture, but crucial):
If $X$ is subgaussian, then:
\[X^2 \text{ is subexponential}\]and:
\[|X^2|*{\psi_1} \le C |X|*{\psi_2}^2\]So:
\[|Y_k|_{\psi_1} \lesssim K^2\]Apply Bernstein
We get:
\[\mathbb{P}\left( \left| \frac{1}{n} \sum_{k=1}^n (X_k^2 - 1) \right| > t \right) \le 2 \exp\left(-c n \min\left(\frac{t^2}{K^4}, \frac{t}{K^2}\right)\right)\]Translate Back to Norm
Since:
\[|X|*2^2 = \sum*{k=1}^n X_k^2\]we obtain:
\[\frac{1}{n}|X|_2^2 \approx 1\]From squared norm to norm
Use:
\[|y^2 - 1| \ge \delta \quad \Rightarrow \quad |y - 1| \gtrsim \delta\]More precisely:
\[|y - 1| \ge \delta \Rightarrow |y^2 - 1| \ge \delta\]This step was handwavy in lecture, but it’s converting:
-
concentration of $ X _2^2$ -
into concentration of $ X _2$
Final Result
Let $t = \delta \sqrt{n}$, then:
\[\mathbb{P}\left( \left| |X|_2 - \sqrt{n} \right| > t \right) \le 2 \exp\left(-c \frac{t^2}{K^4}\right)\]Geometry of High Dimensions
Unit Sphere
\[S^{n-1} = { x \in \mathbb{R}^n : |x|_2 = 1 }\]Scaled sphere:
\[\sqrt{n} S^{n-1} = { x : |x|_2 = \sqrt{n} }\]Gaussian Example
Let:
\[Z \sim \mathcal{N}(0, I_n)\]Then:
\[Z = (Z_1, \dots, Z_n), \quad Z_k \sim \mathcal{N}(0,1)\]and:
\[|Z|*2^2 = \sum*{k=1}^n Z_k^2 \sim \chi^2_n\]Thus:
\[|Z|_2 \approx \sqrt{n}\]Density
Gaussian density:
\[f(z) = (2\pi)^{-n/2} \exp\left(-\frac{|z|_2^2}{2}\right)\]Key High-Dimensional Insight
As $n \to \infty$:
- Volume of unit ball $\to 0$
- Mass concentrates near sphere of radius $\sqrt{n}$
This is the thin-shell phenomenon.
Covariance and PCA
Second Moment Matrix
For $X \in \mathbb{R}^n$:
\[\Sigma = \mathbb{E}[XX^T]\]Covariance Matrix
Let:
\[\mu = \mathbb{E}[X]\]Then:
\[\operatorname{Cov}(X) = \mathbb{E}[(X-\mu)(X-\mu)^T] = \mathbb{E}[XX^T] - \mu \mu^T\]Properties
1. Symmetry
\[\Sigma = \Sigma^T\]2. Positive semidefinite
\[v^T \Sigma v \ge 0 \quad \forall v \in \mathbb{R}^n\]Proof:
\[v^T \Sigma v = \mathbb{E}[(v^T X)^2] \ge 0\]3. Trace identity
\[\mathbb{E}[|X|_2^2] = \operatorname{tr}(\Sigma)\]4. Independent copy identity
If $X, Y$ i.i.d.:
\[\mathbb{E}[\langle X, Y \rangle^2] = |\Sigma|_F^2\](This step was rushed in lecture, but it’s using independence + conditioning.)
Spectral Decomposition (PCA)
Since $\Sigma$ is symmetric PSD:
- eigenvalues:
- eigenvectors:
with orthonormal basis:
\[\langle u_i, u_j \rangle = \delta_{ij}\]Matrix form
Let:
\[U = [u_1 \cdots u_n], \quad D = \operatorname{diag}(\lambda_1,\dots,\lambda_n)\]Then:
\[\Sigma = U D U^T\]and:
\[U^T U = I\]Expansion
For any vector:
\[x = \sum_{k=1}^n \langle x, u_k \rangle u_k\]Thus:
\[\Sigma = \sum_{k=1}^n \lambda_k u_k u_k^T\]Higher powers
\[\Sigma^2 = \sum_{k=1}^n \lambda_k^2 u_k u_k^T\]Big Picture (What Your Professor Skipped)
1. Why subgaussian?
Because:
- controls tails
- enables concentration inequalities
- stable under sums
2. Why square trick?
We convert:
\[|X|_2^2 = \sum X_k^2\]into sum of independent variables, enabling Bernstein.
3. Geometry insight
High-dimensional vectors:
- live near a sphere
- not near the origin
- not uniformly spread
4. PCA connection
Covariance:
- defines geometry of data
- eigenvectors = directions of variation
- eigenvalues = magnitude of variation
Mental Model
Think:
- Subgaussian → “Gaussian-like tails”
-
$ X _2 \approx \sqrt{n}$ → “mass on sphere” - $\Sigma$ → “shape of cloud”
- Eigenvectors → “principal axes”
Comments