STT 996 – High Dimensional Probability
Lecture 07 (Reconstructed + Expanded)
§4 Subgaussian Random Vectors
Definition (Subgaussian Vector)
A random vector $X \in \mathbb{R}^n$ is subgaussian if for all $x \in S^{n-1}$, the 1-dimensional projection \(\langle X, x \rangle\) is a subgaussian random variable.
Norm Definition
The subgaussian norm of a vector is: \(\|X\|_{\psi_2} := \sup_{x \in S^{n-1}} \|\langle X, x \rangle\|_{\psi_2}\)
Recall (From Chapter 2)
Let ${X_k}_{k=1}^n$ be independent, centered subgaussian random variables: \(\mathbb{E}[X_k] = 0\)
Then: \(\left\|\sum_{k=1}^n a_k X_k \right\|_{\psi_2}^2 \le C \sum_{k=1}^n a_k^2 \|X_k\|_{\psi_2}^2\)
Result (Key Inequality)
Let $X = (X_1, \dots, X_n)$ with independent, centered subgaussian entries.
Then: \(\|X\|_{\psi_2} \le C \cdot \max_{1 \le k \le n} \|X_k\|_{\psi_2}\)
Proof Sketch (Reconstructed)
For $x \in S^{n-1}$: \(\langle X, x \rangle = \sum_{k=1}^n x_k X_k\)
Then: \(\|\langle X, x \rangle\|_{\psi_2}^2 \le C \sum_{k=1}^n x_k^2 \|X_k\|_{\psi_2}^2\)
Since: \(\sum_{k=1}^n x_k^2 = 1\)
We get: \(\|\langle X, x \rangle\|_{\psi_2}^2 \le C \max_k \|X_k\|_{\psi_2}^2\)
This bound does not depend on $x$, so taking supremum gives result.
Example: Standard Gaussian
Let: \(g_n \sim \mathcal{N}(0, I_n)\)
Then: \(g_n = (Z_1, \dots, Z_n), \quad Z_k \overset{iid}{\sim} \mathcal{N}(0,1)\)
Thus: \(\|g_n\|_{\psi_2} \le C \max_k \|Z_k\|_{\psi_2} \le C\)
(Each $Z_k$ is subgaussian with constant norm.)
General Gaussian Case
Let: \(X \sim \mathcal{N}(0, \Sigma)\)
Then: \(X = \Sigma^{1/2} g_n\)
Estimation
\[\|X\|_{\psi_2} = \sup_{x \in S^{n-1}} \|\langle \Sigma^{1/2} g_n, x \rangle\|_{\psi_2}\]Rewrite: \(\langle \Sigma^{1/2} g_n, x \rangle = \langle g_n, \Sigma^{1/2} x \rangle\)
Thus: \(\|\langle X, x \rangle\|_{\psi_2} \le C \|\Sigma^{1/2} x\|_2\)
Take supremum: \(\|X\|_{\psi_2} \le C \sup_{x \in S^{n-1}} \|\Sigma^{1/2} x\|_2\)
But: \(\sup_{x \in S^{n-1}} \|\Sigma^{1/2} x\|_2 = \sqrt{\lambda_{\max}(\Sigma)}\)
So: \(\boxed{ \|X\|_{\psi_2} \le C \sqrt{\lambda_{\max}(\Sigma)} }\)
Example: Symmetric Bernoulli
Let: \(X_k \in \{-1, +1\}, \quad P(X_k = \pm 1) = \frac{1}{2}\)
Then $X = (X_1, \dots, X_n)$ is subgaussian.
Reason:
- Each $X_k$ is subgaussian
- Independence applies previous theorem
Uniform Distribution on the Sphere
Let: \(X \sim \text{Unif}(S^{n-1})\)
Goal: bound \(P(|\langle X, v \rangle| > t)\)
Key Trick: Gaussian Representation
Let: \(g \sim \mathcal{N}(0, I_n)\)
Then: \(X = \frac{g}{\|g\|_2}\)
Rotational Invariance Trick
Take orthogonal matrix $O$ such that: \(Ov = e_1\)
Then: \(\langle X, v \rangle = \langle OX, Ov \rangle = \langle OX, e_1 \rangle\)
But: \(OX \overset{d}{=} X\)
So: \(\langle X, v \rangle \overset{d}{=} X_1\)
Rewrite Event
\[\langle X, v \rangle = \frac{g_1}{\|g\|_2}\]Thus: \(P(\langle X, v \rangle > t) = P\left(g_1 > t \|g\|_2 \right)\)
Split Norm
Let: \(\|g\|_2^2 = g_1^2 + \sum_{k=2}^n g_k^2\)
Then: \(g_1 > t \|g\|_2 \Rightarrow g_1^2 > t^2 (g_1^2 + \sum_{k=2}^n g_k^2)\)
Rearrange: \(g_1^2 (1 - t^2) > t^2 \sum_{k=2}^n g_k^2\)
Key Reduction
Let: \(\tilde{g} = (g_2, \dots, g_n)\)
Then: \(g_1^2 > \frac{t^2}{1 - t^2} \|\tilde{g}\|_2^2\)
Independence Trick
$g_1$ is independent of $\tilde{g}$.
So: \(P(\langle X, v \rangle > t) = \mathbb{E}\left[ P\left(g_1 > \frac{t}{\sqrt{1-t^2}} \|\tilde{g}\|_2 \mid \tilde{g} \right) \right]\)
Subgaussian Bound
Using Gaussian tail: \(P(g_1 > s) \le e^{-s^2/2}\)
So: \(\le \mathbb{E}\left[ \exp\left(-\frac{t^2}{2(1-t^2)} \|\tilde{g}\|_2^2 \right) \right]\)
MGF of Chi-Square
\[\|\tilde{g}\|_2^2 \sim \chi^2(n-1)\]MGF: \(\mathbb{E}[e^{-s \|\tilde{g}\|_2^2}] = (1 + 2s)^{-(n-1)/2}\)
Plug in: \(s = \frac{t^2}{2(1-t^2)}\)
Gives: \(= (1 - t^2)^{(n-1)/2}\)
Final Bound
Using: \(1 - x \le e^{-x}\)
We get: \(P(|\langle X, v \rangle| > t) \le 2 e^{-c n t^2}\)
Key Conclusion
Uniform sphere is subgaussian with variance proxy $\sim 1/n$: \(\|\langle X, v \rangle\|_{\psi_2} \le \frac{C}{\sqrt{n}}\)
Kernel Trick (Preview)
Kernel Definition
\(K: \mathcal{X} \times \mathcal{X} \to \mathbb{R}\)
Examples:
-
Gaussian kernel: \(K(u,v) = e^{-\|u - v\|^2}\)
-
Polynomial kernel: \(K(u,v) = (\langle u, v \rangle + c)^k\)
Kernel Trick Idea
Find feature map: \(\Phi: \mathcal{X} \to \mathcal{H}\)
such that: \(K(u,v) = \langle \Phi(u), \Phi(v) \rangle_{\mathcal{H}}\)
Validity Condition
Kernel is valid iff: \([K(u_i, u_j)]_{i,j=1}^n \succeq 0\)
RKHS Note
We embed into a Hilbert space:
- Reproducing Kernel Hilbert Space (RKHS)
- Avoids explicit feature construction
Grothendieck Identity (Very Important)
Let: \(g \sim \mathcal{N}(0, I_n)\)
Then: \(\mathbb{E}\left[\operatorname{sign}(\langle g,u\rangle)\operatorname{sign}(\langle g,v\rangle)\right] = \frac{2}{\pi} \arcsin(\langle u,v \rangle)\)
Why This Works
Use rotational invariance:
WLOG: \(u = (1,0,\dots), \quad v = (v_1, v_2, 0,\dots)\)
Then: \(\langle g,u \rangle = g_1\)
\[\langle g,v \rangle = v_1 g_1 + v_2 g_2\]This reduces to a 2D Gaussian problem.
Key Insight
The expectation depends only on: \(\langle u,v \rangle = \cos(\theta)\)
So geometry fully determines behavior.
Big Picture Takeaways
- Subgaussian vectors are controlled via 1D projections
- Independence → clean norm bounds
- Gaussian + orthogonal invariance = powerful reduction tool
- Uniform sphere ≈ Gaussian normalized
- Kernel trick = geometry in disguise
- Grothendieck identity connects probability ↔ geometry
Comments