STT 996 – High Dimensional Probability

Lecture 07 (Reconstructed + Expanded)

§4 Subgaussian Random Vectors

Definition (Subgaussian Vector)

A random vector $X \in \mathbb{R}^n$ is subgaussian if for all $x \in S^{n-1}$, the 1-dimensional projection $\langle X, x \rangle$ is a subgaussian random variable.

Norm Definition

The subgaussian norm of a vector is: $\|X\|_{\psi_2} := \sup_{x \in S^{n-1}} \|\langle X, x \rangle\|_{\psi_2}$

Recall (From Chapter 2)

Let ${X_k}_{k=1}^n$ be independent, centered subgaussian random variables: $\mathbb{E}[X_k] = 0$

Then: $\left\|\sum_{k=1}^n a_k X_k \right\|_{\psi_2}^2 \le C \sum_{k=1}^n a_k^2 \|X_k\|_{\psi_2}^2$

Result (Key Inequality)

Let $X = (X_1, \dots, X_n)$ with independent, centered subgaussian entries.

Then: $\|X\|_{\psi_2} \le C \cdot \max_{1 \le k \le n} \|X_k\|_{\psi_2}$

Proof Sketch (Reconstructed)

For $x \in S^{n-1}$: $\langle X, x \rangle = \sum_{k=1}^n x_k X_k$

Then: $\|\langle X, x \rangle\|_{\psi_2}^2 \le C \sum_{k=1}^n x_k^2 \|X_k\|_{\psi_2}^2$

Since: $\sum_{k=1}^n x_k^2 = 1$

We get: $\|\langle X, x \rangle\|_{\psi_2}^2 \le C \max_k \|X_k\|_{\psi_2}^2$

This bound does not depend on $x$, so taking supremum gives result.

Example: Standard Gaussian

Let: $g_n \sim \mathcal{N}(0, I_n)$

Then: $g_n = (Z_1, \dots, Z_n), \quad Z_k \overset{iid}{\sim} \mathcal{N}(0,1)$

Thus: $\|g_n\|_{\psi_2} \le C \max_k \|Z_k\|_{\psi_2} \le C$

(Each $Z_k$ is subgaussian with constant norm.)

General Gaussian Case

Let: $X \sim \mathcal{N}(0, \Sigma)$

Then: $X = \Sigma^{1/2} g_n$

Estimation

\[\|X\|_{\psi_2} = \sup_{x \in S^{n-1}} \|\langle \Sigma^{1/2} g_n, x \rangle\|_{\psi_2}\]

Rewrite: $\langle \Sigma^{1/2} g_n, x \rangle = \langle g_n, \Sigma^{1/2} x \rangle$

Thus: $\|\langle X, x \rangle\|_{\psi_2} \le C \|\Sigma^{1/2} x\|_2$

Take supremum: $\|X\|_{\psi_2} \le C \sup_{x \in S^{n-1}} \|\Sigma^{1/2} x\|_2$

But: $\sup_{x \in S^{n-1}} \|\Sigma^{1/2} x\|_2 = \sqrt{\lambda_{\max}(\Sigma)}$

So: $\boxed{ \|X\|_{\psi_2} \le C \sqrt{\lambda_{\max}(\Sigma)} }$

Example: Symmetric Bernoulli

Let: $X_k \in \{-1, +1\}, \quad P(X_k = \pm 1) = \frac{1}{2}$

Then $X = (X_1, \dots, X_n)$ is subgaussian.

Reason:

Each $X_k$ is subgaussian
Independence applies previous theorem

Uniform Distribution on the Sphere

Let: $X \sim \text{Unif}(S^{n-1})$

Goal: bound $P(|\langle X, v \rangle| > t)$

Key Trick: Gaussian Representation

Let: $g \sim \mathcal{N}(0, I_n)$

Then: $X = \frac{g}{\|g\|_2}$

Rotational Invariance Trick

Take orthogonal matrix $O$ such that: $Ov = e_1$

Then: $\langle X, v \rangle = \langle OX, Ov \rangle = \langle OX, e_1 \rangle$

But: $OX \overset{d}{=} X$

So: $\langle X, v \rangle \overset{d}{=} X_1$

Rewrite Event

\[\langle X, v \rangle = \frac{g_1}{\|g\|_2}\]

Thus: $P(\langle X, v \rangle > t) = P\left(g_1 > t \|g\|_2 \right)$

Split Norm

Let: $\|g\|_2^2 = g_1^2 + \sum_{k=2}^n g_k^2$

Then: $g_1 > t \|g\|_2 \Rightarrow g_1^2 > t^2 (g_1^2 + \sum_{k=2}^n g_k^2)$

Rearrange: $g_1^2 (1 - t^2) > t^2 \sum_{k=2}^n g_k^2$

Key Reduction

Let: $\tilde{g} = (g_2, \dots, g_n)$

Then: $g_1^2 > \frac{t^2}{1 - t^2} \|\tilde{g}\|_2^2$

Independence Trick

$g_1$ is independent of $\tilde{g}$.

So: $P(\langle X, v \rangle > t) = \mathbb{E}\left[ P\left(g_1 > \frac{t}{\sqrt{1-t^2}} \|\tilde{g}\|_2 \mid \tilde{g} \right) \right]$

Subgaussian Bound

Using Gaussian tail: $P(g_1 > s) \le e^{-s^2/2}$

So: $\le \mathbb{E}\left[ \exp\left(-\frac{t^2}{2(1-t^2)} \|\tilde{g}\|_2^2 \right) \right]$

MGF of Chi-Square

\[\|\tilde{g}\|_2^2 \sim \chi^2(n-1)\]

MGF: $\mathbb{E}[e^{-s \|\tilde{g}\|_2^2}] = (1 + 2s)^{-(n-1)/2}$

Plug in: $s = \frac{t^2}{2(1-t^2)}$

Gives: $= (1 - t^2)^{(n-1)/2}$

Final Bound

Using: $1 - x \le e^{-x}$

We get: $P(|\langle X, v \rangle| > t) \le 2 e^{-c n t^2}$

Key Conclusion

Uniform sphere is subgaussian with variance proxy $\sim 1/n$: $\|\langle X, v \rangle\|_{\psi_2} \le \frac{C}{\sqrt{n}}$

Kernel Trick (Preview)

Kernel Definition

$K: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$

Examples:

Gaussian kernel: $K(u,v) = e^{-\|u - v\|^2}$
Polynomial kernel: $K(u,v) = (\langle u, v \rangle + c)^k$

Kernel Trick Idea

Find feature map: $\Phi: \mathcal{X} \to \mathcal{H}$

such that: $K(u,v) = \langle \Phi(u), \Phi(v) \rangle_{\mathcal{H}}$

Validity Condition

Kernel is valid iff: $[K(u_i, u_j)]_{i,j=1}^n \succeq 0$

RKHS Note

We embed into a Hilbert space:

Reproducing Kernel Hilbert Space (RKHS)
Avoids explicit feature construction

Grothendieck Identity (Very Important)