STT 996 — High Dimensional Probability
Lecture 09 — Grothendieck, Kernel Trick, SVD, Min–Max
1. Grothendieck Identity
Let \(g \sim \mathcal{N}(0, I_n), \quad u, v \in S^{n-1}\)
Then: \(\mathbb{E}[\mathrm{sign}(\langle g, u \rangle)\mathrm{sign}(\langle g, v \rangle)] = \frac{2}{\pi} \sin^{-1}(\langle u, v \rangle)\)
🔍 Explanation
- This is a nonlinear correlation identity for Gaussian projections.
- Key idea: projecting onto random Gaussian direction converts inner products into a nonlinear function.
- This is the backbone of:
- random hyperplane rounding
- Grothendieck inequality
- kernel embeddings
2. Grothendieck Inequality
Let $a_{ij} \in \mathbb{R}$, $1 \le i \le m$, $1 \le j \le n$.
Assume: \(\left| \sum_{i,j} a_{ij} x_i y_j \right| \le 1 \quad \forall x_i, y_j \in \{-1,1\}\)
Then for any unit vectors $u_i, v_j \in \mathcal{H}$ (Hilbert space), \(\|u_i\| = \|v_j\| = 1\)
we have: \(\left| \sum_{i,j} a_{ij} \langle u_i, v_j \rangle \right| \le K_G\)
where: \(K_G \le \frac{\pi}{2 \ln(1+\sqrt{2})} \approx 1.783\)
🔍 Explanation
- Replacing $\pm1$ scalars with vectors increases expressiveness.
- Grothendieck says:
this only increases the supremum by a universal constant.
- This is dimension-free.
3. Kernel Trick Construction
Goal: represent nonlinear functions of inner products as inner products in a Hilbert space.
Let: \(f(x) = \sum_{k=0}^\infty a_k x^k\)
If this converges for $x \in [-1,1]$, then:
There exists a Hilbert space $\mathcal{H}$ and map $\phi$ such that \(f(\langle u, v \rangle) = \langle \phi(u), \phi(v) \rangle_{\mathcal{H}}\)
Construction of Feature Map
Define: \(\phi(u) = \bigoplus_{k=0}^\infty \sqrt{a_k}\, u^{\otimes k}\)
\[\psi(v) = \bigoplus_{k=0}^\infty \mathrm{sign}(a_k)\sqrt{|a_k|}\, v^{\otimes k}\]Then: \(\langle \phi(u), \psi(v) \rangle = \sum_{k=0}^\infty a_k \langle u, v \rangle^k = f(\langle u, v \rangle)\)
🔍 Explanation
- This is the RKHS construction behind kernels.
- Tensor powers encode higher-order interactions.
- This is why kernels = inner products in infinite-dimensional spaces.
4. Special Case: $f(x) = \sin(c x)$
Power series: \(\sin(cx) = \sum_{k=1}^\infty \frac{(-1)^{k-1}}{(2k-1)!} (cx)^{2k-1}\)
We want: \(\sum |a_k| = 1\)
This leads to normalization condition: \(\frac{e^c - e^{-c}}{2} = 1\)
(using $\sinh(c)$)
Solve: \(e^{2c} - 1 = 2e^c\)
Let $x = e^c$: \(x^2 - 2x - 1 = 0\)
\[x = 1 + \sqrt{2}\]Thus: \(c = \ln(1+\sqrt{2})\)
🔍 Why this matters
This choice gives: \(\sin(c \langle u, v \rangle) \approx \langle \phi(u), \phi(v) \rangle\)
Plug into Grothendieck identity: \(\mathbb{E}[\mathrm{sign}(\langle g,u\rangle)\mathrm{sign}(\langle g,v\rangle)] = \frac{2}{\pi} \sin^{-1}(\langle u,v\rangle)\)
→ gives the constant: \(K_G = \frac{\pi}{2c}\)
5. Tensor / Hilbert Space Interpretation
We embed: \(u \mapsto u^{\otimes k} \in \mathbb{R}^{n^k}\)
Interpretation:
- $k=1$: linear features
- $k=2$: pairwise interactions
- $k=3$: triple interactions
6. Singular Value Decomposition (SVD)
Let $A \in \mathbb{R}^{m \times n}$.
Then: \(A = \sum_{i=1}^r s_i u_i v_i^\top\)
where:
- $s_1 \ge s_2 \ge \cdots \ge s_r \ge 0$
- ${u_i}$ orthonormal in $\mathbb{R}^m$
- ${v_i}$ orthonormal in $\mathbb{R}^n$
Key Properties
(1) $A^T A$ is PSD
\(x^\top A^T A x = \|Ax\|^2 \ge 0\)
(2) Eigenvalues
\(A^T A v_i = s_i^2 v_i\)
(3) Left singular vectors
\(u_i = \frac{Av_i}{s_i}\)
Matrix Form
\[A = U S V^\top\]- $U^\top U = I$
- $V^\top V = I$
- $S = \mathrm{diag}(s_1, \dots, s_r)$
7. Relationship Between $A^TA$ and $AA^T$
- $A^TA$ has eigenvalues $s_i^2$
- $AA^T$ has eigenvalues $s_i^2$
8. Min–Max Theorem (Courant–Fischer)
Let $A$ symmetric.
Then:
Version (1)
\(\lambda_k(A) = \max_{\dim(E)=k} \min_{x \in E, \|x\|=1} x^\top A x\)
Version (2)
\(\lambda_k(A) = \min_{\dim(E)=n-k+1} \max_{x \in E, \|x\|=1} x^\top A x\)
🔁 Trick: Deriving (2) from (1)
Apply (1) to $-A$:
Eigenvalues: \(-\lambda_n \ge \cdots \ge -\lambda_1\)
Then reorder to obtain second form.
9. Proof Sketch Insight
Take: \(E = \mathrm{span}\{u_1, \dots, u_k\}\)
Let: \(x = \sum_{i=1}^k a_i u_i, \quad \sum a_i^2 = 1\)
Then: \(x^\top A x = \sum_{i=1}^k \lambda_i a_i^2 \ge \lambda_k\)
Thus: \(\min_{x \in E} x^\top A x = \lambda_k\)
10. Singular Value Interpretation
Define: \(s_k(A) = \max_{\dim(E)=k} \min_{x \in E, \|x\|=1} \|Ax\|\)
🔍 Key Insight
- Eigenvalues → quadratic form
- Singular values → operator norm
🧠 Big Picture (What Professor Skipped)
- Grothendieck = bridge between:
- discrete optimization (±1)
- continuous geometry (Hilbert space)
-
Kernel trick = infinite-dimensional linearization
-
SVD = geometry of linear maps
- Min–max = variational characterization of spectrum
🚨 Survival Guide Notes
- Grothendieck identity = Gaussian → arcsin transform
- Kernel trick = power series → tensor embedding
- SVD = spectral theorem for rectangular matrices
- Min–max = optimization view of eigenvalues
Comments