STT 996 — High Dimensional Probability

Lecture 09 — Grothendieck, Kernel Trick, SVD, Min–Max

1. Grothendieck Identity

Let $g \sim \mathcal{N}(0, I_n), \quad u, v \in S^{n-1}$

Then: $\mathbb{E}[\mathrm{sign}(\langle g, u \rangle)\mathrm{sign}(\langle g, v \rangle)] = \frac{2}{\pi} \sin^{-1}(\langle u, v \rangle)$

🔍 Explanation

This is a nonlinear correlation identity for Gaussian projections.
Key idea: projecting onto random Gaussian direction converts inner products into a nonlinear function.
This is the backbone of:
- random hyperplane rounding
- Grothendieck inequality
- kernel embeddings

2. Grothendieck Inequality

Let $a_{ij} \in \mathbb{R}$, $1 \le i \le m$, $1 \le j \le n$.

Assume: $\left| \sum_{i,j} a_{ij} x_i y_j \right| \le 1 \quad \forall x_i, y_j \in \{-1,1\}$

Then for any unit vectors $u_i, v_j \in \mathcal{H}$ (Hilbert space), $\|u_i\| = \|v_j\| = 1$

we have: $\left| \sum_{i,j} a_{ij} \langle u_i, v_j \rangle \right| \le K_G$

where: $K_G \le \frac{\pi}{2 \ln(1+\sqrt{2})} \approx 1.783$

🔍 Explanation

Replacing $\pm1$ scalars with vectors increases expressiveness.
Grothendieck says:

this only increases the supremum by a universal constant.
This is dimension-free.

3. Kernel Trick Construction

Goal: represent nonlinear functions of inner products as inner products in a Hilbert space.

Let: $f(x) = \sum_{k=0}^\infty a_k x^k$

If this converges for $x \in [-1,1]$, then:

There exists a Hilbert space $\mathcal{H}$ and map $\phi$ such that $f(\langle u, v \rangle) = \langle \phi(u), \phi(v) \rangle_{\mathcal{H}}$

Construction of Feature Map

Define: $\phi(u) = \bigoplus_{k=0}^\infty \sqrt{a_k}\, u^{\otimes k}$

\[\psi(v) = \bigoplus_{k=0}^\infty \mathrm{sign}(a_k)\sqrt{|a_k|}\, v^{\otimes k}\]

Then: $\langle \phi(u), \psi(v) \rangle = \sum_{k=0}^\infty a_k \langle u, v \rangle^k = f(\langle u, v \rangle)$

🔍 Explanation

This is the RKHS construction behind kernels.
Tensor powers encode higher-order interactions.
This is why kernels = inner products in infinite-dimensional spaces.

4. Special Case: $f(x) = \sin(c x)$

Power series: $\sin(cx) = \sum_{k=1}^\infty \frac{(-1)^{k-1}}{(2k-1)!} (cx)^{2k-1}$

We want: $\sum |a_k| = 1$

This leads to normalization condition: $\frac{e^c - e^{-c}}{2} = 1$

(using $\sinh(c)$)

Solve: $e^{2c} - 1 = 2e^c$

Let $x = e^c$: $x^2 - 2x - 1 = 0$

\[x = 1 + \sqrt{2}\]

Thus: $c = \ln(1+\sqrt{2})$

🔍 Why this matters

This choice gives: $\sin(c \langle u, v \rangle) \approx \langle \phi(u), \phi(v) \rangle$

Plug into Grothendieck identity: $\mathbb{E}[\mathrm{sign}(\langle g,u\rangle)\mathrm{sign}(\langle g,v\rangle)] = \frac{2}{\pi} \sin^{-1}(\langle u,v\rangle)$

→ gives the constant: $K_G = \frac{\pi}{2c}$

5. Tensor / Hilbert Space Interpretation

We embed: $u \mapsto u^{\otimes k} \in \mathbb{R}^{n^k}$

Interpretation:

$k=1$: linear features
$k=2$: pairwise interactions
$k=3$: triple interactions

6. Singular Value Decomposition (SVD)

Let $A \in \mathbb{R}^{m \times n}$.

Then: $A = \sum_{i=1}^r s_i u_i v_i^\top$

where:

$s_1 \ge s_2 \ge \cdots \ge s_r \ge 0$
${u_i}$ orthonormal in $\mathbb{R}^m$
${v_i}$ orthonormal in $\mathbb{R}^n$

Key Properties

(1) $A^T A$ is PSD

$x^\top A^T A x = \|Ax\|^2 \ge 0$

(2) Eigenvalues

$A^T A v_i = s_i^2 v_i$

(3) Left singular vectors

$u_i = \frac{Av_i}{s_i}$

Matrix Form

\[A = U S V^\top\]

$U^\top U = I$
$V^\top V = I$
$S = \mathrm{diag}(s_1, \dots, s_r)$

7. Relationship Between $A^TA$ and $AA^T$

$A^TA$ has eigenvalues $s_i^2$
$AA^T$ has eigenvalues $s_i^2$

\[s_i^2(A) = \lambda_i(A^TA) = \lambda_i(AA^T)\]

8. Min–Max Theorem (Courant–Fischer)

Let $A$ symmetric.

Then:

Version (1)

$\lambda_k(A) = \max_{\dim(E)=k} \min_{x \in E, \|x\|=1} x^\top A x$

Version (2)

$\lambda_k(A) = \min_{\dim(E)=n-k+1} \max_{x \in E, \|x\|=1} x^\top A x$

🔁 Trick: Deriving (2) from (1)

Apply (1) to $-A$:

Eigenvalues: $-\lambda_n \ge \cdots \ge -\lambda_1$

Then reorder to obtain second form.

9. Proof Sketch Insight

Take: $E = \mathrm{span}\{u_1, \dots, u_k\}$

Let: $x = \sum_{i=1}^k a_i u_i, \quad \sum a_i^2 = 1$

Then: $x^\top A x = \sum_{i=1}^k \lambda_i a_i^2 \ge \lambda_k$

Thus: $\min_{x \in E} x^\top A x = \lambda_k$

10. Singular Value Interpretation

Define: $s_k(A) = \max_{\dim(E)=k} \min_{x \in E, \|x\|=1} \|Ax\|$

🔍 Key Insight

Eigenvalues → quadratic form
Singular values → operator norm

🧠 Big Picture (What Professor Skipped)

Grothendieck = bridge between:
- discrete optimization (±1)
- continuous geometry (Hilbert space)
Kernel trick = infinite-dimensional linearization
SVD = geometry of linear maps
Min–max = variational characterization of spectrum

🚨 Survival Guide Notes

Grothendieck identity = Gaussian → arcsin transform
Kernel trick = power series → tensor embedding
SVD = spectral theorem for rectangular matrices
Min–max = optimization view of eigenvalues

Edit this page on GitHub

STT 996 — High Dimensional Probability

Lecture 09 — Grothendieck, Kernel Trick, SVD, Min–Max

1. Grothendieck Identity

🔍 Explanation

2. Grothendieck Inequality

🔍 Explanation

3. Kernel Trick Construction

Construction of Feature Map

🔍 Explanation

4. Special Case: $f(x) = \sin(c x)$

🔍 Why this matters

5. Tensor / Hilbert Space Interpretation

6. Singular Value Decomposition (SVD)

Key Properties

(1) $A^T A$ is PSD

(2) Eigenvalues

(3) Left singular vectors

Matrix Form

7. Relationship Between $A^TA$ and $AA^T$

8. Min–Max Theorem (Courant–Fischer)

Version (1)

Version (2)

🔁 Trick: Deriving (2) from (1)

9. Proof Sketch Insight

10. Singular Value Interpretation

🔍 Key Insight

🧠 Big Picture (What Professor Skipped)

🚨 Survival Guide Notes

Comments