STT 996 — High Dimensional Probability

Lecture 09 — Grothendieck, Kernel Trick, SVD, Min–Max


1. Grothendieck Identity

Let \(g \sim \mathcal{N}(0, I_n), \quad u, v \in S^{n-1}\)

Then: \(\mathbb{E}[\mathrm{sign}(\langle g, u \rangle)\mathrm{sign}(\langle g, v \rangle)] = \frac{2}{\pi} \sin^{-1}(\langle u, v \rangle)\)

🔍 Explanation

  • This is a nonlinear correlation identity for Gaussian projections.
  • Key idea: projecting onto random Gaussian direction converts inner products into a nonlinear function.
  • This is the backbone of:
    • random hyperplane rounding
    • Grothendieck inequality
    • kernel embeddings

2. Grothendieck Inequality

Let $a_{ij} \in \mathbb{R}$, $1 \le i \le m$, $1 \le j \le n$.

Assume: \(\left| \sum_{i,j} a_{ij} x_i y_j \right| \le 1 \quad \forall x_i, y_j \in \{-1,1\}\)

Then for any unit vectors $u_i, v_j \in \mathcal{H}$ (Hilbert space), \(\|u_i\| = \|v_j\| = 1\)

we have: \(\left| \sum_{i,j} a_{ij} \langle u_i, v_j \rangle \right| \le K_G\)

where: \(K_G \le \frac{\pi}{2 \ln(1+\sqrt{2})} \approx 1.783\)

🔍 Explanation

  • Replacing $\pm1$ scalars with vectors increases expressiveness.
  • Grothendieck says:

    this only increases the supremum by a universal constant.

  • This is dimension-free.

3. Kernel Trick Construction

Goal: represent nonlinear functions of inner products as inner products in a Hilbert space.

Let: \(f(x) = \sum_{k=0}^\infty a_k x^k\)

If this converges for $x \in [-1,1]$, then:

There exists a Hilbert space $\mathcal{H}$ and map $\phi$ such that \(f(\langle u, v \rangle) = \langle \phi(u), \phi(v) \rangle_{\mathcal{H}}\)


Construction of Feature Map

Define: \(\phi(u) = \bigoplus_{k=0}^\infty \sqrt{a_k}\, u^{\otimes k}\)

\[\psi(v) = \bigoplus_{k=0}^\infty \mathrm{sign}(a_k)\sqrt{|a_k|}\, v^{\otimes k}\]

Then: \(\langle \phi(u), \psi(v) \rangle = \sum_{k=0}^\infty a_k \langle u, v \rangle^k = f(\langle u, v \rangle)\)

🔍 Explanation

  • This is the RKHS construction behind kernels.
  • Tensor powers encode higher-order interactions.
  • This is why kernels = inner products in infinite-dimensional spaces.

4. Special Case: $f(x) = \sin(c x)$

Power series: \(\sin(cx) = \sum_{k=1}^\infty \frac{(-1)^{k-1}}{(2k-1)!} (cx)^{2k-1}\)

We want: \(\sum |a_k| = 1\)

This leads to normalization condition: \(\frac{e^c - e^{-c}}{2} = 1\)

(using $\sinh(c)$)

Solve: \(e^{2c} - 1 = 2e^c\)

Let $x = e^c$: \(x^2 - 2x - 1 = 0\)

\[x = 1 + \sqrt{2}\]

Thus: \(c = \ln(1+\sqrt{2})\)


🔍 Why this matters

This choice gives: \(\sin(c \langle u, v \rangle) \approx \langle \phi(u), \phi(v) \rangle\)

Plug into Grothendieck identity: \(\mathbb{E}[\mathrm{sign}(\langle g,u\rangle)\mathrm{sign}(\langle g,v\rangle)] = \frac{2}{\pi} \sin^{-1}(\langle u,v\rangle)\)

→ gives the constant: \(K_G = \frac{\pi}{2c}\)


5. Tensor / Hilbert Space Interpretation

We embed: \(u \mapsto u^{\otimes k} \in \mathbb{R}^{n^k}\)

Interpretation:

  • $k=1$: linear features
  • $k=2$: pairwise interactions
  • $k=3$: triple interactions

6. Singular Value Decomposition (SVD)

Let $A \in \mathbb{R}^{m \times n}$.

Then: \(A = \sum_{i=1}^r s_i u_i v_i^\top\)

where:

  • $s_1 \ge s_2 \ge \cdots \ge s_r \ge 0$
  • ${u_i}$ orthonormal in $\mathbb{R}^m$
  • ${v_i}$ orthonormal in $\mathbb{R}^n$

Key Properties

(1) $A^T A$ is PSD

\(x^\top A^T A x = \|Ax\|^2 \ge 0\)

(2) Eigenvalues

\(A^T A v_i = s_i^2 v_i\)

(3) Left singular vectors

\(u_i = \frac{Av_i}{s_i}\)


Matrix Form

\[A = U S V^\top\]
  • $U^\top U = I$
  • $V^\top V = I$
  • $S = \mathrm{diag}(s_1, \dots, s_r)$

7. Relationship Between $A^TA$ and $AA^T$

  • $A^TA$ has eigenvalues $s_i^2$
  • $AA^T$ has eigenvalues $s_i^2$
\[s_i^2(A) = \lambda_i(A^TA) = \lambda_i(AA^T)\]

8. Min–Max Theorem (Courant–Fischer)

Let $A$ symmetric.

Then:

Version (1)

\(\lambda_k(A) = \max_{\dim(E)=k} \min_{x \in E, \|x\|=1} x^\top A x\)

Version (2)

\(\lambda_k(A) = \min_{\dim(E)=n-k+1} \max_{x \in E, \|x\|=1} x^\top A x\)


🔁 Trick: Deriving (2) from (1)

Apply (1) to $-A$:

Eigenvalues: \(-\lambda_n \ge \cdots \ge -\lambda_1\)

Then reorder to obtain second form.


9. Proof Sketch Insight

Take: \(E = \mathrm{span}\{u_1, \dots, u_k\}\)

Let: \(x = \sum_{i=1}^k a_i u_i, \quad \sum a_i^2 = 1\)

Then: \(x^\top A x = \sum_{i=1}^k \lambda_i a_i^2 \ge \lambda_k\)

Thus: \(\min_{x \in E} x^\top A x = \lambda_k\)


10. Singular Value Interpretation

Define: \(s_k(A) = \max_{\dim(E)=k} \min_{x \in E, \|x\|=1} \|Ax\|\)


🔍 Key Insight

  • Eigenvalues → quadratic form
  • Singular values → operator norm

🧠 Big Picture (What Professor Skipped)

  1. Grothendieck = bridge between:
    • discrete optimization (±1)
    • continuous geometry (Hilbert space)
  2. Kernel trick = infinite-dimensional linearization

  3. SVD = geometry of linear maps

  4. Min–max = variational characterization of spectrum

🚨 Survival Guide Notes

  • Grothendieck identity = Gaussian → arcsin transform
  • Kernel trick = power series → tensor embedding
  • SVD = spectral theorem for rectangular matrices
  • Min–max = optimization view of eigenvalues

Comments