Homework 05
Joseph M. Weaver
2026-02-27
For the first three problems in this homework, the kernel is assumed to be the linear kernel
K(x,x′)=x⊤x′,x,x′∈Rd.
Problem 1: RKHS of the Linear Kernel
Let K:Rd×Rd→R be defined by
K(x,x′)=x⊤x′.
(a) Define explicitly the reproducing kernel Hilbert space (RKHS) HK associated with K and show that for every f∈H, there exists a β∈Rd such that f(x)=x⊤β. In addition, for two functions in H, f(x)=x⊤β and g(x)=x⊤α, the inner product is
⟨f,g⟩H=β⊤α.
Solution (1a)
Claim 1: For all f∈HK, there exists β∈Rd such that
f(x)=x⊤β.
Write
f(x)=i=1∑nciK(xi,x)=i=1∑ncixi⊤x=(i=1∑ncixi)⊤x.
Thus define
β=i=1∑ncixi,
so that
f(x)=x⊤β.
Claim 2: If f(x)=x⊤β and g(x)=x⊤α, then
⟨f,g⟩H=β⊤α.
Using bilinearity and the reproducing property,
⟨f,g⟩H=⟨i∑ciK(xi,⋅),j∑djK(yj,⋅)⟩H=i,j∑cidjK(xi,yj).
Since K(xi,yj)=xi⊤yj,
i,j∑cidjxi⊤yj=(i∑cixi)⊤(j∑djyj)=β⊤α.
(b) Verify the reproducing property, namely that for any f∈HK and any x∈Rd,
f(x)=⟨f,K(x,⋅)⟩HK.
Solution (1b)
Let f(x)=∑i=1nciK(xi,x). Then
⟨f,K(x,⋅)⟩HK=⟨i∑ciK(xi,⋅),K(x,⋅)⟩HK=i∑ci⟨K(xi,⋅),K(x,⋅)⟩HK.
By the reproducing property,
⟨K(xi,⋅),K(x,⋅)⟩HK=K(xi,x).
Thus,
⟨f,K(x,⋅)⟩HK=i∑ciK(xi,x)=f(x).
Problem 2: Ridge Regression and Kernel Ridge Regression
Let {(xi,yi)}i=1n be training data with xi∈Rd and yi∈R, and let λ>0 be a regularization parameter.
(a) (Ridge regression: primal form) Consider the optimization problem
β∈Rdmin{i=1∑n(yi−xi⊤β)2+λ∥β∥22}.
Derive the closed-form solution β^.
Solution (2a)
Rewrite in matrix form:
∥Y−Xβ∥22+λ∥β∥22.
Taking derivatives:
∂β∂=2X⊤(Xβ−Y)+2λβ=0.
Thus,
X⊤Xβ+λβ=X⊤Y,
so
β^=(X⊤X+λI)−1X⊤Y.
(b) (Kernel ridge regression) Let K(x,x′)=x⊤x′ and let K∈Rn×n be the kernel matrix with entries Kij=K(xi,xj). Consider
f∈HKmin{i=1∑n(yi−f(xi))2+λ∥f∥HK2}.
By the Representer Theorem, the solution has the form
f^(x)=i=1∑nαiK(xi,x).
Derive the expression for α^.
Solution (2b)
Using the representer form, write
f(xi)=(Kα)i.
Thus the objective becomes
∥Y−Kα∥22+λα⊤Kα.
Taking derivatives:
−2K(Y−Kα)+2λKα=0.
Rearranging,
(K+λI)α=Y.
Hence,
α^=(K+λI)−1Y.
(c) Show that the kernel ridge regression predictor can be written as
f^(x)=x⊤β^,
where β^ coincides with the ridge regression solution obtained in part (a).
Hint: Apply the SVD of an n×d matrix X to derive
X⊤(XX⊤+λIn)−1=(X⊤X+λId)−1X⊤.
Question 2(c)
Claim: f^(x)=x⊤β^.
Let X=UΣV⊤. Then
XX⊤=UΣV⊤VΣU⊤=UΣ2U⊤,
X⊤X=VΣU⊤UΣV⊤=VΣ2V⊤.
Now compute
X⊤(XX⊤+λIn)−1.
Using the SVD,
X⊤(XX⊤+λIn)−1=VΣU⊤(UΣ2U⊤+λUU⊤)−1.
Since U−1=U⊤, we factor:
(UΣ2U⊤+λUU⊤)−1=(U(Σ2+λI)U⊤)−1=U(Σ2+λI)−1U⊤.
Thus,
X⊤(XX⊤+λIn)−1=VΣU⊤U(Σ2+λI)−1U⊤=VΣ(Σ2+λI)−1U⊤.
Similarly,
(XX⊤+λIn)−1X⊤=(VΣ2V⊤+λVV⊤)−1VΣU⊤.
Factoring,
=(V(Σ2+λI)V⊤)−1VΣU⊤=V(Σ2+λI)−1ΣU⊤.
Since Σ and (Σ2+λI)−1 commute (both diagonal),
Σ(Σ2+λI)−1=(Σ2+λI)−1Σ.
Therefore,
X⊤(XX⊤+λIn)−1=(X⊤X+λId)−1X⊤.
Thus,
f^(x)=k(x)⊤(K+λI)−1Y=x⊤X⊤(XX⊤+λI)−1Y=x⊤(X⊤X+λI)−1X⊤Y=x⊤β^.
Problem 4
(a) Show for D=[0,1]2 and the exponential or Gaussian kernel K, the constant function f(x)=1, ∀x∈D, is not in the RKHS associated with K.
Solution (4a)
Assume for contradiction that
1=∫DK(x,x′)dx′for all x∈D.
Let x0=(0,0) and x1/2=(1/2,1/2).
Since Gaussian/exponential kernels depend on ∥x−x′∥,
points near the center have more nearby mass than points at the boundary.
Thus
∫DK(x0,x′)dx′=∫DK(x1/2,x′)dx′.
Therefore the integral cannot be constant on D, a contradiction.
Hence the constant function is not in the RKHS.
(b) Discuss how to generalize the Representer Theorem to better fit the data (i.e., to achieve smaller SSE). Derive explicit solutions if possible.
Question 4(b)
We generalize by introducing an intercept term b∈R and solve
f∈HK,b∈Rmini=1∑n(yi−f(xi)−b)2+λ∥f∥HK2.
By the Representer Theorem,
f(x)=j=1∑nαjK(x,xj).
Thus the objective becomes
∥Y−Kα−b1∥22+λα⊤Kα.
Gradient with respect to α
∇α=2(−K)(Y−Kα−b1)+2λKα.
Setting equal to zero,
−2K(Y−Kα−b1)+2λKα=0.
Rearranging,
KY=K(K+λI)α+bK1.
Gradient with respect to b
∇b=21⊤(Y−Kα−b1).
Setting equal to zero,
1⊤Y=1⊤Kα+nb.
Combine the two equations
These conditions can be written as the linear system
[K+λI1⊤10][αb]=[Y0].
Thus,
y^(x)=j=1∑nα^jK(x,xj)+b^.