2022-Q2 — Least Squares Projection, Orthogonality, and Conditional Expectation

2022 Probability Prelim Exam (PDF)

This problem tests the geometric structure underlying least squares estimation, beginning with a finite-dimensional model
$X_{a,b} = aX + bX^2$,
then extending to all square-integrable functions of $X$.

It is one of the clearest prelim examples where you must recognize a squared-error minimization problem, identify the orthogonality conditions, and then generalize to the fact that
$E[Y\mid X]$ is the best mean-square predictor of $Y$ among square-integrable functions of $X$.

Problem Statement (verbatim)

Let $X,Y$ be random variables with
$E[X^4] + E[Y^2] < \infty.$ Define $X_{a,b} = aX + bX^2,\qquad (a,b)\in\mathbb R^2.$

a. Assume $(a^*,b^*)$ satisfy
$E[(Y - X_{a^*,b^*})X] = 0, \qquad E[(Y - X_{a^*,b^*})X^2] = 0.$ Compute
$E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})].$

b. Use part (a) to prove
$\min_{(a,b)\in\mathbb R^2} E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2].$

c. Let
$\mathcal H = \{W\in\sigma(X): E[W^2]<\infty\}.$

(i) Find $W^*\in\mathcal H$ such that
$E[(Y-W^*)^2] = \min_{W\in\mathcal H} E[(Y-W)^2].$

(ii) Determine whether $\text{(1)}\ E[(Y-W^*)^2] \ge E[(Y - X_{a^*,b^*})^2], \qquad\text{or}\qquad \text{(2)}\ E[(Y-W^*)^2] \le E[(Y - X_{a^*,b^*})^2].$

Solution

Part (a): Orthogonality Identity

Claim

$E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})] = 0.$

Proof

Rewrite the difference: $X_{a^*,b^*} - X_{a,b} = (a^* - a)X + (b^* - b)X^2.$

Now expand: $$ E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})] = (a^* - a)\,E[(Y - X_{a^*,b^*})X] + (b^* - b)\,E[(Y - X_{a^*,b^*})X^2]. $$

Both expectations are zero by assumption, so the entire expression is zero.

Conclusion

The error of the best quadratic predictor is orthogonal to the space spanned by $X$ and $X^2$.

Key Takeaways — Part (a)

Orthogonality conditions characterize least-squares solutions.
Recognizing
$X_{a^*,b^*} - X_{a,b} \in \mathrm{span}\{X, X^2\}$ is the key step.
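
The orthogonality identity can be checked numerically. The sketch below is illustrative only: the data-generating model, the sample size, and the helper names `fit_quadratic` and `cross_term` are assumptions, not part of the exam. It replaces expectations with sample averages, solves the two orthogonality conditions for $(a^*,b^*)$, and evaluates the cross term from part (a) at an arbitrary $(a,b)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
# Hypothetical model for Y; any Y with a finite second moment would do.
y = np.sin(x) + 0.5 * x**2 + rng.normal(scale=0.3, size=n)

def fit_quadratic(x, y):
    """Solve the empirical orthogonality conditions for (a*, b*)."""
    # Second-moment matrix of the regressors X and X^2 (sample averages).
    gram = np.array([[np.mean(x**2), np.mean(x**3)],
                     [np.mean(x**3), np.mean(x**4)]])
    rhs = np.array([np.mean(y * x), np.mean(y * x**2)])
    return np.linalg.solve(gram, rhs)

a_star, b_star = fit_quadratic(x, y)
resid = y - (a_star * x + b_star * x**2)

def cross_term(a, b):
    """Sample version of E[(Y - X_{a*,b*})(X_{a*,b*} - X_{a,b})]."""
    return np.mean(resid * ((a_star - a) * x + (b_star - b) * x**2))

# Zero up to floating-point rounding, for any choice of (a, b).
print(cross_term(2.0, -1.0))
```

Because the sample averages define an expectation with respect to the empirical distribution, the identity holds exactly in the sample, not just approximately.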

Part (b): Minimizing Squared Error Over the Quadratic Family

Claim

$\min_{(a,b)} E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2].$

Proof

Apply the hint: $Y - X_{a,b} = (Y - X_{a^*,b^*}) + (X_{a^*,b^*} - X_{a,b}).$

Square and take expectation: $$ E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2] + E[(X_{a^*,b^*} - X_{a,b})^2] + 2\,E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})]. $$

By part (a), the cross term is zero.
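The second term is non-negative, and it equals zero exactly when $X_{a,b} = X_{a^*,b^*}$ almost surely, in particular when $(a,b)=(a^*,b^*)$. Hence $E[(Y - X_{a,b})^2] \ge E[(Y - X_{a^*,b^*})^2]$ for every $(a,b)$, with equality at $(a^*,b^*)$.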

Conclusion

The quadratic family achieves its minimum MSE at $(a^*,b^*)$.

Key Takeaways — Part (b)

This is the finite-dimensional Pythagorean identity in disguise.
Recognizing that the cross-term vanishes due to part (a) is essential.
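Any deviation $(a,b)\neq(a^*,b^*)$ increases the squared error by $E[(X_{a^*,b^*} - X_{a,b})^2]$, which is strictly positive unless $X_{a,b} = X_{a^*,b^*}$ almost surely.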
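
The decomposition used in part (b) can be verified on simulated data. This is a minimal sketch under assumed inputs: the model for $Y$ and the helper names `mse` and `gap` are hypothetical, and `np.linalg.lstsq` stands in for solving the orthogonality conditions by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
# Hypothetical model for Y, chosen only for illustration.
y = np.cos(x) + x + rng.normal(scale=0.5, size=n)

# Columns are the regressors X and X^2 (no intercept, as in the problem).
A = np.column_stack([x, x**2])
coef_star, *_ = np.linalg.lstsq(A, y, rcond=None)

def mse(coef):
    """Sample version of E[(Y - X_{a,b})^2]."""
    return np.mean((y - A @ coef) ** 2)

def gap(coef):
    """Sample version of E[(X_{a*,b*} - X_{a,b})^2]."""
    return np.mean((A @ coef_star - A @ coef) ** 2)

# Compare both sides of the identity at a few perturbed coefficient pairs.
for _ in range(3):
    coef = coef_star + rng.normal(size=2)
    print(mse(coef), mse(coef_star) + gap(coef))
```

The two printed columns should agree up to rounding, and the first is never smaller than `mse(coef_star)`.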

Part (c)(i): Minimizing Over All Square-Integrable Functions of $X$

Claim

$W^* = E[Y \mid X].$

Proof

Let $W=f(X)$.
Use the tower property:

\[E[(Y - f(X))^2] = E\big[\,E[(Y - f(X))^2 \mid X]\,\big].\]

Inside the conditional expectation, $X=x$ is fixed:

\[E[(Y - f(x))^2 \mid X=x] = E[Y^2\mid X=x] - 2f(x)E[Y\mid X=x] + f(x)^2.\]

This quadratic in $f(x)$ is minimized when $f(x)=E[Y\mid X=x].$
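
To make the minimization explicit, one can complete the square. Writing $m(x) = E[Y\mid X=x]$ (a shorthand introduced here only for readability), the right-hand side becomes

\[E[(Y - f(x))^2 \mid X=x] = \big(E[Y^2\mid X=x] - m(x)^2\big) + \big(f(x) - m(x)\big)^2 = \mathrm{Var}(Y\mid X=x) + \big(f(x) - m(x)\big)^2.\]

The first term does not depend on $f$, and the second is non-negative and vanishes exactly when $f(x) = m(x)$.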

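Thus the minimizing function is $W^* = E[Y\mid X]$, which belongs to $\mathcal H$ because $E[(E[Y\mid X])^2] \le E[Y^2] < \infty$ by the conditional Jensen inequality.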

Conclusion

The conditional expectation is the best predictor of $Y$ among all $X$-measurable square-integrable random variables.

Key Takeaways — Part (c)(i)

Conditional expectation is the MSE minimizer over all $W\in\mathcal H$.
The minimization is pointwise inside the conditional expectation.
Recognizing that this is a squared-error minimization problem is the key insight.

Part (c)(ii): Comparing the Two Minimizers

Claim

$E[(Y - W^*)^2] \le E[(Y - X_{a^*,b^*})^2].$

Proof

Since $W^*$ minimizes MSE over all $W \in \mathcal H$, and since
$X_{a^*,b^*}\in\mathcal H$ (it is a function of $X$, and $E[X^4]<\infty$ makes it square-integrable), we must have

\[E[(Y - W^*)^2] \le E[(Y - X_{a^*,b^*})^2].\]

Thus inequality (2) is correct.

Conclusion

$E[Y\mid X]$ is a better (or equal) predictor than the best quadratic predictor.

Key Takeaways — Part (c)(ii)

Expanding the search space from a 2-dimensional linear family to $\mathcal H$ can only reduce the minimum MSE.
$W^*$ is a global minimizer.
Once you recognize this, the correct inequality is immediate.
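
A small simulation illustrates the comparison. The setup is an assumption made for illustration: $X$ is taken uniform on $\{-2,\dots,2\}$ and $Y = X^3 + \text{noise}$, so the conditional mean is cubic and lies outside the quadratic family. With discrete $X$, the sample analogue of $E[Y\mid X]$ is just the average of $Y$ within each value of $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
# Assumed setup: X uniform on {-2, -1, 0, 1, 2}, Y = X^3 + standard normal noise.
x = rng.integers(-2, 3, size=n).astype(float)
y = x**3 + rng.normal(size=n)

# Best predictor of the form a*X + b*X^2 (least squares, no intercept).
A = np.column_stack([x, x**2])
coef_star, *_ = np.linalg.lstsq(A, y, rcond=None)
mse_quad = np.mean((y - A @ coef_star) ** 2)

# Sample analogue of E[Y | X]: average Y within each distinct value of X.
values, inverse = np.unique(x, return_inverse=True)
group_means = np.bincount(inverse, weights=y) / np.bincount(inverse)
w_star = group_means[inverse]
mse_cond = np.mean((y - w_star) ** 2)

print("quadratic MSE:        ", mse_quad)
print("conditional-mean MSE: ", mse_cond)
```

In this setup the conditional-mean MSE sits near the noise variance, while the quadratic MSE is noticeably larger because no choice of $(a,b)$ reproduces $X^3$ on five support points.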

Global Takeaways From This Problem

Orthogonality characterizes least-squares solutions.
If a model has parameters chosen to minimize squared error, residuals are orthogonal to the span of the regressors.
Pythagorean identity for MSE:
$\|Y - X_{a,b}\|^2 = \|Y - X_{a^*,b^*}\|^2 + \|X_{a^*,b^*} - X_{a,b}\|^2.$
Conditional expectation is the projection of $Y$ onto $\mathcal H$, the space of square-integrable $\sigma(X)$-measurable random variables.
Even without Hilbert space language, you can derive this using conditioning and completing the square.
Expanding the model class can only decrease the minimum MSE or leave it unchanged.
