2022-Q2 — Least Squares Projection, Orthogonality, and Conditional Expectation
2022 Probability Prelim Exam (PDF)
This problem tests the geometric structure underlying least squares estimation. It begins with the finite-dimensional model
$X_{a,b} = aX + bX^2$
and then extends to all square-integrable functions of $X$.
It is one of the clearest prelim examples in which you must recognize a squared-error minimization problem, identify the orthogonality conditions, and then generalize to the fact that
\(E[Y\mid X]\)
is the best mean-square predictor of $Y$ among square-integrable $\sigma(X)$-measurable random variables.
Problem Statement (verbatim)
Let $X,Y$ be random variables with
\(E[X^4] + E[Y^2] < \infty.\) Define \(X_{a,b} = aX + bX^2,\qquad (a,b)\in\mathbb R^2.\)
a. Assume $(a^*,b^*)$ satisfy
\(E[(Y - X_{a^*,b^*})X] = 0, \qquad E[(Y - X_{a^*,b^*})X^2] = 0.\) Compute
\(E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})].\)
b. Use part (a) to prove
\(\min_{(a,b)\in\mathbb R^2} E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2].\)
c. Let
\(\mathcal H = \{W\in\sigma(X): E[W^2]<\infty\}.\)
(i) Find $W^*\in\mathcal H$ such that
\(E[(Y-W^*)^2] = \min_{W\in\mathcal H} E[(Y-W)^2].\)
(ii) Determine whether
\(\text{(1)}\ E[(Y-W^*)^2] \ge E[(Y - X_{a^*,b^*})^2], \qquad\text{or}\qquad \text{(2)}\ E[(Y-W^*)^2] \le E[(Y - X_{a^*,b^*})^2].\)
Solution
Part (a): Orthogonality Identity
Claim
\(E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})] = 0.\)
Proof
Rewrite the difference: \(X_{a^*,b^*} - X_{a,b} = (a^* - a)X + (b^* - b)X^2.\)
Now expand: $$ E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})] = (a^* - a)\,E[(Y - X_{a^*,b^*})X] + (b^* - b)\,E[(Y - X_{a^*,b^*})X^2]. $$
Both expectations are finite (by Cauchy-Schwarz, since $E[X^4]<\infty$ and $E[Y^2]<\infty$) and equal to zero by assumption, so the entire expression is zero.
Conclusion
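The residual $Y - X_{a^*,b^*}$ is orthogonal in $L^2$ (zero expected product) to every element of $\mathrm{span}\{X, X^2\}$, in particular to $X_{a^*,b^*} - X_{a,b}$. The requested expectation is therefore $0$.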
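Expanding the two orthogonality conditions assumed in part (a) by linearity shows that $(a^*,b^*)$ solves the normal equations, a $2\times 2$ linear system whose matrix is the Gram matrix of $X$ and $X^2$ (every entry is finite because $E[X^4]+E[Y^2]<\infty$):
$$
\begin{pmatrix} E[X^2] & E[X^3] \\ E[X^3] & E[X^4] \end{pmatrix}
\begin{pmatrix} a^* \\ b^* \end{pmatrix}
=
\begin{pmatrix} E[XY] \\ E[X^2Y] \end{pmatrix}.
$$
The system always has a solution, since the orthogonal projection of $Y$ onto $\mathrm{span}\{X,X^2\}$ exists, and the solution is unique exactly when $E[X^2]E[X^4]-E[X^3]^2>0$, i.e. when $X$ and $X^2$ are linearly independent in $L^2$.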
Key Takeaways — Part (a)
- Orthogonality conditions characterize least-squares solutions.
- Recognizing that \(X_{a^*,b^*} - X_{a,b} \in \mathrm{span}\{X, X^2\}\) is the key step.
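A minimal Python sketch can check the part (a) identity exactly on a toy example. The joint pmf `PMF` below is a hypothetical distribution chosen only for illustration, and the helpers `E`, `solve_ab`, and `resid` are illustrative names, not part of the original solution. Using `fractions.Fraction` keeps every expectation exact, so orthogonality can be tested with `==` instead of a floating-point tolerance.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): (x, y, probability). X is uniform on {-1, 0, 1, 2}.
PMF = [(-1, 1, F(1, 8)), (-1, 3, F(1, 8)), (0, 0, F(1, 8)), (0, 2, F(1, 8)),
       (1, 0, F(1, 4)), (2, 4, F(1, 8)), (2, 6, F(1, 8))]


def E(g):
    """Exact expectation of g(x, y) under PMF."""
    return sum(p * g(x, y) for x, y, p in PMF)


def solve_ab():
    """Solve the two orthogonality (normal) equations for (a*, b*) by Cramer's rule."""
    m2, m3, m4 = E(lambda x, y: x**2), E(lambda x, y: x**3), E(lambda x, y: x**4)
    r1, r2 = E(lambda x, y: x * y), E(lambda x, y: x**2 * y)
    det = m2 * m4 - m3**2  # zero iff X and X^2 are linearly dependent in L^2
    if det == 0:
        raise ValueError("(a*, b*) is not unique for this distribution")
    return (r1 * m4 - m3 * r2) / det, (m2 * r2 - m3 * r1) / det


a_star, b_star = solve_ab()


def resid(x, y):
    """The residual Y - X_{a*,b*} at the outcome (x, y)."""
    return y - (a_star * x + b_star * x**2)


# The assumed orthogonality conditions hold exactly...
assert E(lambda x, y: resid(x, y) * x) == 0
assert E(lambda x, y: resid(x, y) * x**2) == 0

# ...so the part (a) cross term vanishes for arbitrary (a, b).
for a, b in [(0, 0), (1, -2), (F(3, 5), 7)]:
    cross = E(lambda x, y: resid(x, y) * ((a_star - a) * x + (b_star - b) * x**2))
    assert cross == 0

print(a_star, b_star)  # -8/11 17/11
```

Exact rational arithmetic is the design choice here: with floats, the cross term would only be close to zero and the check would need a tolerance.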
Part (b): Minimizing Squared Error Over the Quadratic Family
Claim
\(\min_{(a,b)} E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2].\)
Proof
Add and subtract $X_{a^*,b^*}$: \(Y - X_{a,b} = (Y - X_{a^*,b^*}) + (X_{a^*,b^*} - X_{a,b}).\)
Square and take expectations: $$ E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2] + E[(X_{a^*,b^*} - X_{a,b})^2] + 2\,E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})]. $$
By part (a), the cross term is zero.
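The second term is non-negative, so \(E[(Y - X_{a,b})^2] \ge E[(Y - X_{a^*,b^*})^2]\) for every $(a,b)\in\mathbb R^2$, with equality at $(a,b)=(a^*,b^*)$; hence the minimum is attained at $(a^*,b^*)$.
The second term vanishes exactly when $(a^*-a)X + (b^*-b)X^2 = 0$ almost surely. If $X$ takes at least two distinct nonzero values with positive probability (equivalently, $X$ and $X^2$ are linearly independent in $L^2$), this forces $(a,b)=(a^*,b^*)$; otherwise the minimizer is not unique, although the minimum value is unchanged.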
Conclusion
The quadratic family achieves its minimum MSE at $(a^*,b^*)$.
Key Takeaways — Part (b)
- This is the finite-dimensional Pythagorean identity in disguise.
- Recognizing that the cross-term vanishes due to part (a) is essential.
- Any $(a,b)$ with $X_{a,b}\neq X_{a^*,b^*}$ on an event of positive probability strictly increases the squared error.
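The Pythagorean identity behind part (b) can be checked numerically on a toy example. This sketch assumes a hypothetical joint pmf `PMF` (with illustrative helpers `E` and `mse`), solves the normal equations for $(a^*,b^*)$ in exact fractions, and confirms for a few arbitrary pairs $(a,b)$ that the excess MSE equals $E[(X_{a^*,b^*} - X_{a,b})^2]$.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): (x, y, probability).
PMF = [(-1, 1, F(1, 8)), (-1, 3, F(1, 8)), (0, 0, F(1, 8)), (0, 2, F(1, 8)),
       (1, 0, F(1, 4)), (2, 4, F(1, 8)), (2, 6, F(1, 8))]


def E(g):
    """Exact expectation of g(x, y) under PMF."""
    return sum(p * g(x, y) for x, y, p in PMF)


def mse(a, b):
    """E[(Y - X_{a,b})^2] with X_{a,b} = aX + bX^2."""
    return E(lambda x, y: (y - a * x - b * x**2) ** 2)


# (a*, b*) from the 2x2 normal equations, solved by Cramer's rule.
m2, m3, m4 = E(lambda x, y: x**2), E(lambda x, y: x**3), E(lambda x, y: x**4)
r1, r2 = E(lambda x, y: x * y), E(lambda x, y: x**2 * y)
det = m2 * m4 - m3**2  # positive for this pmf, so (a*, b*) is unique
a_star, b_star = (r1 * m4 - m3 * r2) / det, (m2 * r2 - m3 * r1) / det

# Part (b): MSE(a, b) = MSE(a*, b*) + E[(X_{a*,b*} - X_{a,b})^2] for every (a, b).
for a, b in [(0, 0), (1, -2), (F(-1, 2), F(3, 2))]:
    gap = E(lambda x, y: ((a_star - a) * x + (b_star - b) * x**2) ** 2)
    assert mse(a, b) == mse(a_star, b_star) + gap
    assert mse(a, b) > mse(a_star, b_star)  # strict: X and X^2 are independent here

print(mse(a_star, b_star))  # 53/44
```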
Part (c)(i): Minimizing Over All Square-Integrable Functions of $X$
Claim
\(W^* = E[Y \mid X].\)
Proof
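Any $W\in\mathcal H$ can be written as $W=f(X)$ for some Borel function $f$ (Doob-Dynkin lemma).
Use the tower property:
\[E[(Y - f(X))^2] = E\big[E[(Y - f(X))^2 \mid X]\big].\]
Inside the conditional expectation, $X=x$ is fixed:
\[E[(Y - f(x))^2 \mid X=x] = E[Y^2\mid X=x] - 2f(x)E[Y\mid X=x] + f(x)^2.\]
Completing the square, this equals
\[\operatorname{Var}(Y\mid X=x) + \big(f(x) - E[Y\mid X=x]\big)^2,\]
so, as a function of $f(x)$, it is minimized when \(f(x)=E[Y\mid X=x]\). Minimizing pointwise for each $x$ minimizes the outer expectation as well.
Thus the minimizing function is \(W^* = E[Y\mid X]\). It belongs to $\mathcal H$: it is $\sigma(X)$-measurable, and $E[(W^*)^2]\le E[Y^2]<\infty$ by the conditional Jensen inequality.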
Conclusion
The conditional expectation is the best predictor of $Y$ among all $X$-measurable square-integrable random variables.
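A second route to the same conclusion, closer in spirit to parts (a) and (b), uses orthogonality instead of conditioning on the event $\{X=x\}$. For every $Z\in\mathcal H$, the product $YZ$ is integrable by Cauchy-Schwarz, and the tower property together with taking out what is known gives $E[YZ] = E\big[Z\,E[Y\mid X]\big]$, i.e. $E[(Y-W^*)Z]=0$. Applying this with $Z = W^* - W$ for an arbitrary $W\in\mathcal H$:
$$
\begin{aligned}
E[(Y-W)^2] &= E[(Y-W^*)^2] + E[(W^*-W)^2] + 2\,E[(Y-W^*)(W^*-W)] \\
&= E[(Y-W^*)^2] + E[(W^*-W)^2] \;\ge\; E[(Y-W^*)^2],
\end{aligned}
$$
with equality if and only if $W = W^*$ almost surely. This is the infinite-dimensional analogue of the Pythagorean identity from part (b).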
Key Takeaways — Part (c)(i)
- Conditional expectation is the MSE minimizer over all $W\in\mathcal H$.
- The minimization is pointwise inside the conditional expectation.
- Recognizing that this is a squared-error minimization problem is the key insight.
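For a discrete $X$, $E[Y\mid X=x]$ is simply the probability-weighted average of $Y$ over the outcomes with $X=x$. The sketch below assumes a hypothetical joint pmf `PMF` and illustrative helper names (`E`, `mse_f`, `cond_mean`); it computes $W^*$ this way in exact fractions and checks that replacing it by another function $f(X) = W^* + h(X)$ raises the MSE by exactly $E[h(X)^2]$.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): (x, y, probability).
PMF = [(-1, 1, F(1, 8)), (-1, 3, F(1, 8)), (0, 0, F(1, 8)), (0, 2, F(1, 8)),
       (1, 0, F(1, 4)), (2, 4, F(1, 8)), (2, 6, F(1, 8))]


def E(g):
    """Exact expectation of g(x, y) under PMF."""
    return sum(p * g(x, y) for x, y, p in PMF)


# E[Y | X = x] = E[Y; X = x] / P(X = x) for each x in the support of X.
px, ey = defaultdict(F), defaultdict(F)
for x, y, p in PMF:
    px[x] += p
    ey[x] += p * y
cond_mean = {x: ey[x] / px[x] for x in px}  # W* as a function of x


def mse_f(f):
    """E[(Y - f(X))^2] for f given as a dict on the support of X."""
    return E(lambda x, y: (y - f[x]) ** 2)


best = mse_f(cond_mean)  # E[(Y - W*)^2]
# Two arbitrary competitors f(X) = W* + h(X); each loses by exactly E[h(X)^2].
for h in [{-1: 1, 0: 0, 1: 0, 2: 0}, {-1: F(1, 3), 0: -2, 1: 5, 2: 1}]:
    f = {x: cond_mean[x] + h[x] for x in cond_mean}
    assert mse_f(f) == best + E(lambda x, y: h[x] ** 2)
    assert mse_f(f) > best

print(best)  # 3/4
```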
Part (c)(ii): Comparing the Two Minimizers
Claim
\(E[(Y - W^*)^2] \le E[(Y - X_{a^*,b^*})^2].\)
Proof
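Since $W^*$ minimizes MSE over *all* $W \in \mathcal H$, and since
\(X_{a^*,b^*}\in\mathcal H\)
(it is a Borel function of $X$, and $E[X_{a^*,b^*}^2] \le 2(a^*)^2E[X^2] + 2(b^*)^2E[X^4] < \infty$),
we must have
\(E[(Y-W^*)^2] = \min_{W\in\mathcal H} E[(Y-W)^2] \le E[(Y - X_{a^*,b^*})^2].\)
Thus inequality (2) is correct.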
Conclusion
$E[Y\mid X]$ is at least as good a mean-square predictor of $Y$ as the best predictor of the form $aX+bX^2$.
Key Takeaways — Part (c)(ii)
- Enlarging the search space from $\mathrm{span}\{X, X^2\}$ (at most two-dimensional) to $\mathcal H$ can only lower the minimum MSE.
- $W^*$ is the global minimizer over $\mathcal H$.
- Once you recognize that $X_{a^*,b^*}\in\mathcal H$, the correct inequality is immediate.
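To see that inequality (2) can be strict, the following sketch compares the two minimum MSEs on a hypothetical joint pmf `PMF` for which $E[Y\mid X]$ is not of the form $aX+bX^2$ (the helper names `E`, `w_star`, and `quad` are illustrative). Because $X_{a^*,b^*}\in\mathcal H$ and $Y - W^*$ is orthogonal to $\mathcal H$, the gap between the two MSEs equals $E[(W^* - X_{a^*,b^*})^2]$, so equality in (2) holds exactly when $E[Y\mid X] = X_{a^*,b^*}$ almost surely.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): (x, y, probability).
PMF = [(-1, 1, F(1, 8)), (-1, 3, F(1, 8)), (0, 0, F(1, 8)), (0, 2, F(1, 8)),
       (1, 0, F(1, 4)), (2, 4, F(1, 8)), (2, 6, F(1, 8))]


def E(g):
    """Exact expectation of g(x, y) under PMF."""
    return sum(p * g(x, y) for x, y, p in PMF)


# W* = E[Y | X], tabulated on the support of X.
px, ey = defaultdict(F), defaultdict(F)
for x, y, p in PMF:
    px[x] += p
    ey[x] += p * y
w_star = {x: ey[x] / px[x] for x in px}

# (a*, b*) from the 2x2 normal equations, solved by Cramer's rule.
m2, m3, m4 = E(lambda x, y: x**2), E(lambda x, y: x**3), E(lambda x, y: x**4)
r1, r2 = E(lambda x, y: x * y), E(lambda x, y: x**2 * y)
det = m2 * m4 - m3**2
a_star, b_star = (r1 * m4 - m3 * r2) / det, (m2 * r2 - m3 * r1) / det


def quad(x):
    """X_{a*,b*} evaluated at X = x."""
    return a_star * x + b_star * x**2


mse_cond = E(lambda x, y: (y - w_star[x]) ** 2)
mse_quad = E(lambda x, y: (y - quad(x)) ** 2)
gap = E(lambda x, y: (w_star[x] - quad(x)) ** 2)

assert mse_cond <= mse_quad        # inequality (2)
assert mse_quad == mse_cond + gap  # the gap is E[(W* - X_{a*,b*})^2]
print(mse_cond, mse_quad)  # 3/4 53/44
```

Here the inequality is strict because any $aX+bX^2$ vanishes at $X=0$, while $E[Y\mid X=0]=1$ for this toy pmf.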
Global Takeaways From This Problem
- Orthogonality characterizes least-squares solutions. For a model that is linear in its parameters, the squared-error-minimizing parameters make the residual orthogonal to the span of the regressors.
- Pythagorean identity for MSE: writing $\|Z\|^2 = E[Z^2]$,
\(\|Y - X_{a,b}\|^2 = \|Y - X_{a^*,b^*}\|^2 + \|X_{a^*,b^*} - X_{a,b}\|^2.\)
- Conditional expectation is the orthogonal projection of $Y$ onto $\mathcal H$, the space of square-integrable $\sigma(X)$-measurable random variables. Even without Hilbert space language, you can derive this using conditioning and completing the square.
- Expanding the model class can only decrease the minimum MSE.