2022-Q2 — Least Squares Projection, Orthogonality, and Conditional Expectation

2022 Probability Prelim Exam (PDF)

This problem tests the geometric structure underlying least squares estimation, beginning with a finite-dimensional model
$X_{a,b} = aX + bX^2$,
then extending to all square-integrable functions of $X$.

It is one of the clearest examples on prelims where you must recognize a squared-error minimization problem, identify the orthogonality conditions, and then generalize to the fact that
\(E[Y\mid X]\) is the best mean-square predictor of $Y$ among all square-integrable $\sigma(X)$-measurable random variables.


Problem Statement (verbatim)

Let $X,Y$ be random variables with
\(E[X^4] + E[Y^2] < \infty.\) Define \(X_{a,b} = aX + bX^2,\qquad (a,b)\in\mathbb R^2.\)

a. Assume $(a^*,b^*)$ satisfy
\(E[(Y - X_{a^*,b^*})X] = 0, \qquad E[(Y - X_{a^*,b^*})X^2] = 0.\) Compute
\(E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})].\)

b. Use part (a) to prove
\(\min_{(a,b)\in\mathbb R^2} E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2].\)

c. Let
\(\mathcal H = \{W\in\sigma(X): E[W^2]<\infty\}.\)

(i) Find $W^*\in\mathcal H$ such that
\(E[(Y-W^*)^2] = \min_{W\in\mathcal H} E[(Y-W)^2].\)

(ii) Determine whether \(\text{(1)}\ E[(Y-W^*)^2] \ge E[(Y - X_{a^*,b^*})^2], \qquad\text{or}\qquad \text{(2)}\ E[(Y-W^*)^2] \le E[(Y - X_{a^*,b^*})^2].\)


Solution

Part (a): Orthogonality Identity

Claim

\(E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})] = 0.\)

Proof

Rewrite the difference: \(X_{a^*,b^*} - X_{a,b} = (a^* - a)X + (b^* - b)X^2.\)

Now expand: $$ E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})] = (a^* - a)\,E[(Y - X_{a^*,b^*})X] + (b^* - b)\,E[(Y - X_{a^*,b^*})X^2]. $$

Both expectations are zero by assumption, so the entire expression is zero.

Conclusion

The error of the best quadratic predictor is orthogonal to the space spanned by $X$ and $X^2$.
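
The orthogonality identity can be checked numerically by replacing expectations with sample averages. The following is a minimal sketch, not part of the exam solution: the data-generating model for `y`, the sample size, and the competitor `(a, b)` are all arbitrary choices made for illustration. It solves the two orthogonality conditions as a $2\times 2$ linear system and then evaluates the cross term from part (a).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating model, chosen only for illustration.
x = rng.normal(size=n)
y = np.exp(x / 2) + rng.normal(scale=0.5, size=n)

# Design matrix with columns X and X^2 (no intercept, matching X_{a,b} = aX + bX^2).
A = np.column_stack([x, x**2])

# The two orthogonality conditions under the empirical measure are the
# normal equations (A^T A) [a*, b*]^T = A^T y.
a_star, b_star = np.linalg.solve(A.T @ A, A.T @ y)
resid = y - (a_star * x + b_star * x**2)

# Empirical versions of E[(Y - X*) X] and E[(Y - X*) X^2].
orth_x = np.mean(resid * x)
orth_x2 = np.mean(resid * x**2)

# Cross term from part (a) for an arbitrary competitor (a, b).
a, b = -1.3, 2.7
cross = np.mean(resid * ((a_star - a) * x + (b_star - b) * x**2))

print(orth_x, orth_x2, cross)  # all three vanish up to floating-point rounding
```

Because the sample averages satisfy the same linear relations as the expectations, the cross term vanishes for any choice of `(a, b)`, not only the one used here.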

Key Takeaways — Part (a)

  • Orthogonality conditions characterize least-squares solutions.
  • Recognizing
    \(X_{a^*,b^*} - X_{a,b} \in \mathrm{span}\{X, X^2\}\) is the key step.

Part (b): Minimizing Squared Error Over the Quadratic Family

Claim

\(\min_{(a,b)} E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2].\)

Proof

Apply the hint: \(Y - X_{a,b} = (Y - X_{a^*,b^*}) + (X_{a^*,b^*} - X_{a,b}).\)

Square and take expectation: $$ E[(Y - X_{a,b})^2] = E[(Y - X_{a^*,b^*})^2] + E[(X_{a^*,b^*} - X_{a,b})^2] + 2\,E[(Y - X_{a^*,b^*})(X_{a^*,b^*} - X_{a,b})]. $$

By part (a), the cross term is zero.
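The second term is non-negative, and it equals zero exactly when $X_{a,b} = X_{a^*,b^*}$ almost surely; in particular it vanishes at $(a,b)=(a^*,b^*)$.
Hence $E[(Y - X_{a,b})^2] \ge E[(Y - X_{a^*,b^*})^2]$ for every $(a,b)$, with equality at $(a,b)=(a^*,b^*)$.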

Conclusion

The quadratic family achieves its minimum MSE at $(a^*,b^*)$.
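
The decomposition used in the proof can also be verified on simulated data. This is a rough sketch under assumed inputs: the model for `y`, the helper names `mse` and `gap`, and the random competitors are hypothetical and exist only for this illustration. It checks that the MSE at any $(a,b)$ equals the minimal MSE plus the squared distance between the two predictors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data, chosen only for illustration.
x = rng.uniform(-2.0, 2.0, size=n)
y = np.abs(x) + rng.normal(scale=0.3, size=n)

A = np.column_stack([x, x**2])

# Least-squares coefficients (a*, b*) under the empirical measure.
coef_star, *_ = np.linalg.lstsq(A, y, rcond=None)

def mse(coef):
    """Empirical E[(Y - X_{a,b})^2] for coef = (a, b)."""
    return np.mean((y - A @ coef) ** 2)

def gap(coef):
    """Empirical E[(X_{a*,b*} - X_{a,b})^2]."""
    return np.mean((A @ (coef_star - coef)) ** 2)

# Twenty arbitrary competitors (a, b).
competitors = rng.normal(size=(20, 2)) * 3.0

# Largest violation of MSE(a,b) = MSE(a*,b*) + gap(a,b) across competitors.
max_identity_error = max(abs(mse(c) - (mse(coef_star) + gap(c))) for c in competitors)

# Smallest excess MSE over the minimum; non-negative by part (b).
min_excess = min(mse(c) - mse(coef_star) for c in competitors)

print(max_identity_error, min_excess)
```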

Key Takeaways — Part (b)

  • This is the finite-dimensional Pythagorean identity in disguise.
  • Recognizing that the cross-term vanishes due to part (a) is essential.
  • Any deviation $(a,b)\neq(a^*,b^*)$ can only increase the squared error, strictly so unless $X_{a,b} = X_{a^*,b^*}$ almost surely.

Part (c)(i): Minimizing Over All Square-Integrable Functions of $X$

Claim

\(W^* = E[Y \mid X].\)

Proof

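Every $W\in\mathcal H$ can be written as $W=f(X)$ for some Borel-measurable $f$ (Doob-Dynkin lemma), so it suffices to minimize over such $f$.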
Use the tower property:

\[E[(Y - f(X))^2] = E\big[\,E[(Y - f(X))^2 \mid X]\,\big].\]

Inside the conditional expectation, $X=x$ is fixed:

\[E[(Y - f(x))^2 \mid X=x] = E[Y^2\mid X=x] - 2f(x)E[Y\mid X=x] + f(x)^2.\]

This quadratic in $f(x)$ is minimized when \(f(x)=E[Y\mid X=x].\)

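Thus the minimizing function is \(W^* = E[Y\mid X].\) It belongs to $\mathcal H$ because $E[(E[Y\mid X])^2] \le E[Y^2] < \infty$ by the conditional Jensen inequality.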
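
The pointwise minimization can be made explicit by completing the square. Writing $m(x) = E[Y\mid X=x]$ (a shorthand introduced here only for brevity), the quadratic in $f(x)$ becomes

\[E[Y^2\mid X=x] - 2f(x)\,m(x) + f(x)^2 = \big(f(x) - m(x)\big)^2 + \mathrm{Var}(Y\mid X=x).\]

Taking expectations over $X$ then gives

\[E[(Y - f(X))^2] = E\big[(f(X) - m(X))^2\big] + E\big[\mathrm{Var}(Y\mid X)\big],\]

so the first term is the only one that depends on $f$, and it is zero exactly when $f(X) = m(X)$ almost surely.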

Conclusion

The conditional expectation is the best predictor of $Y$ among all $X$-measurable square-integrable random variables.
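
A discrete example makes the pointwise argument concrete. In the sketch below, $X$ takes values in $\{0,1,2\}$, so any function $f$ of $X$ is just a table of three numbers and the empirical conditional mean is a group average. The distribution, the noise level, and the perturbed competitors are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6_000

# Hypothetical discrete model: X in {0, 1, 2}, Y = (level depending on X) + noise.
x = rng.integers(0, 3, size=n)
y = np.array([1.0, -0.5, 2.0])[x] + rng.normal(size=n)

# Empirical E[Y | X = k]: the average of y over samples with x == k.
cond_mean = np.array([y[x == k].mean() for k in range(3)])

def mse(f_values):
    """Empirical E[(Y - f(X))^2], with f stored as the table (f(0), f(1), f(2))."""
    return np.mean((y - f_values[x]) ** 2)

best = mse(cond_mean)

# Fifty competitor functions obtained by perturbing the conditional-mean table.
trials = cond_mean + rng.normal(scale=0.5, size=(50, 3))

# Smallest excess MSE of any competitor over the conditional mean.
worst_improvement = min(mse(f) - best for f in trials)

print(best, worst_improvement)
```

No perturbed table beats the group averages, which is the sample analogue of minimizing the quadratic separately at each value of $x$.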


Key Takeaways — Part (c)(i)

  • Conditional expectation is the MSE minimizer over all $W\in\mathcal H$, that is, over all square-integrable $\sigma(X)$-measurable $W$.
  • The minimization is pointwise inside the conditional expectation.
  • Recognizing that this is a squared-error minimization problem is the key insight.


Part (c)(ii): Comparing the Two Minimizers

Claim

\(E[(Y - W^*)^2] \le E[(Y - X_{a^*,b^*})^2].\)

Proof

Since $W^*$ minimizes MSE over all $W \in \mathcal H$, and since
\(X_{a^*,b^*}\in\mathcal H\) (it is a function of $X$, and $E[X^4]<\infty$ makes it square-integrable), we must have

\[E[(Y - W^*)^2] \le E[(Y - X_{a^*,b^*})^2].\]

Thus inequality (2) is correct.

Conclusion

$E[Y\mid X]$ is a better (or equal) predictor than the best quadratic predictor.
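
The comparison can be illustrated on simulated data where $X$ has finite support, so that the empirical conditional mean is an exact group average. This is a sketch under assumed inputs: the support points, the cosine signal, and the noise level are hypothetical choices, picked so that no function of the form $aX + bX^2$ can match the conditional mean (for instance, every such function is zero at $x=0$).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8_000

# Hypothetical model: X uniform on five points, Y = cos(2X) + noise.
support = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
idx = rng.integers(0, len(support), size=n)
x = support[idx]
y = np.cos(2.0 * x) + rng.normal(scale=0.4, size=n)

# Best predictor of the form aX + bX^2 under the empirical measure.
A = np.column_stack([x, x**2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
mse_quadratic = np.mean((y - A @ coef) ** 2)

# Empirical conditional mean E[Y | X]: one average per support point.
group_means = np.array([y[idx == k].mean() for k in range(len(support))])
mse_cond_mean = np.mean((y - group_means[idx]) ** 2)

print(mse_cond_mean, mse_quadratic)
```

The quadratic fit is itself a function of $X$, so the group averages can never do worse on the same sample; here they do strictly better.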


Key Takeaways — Part (c)(ii)

  • Expanding the search space from a 2-dimensional linear family to $\mathcal H$ can only reduce the minimum MSE.
  • $W^*$ is a global minimizer.
    • Once you recognize this, the correct inequality is immediate.

Global Takeaways From This Problem

  1. Orthogonality characterizes least-squares solutions.
    If the parameters of a linear model are chosen to minimize squared error, the residual is orthogonal to the span of the regressors.

  2. Pythagorean identity for MSE:
    \(\|Y - X_{a,b}\|^2 = \|Y - X_{a^*,b^*}\|^2 + \|X_{a^*,b^*} - X_{a,b}\|^2.\)

  3. Conditional expectation is the projection of $Y$ onto the space of square-integrable $\sigma(X)$-measurable random variables.
    Even without Hilbert space language, you can derive this using conditioning and completing the square.

  4. Expanding the model class can never increase the minimum MSE.
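
The last takeaway applies to any nested sequence of model classes, not only the pair compared in part (c)(ii). As a rough illustration, the sketch below fits the nested families $\mathrm{span}\{X\} \subset \mathrm{span}\{X, X^2\} \subset \dots$ to simulated data and records the minimum empirical MSE for each. The sine signal, the noise level, and the helper `best_mse` are assumptions introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4_000

# Hypothetical data, chosen only for illustration.
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=n)

def best_mse(degree):
    """Minimum empirical MSE over span{X, X^2, ..., X^degree} (no intercept)."""
    A = np.column_stack([x**k for k in range(1, degree + 1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2)

# Minimum MSE for degrees 1 through 5; each family contains the previous one.
mses = [best_mse(d) for d in range(1, 6)]

print(mses)  # a non-increasing sequence
```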
