2024.Q5 – Wald’s Identity, Stopping Times, and Independence
2024 Probability Prelim Exam (PDF)
Problem Statement (verbatim-ish)
Let $(\mathcal{F}n){n\ge 0}$ be a sequence of $\sigma$-algebras such that
\(\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots.\)
Let $T:\Omega\to{0,1,2,\dots,\infty}$, $T<\infty$ a.s., be a stopping time with respect to $(\mathcal{F}_n)$.
Let $(X_n)_{n\ge 1}$ be a sequence of random variables such that
- $X_n$ is $\mathcal{F}_n$-measurable,
-
$E X_n <\infty$ and $E[X_n]=\mu$ for all $n$, - the sequence $(X_n)_{n\ge 1}$ and the stopping time $T$ are independent.
Define the partial sums \(S_n = \sum_{k=1}^n X_k, \quad n\ge 1,\qquad S_0 = 0.\)
(a)
Let $n\ge 0$. Prove:
(i)
\(S_{(n+1)\wedge T} - S_{n\wedge T}
= X_{n+1}\,\mathbf{1}_{\{T>n\}}.\)
(ii)
\(S_T = \sum_{n=0}^{\infty}
\big( S_{(n+1)\wedge T} - S_{n\wedge T}\big)
= \sum_{n=0}^{\infty} X_{n+1}\,\mathbf{1}_{\{T>n\}}
= \sum_{k=1}^{\infty} X_{k}\,\mathbf{1}_{\{T\ge k\}}.\)
(b) Assume now that $E[T]<\infty$.
(i) Show (using part (a)) that
\(E[S_T] = \mu\,E[T].\)
(Here the two sides may be finite or $+\infty$.)
Hint: start from
\(E[S_T] = \sum_{k=1}^{\infty} E\big( X_k \,\mathbf{1}_{\{T\ge k\}}\big).\)
(ii) As an example, let $X_k$ be i.i.d. Bernoulli$(p)$, $P(X_k=1)=p$, $P(X_k=0)=1-p$, and let
\(T = \inf\left\{n\ge 1: \sum_{k=1}^n X_k = 2\right\},\)
the number of trials needed to get 2 successes.
Compute $E[T]$. What is the distribution of $T$?
(c) We now drop the assumption $E[T]<\infty$. Instead assume that $E|X_1|<\infty$ and that
\(\frac{S_n}{n} \xrightarrow{a.s.} \mu.\)
(i) For each integer $m\ge 1$, define the truncated stopping time \(T^{(m)} := T\wedge m.\) Show that \(E[S_{T^{(m)}}] = \mu\,E[T^{(m)}].\)
(ii) Use (i) to prove that
\(E[S_T] = \mu\,E[T]\)
still holds (under appropriate finiteness assumptions). Hint: show first that $T^{(m)}\uparrow T$, then use part (a)(ii) for the bounded stopping time $T^{(m)}$.
Part (a)
(a)(i)
Claim.
For every $n\ge 0$, \(S_{(n+1)\wedge T} - S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}\quad\text{a.s.}\)
Proof.
This is exactly the “marginal difference” of the sum between steps $n$ and $n+1$, under a stopping time.
Recall \(S_n = \sum_{k=1}^n X_k.\)
We split into two cases.
-
Case 1: $T \le n$.
Then $(n+1)\wedge T = T$ and $n\wedge T = T$, so \(S_{(n+1)\wedge T} - S_{n\wedge T} = S_T - S_T = 0.\) Also $\mathbf{1}_{{T>n}} = 0$, so the right-hand side is \(X_{n+1}\,\mathbf{1}_{\{T>n\}} = 0.\) -
Case 2: $T > n$.
Then $n\wedge T = n$ and $(n+1)\wedge T = n+1$, hence \(S_{(n+1)\wedge T} - S_{n\wedge T} = S_{n+1} - S_n = X_{n+1}.\) In this case $\mathbf{1}{{T>n}}=1$, so RHS is $X{n+1}$.
In both cases, the identity holds. We can encode the two cases with indicators: \(S_{(n+1)\wedge T}-S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}.\)
Conclusion.
The increment of the stopped sum from step $n$ to $n+1$ is exactly the next increment $X_{n+1}$, times the indicator that we have not yet stopped at time $n$.
Key Takeaways
- Think of this as the incremental step of a summation under a stopping rule.
- Stopping time logic: either we are already stopped (increment 0) or not (increment is $X_{n+1}$).
- Indicators are a clean way to encode case-by-case reasoning.
(a)(ii)
Claim.
We have the telescoping representation \(S_T = \sum_{n=0}^\infty \big(S_{(n+1)\wedge T} - S_{n\wedge T}\big) = \sum_{n=0}^\infty X_{n+1}\,\mathbf{1}_{\{T>n\}} = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T\ge k\}}.\)
Proof.
First write the telescoping sum: \(\sum_{n=0}^{N-1} \big(S_{(n+1)\wedge T} - S_{n\wedge T}\big) = S_{N\wedge T} - S_{0\wedge T} = S_{N\wedge T}.\)
Now let $N\to\infty$. Since $T<\infty$ a.s., we have $N\wedge T \uparrow T$ as $N\to\infty$, hence \(S_{N\wedge T} \to S_T\quad\text{a.s.}\) On the other hand, using (a)(i), \(S_{(n+1)\wedge T} - S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}.\)
So, \(S_T = \sum_{n=0}^\infty X_{n+1}\,\mathbf{1}_{\{T>n\}}.\)
Reindex with $k = n+1$: \(S_T = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T\ge k\}}.\) (Notice ${T>n} = {T\ge n+1}$.)
This is the classical telescoping sum picture: writing out the first several terms for large and small $T$, all the middle differences cancel and only the “block” up to $T$ remains.
Conclusion.
The sum up to the stopping time $T$ is the infinite sum of “incremental steps” of the stopped process, which is also the sum of $X_k$ multiplied by the indicator that we have not yet stopped before time $k$.
Key Takeaways
- Telescoping sums under stopping times give very clean representations like \(S_T = \sum_k X_k 1_{\{T\ge k\}}.\)
- This representation is the starting point for Wald’s identity: expectations become sums of expectations of indicators.
- Translating “what happens between two consecutive sums” into indicator language is extremely powerful.
Part (b)
(b)(i)
Claim.
Assume $E[T]<\infty$ and $E[X_k]=\mu$ for all $k$. Then \(E[S_T] = \mu\,E[T].\)
(This may be finite or $+\infty$; both sides match.)
Proof.
Start from the representation in (a)(ii): \(S_T = \sum_{k=1}^\infty X_k \mathbf{1}_{\{T\ge k\}}.\)
Since $T<\infty$ a.s., only finitely many indicators are nonzero a.s., so the sum is a.s. finite. Using Fubini/monotone convergence for nonnegative terms (or dominated convergence under integrability), we can interchange sum and expectation: \(E[S_T] = E\left[\sum_{k=1}^\infty X_k \mathbf{1}_{\{T\ge k\}}\right] = \sum_{k=1}^\infty E[X_k \mathbf{1}_{\{T\ge k\}}].\)
Because $(X_k)$ and $T$ are independent, \(E[X_k \mathbf{1}_{\{T\ge k\}}] = E[X_k] \, P(T\ge k) = \mu\,P(T\ge k).\)
Thus \(E[S_T] = \sum_{k=1}^\infty \mu P(T\ge k) = \mu \sum_{k=1}^\infty P(T\ge k).\)
But for a nonnegative integer-valued random variable $T$, \(E[T] = \sum_{k=1}^\infty P(T\ge k).\)
Therefore \(E[S_T] = \mu\,E[T].\)
Conclusion.
Under $E[T]<\infty$ and independence, the expected sum up to a stopping time is the mean times the expected stopping time: this is Wald’s identity.
Key Takeaways
- The key formula is \(E[T] = \sum_{k\ge1} P(T\ge k),\) which is just expectation via tail sums for nonnegative integer-valued $T$.
- Independence lets you split \(E[X_k\,1_{\{T\ge k\}}] = E[X_k]P(T\ge k).\)
- Wald’s identity is nothing but linearity of expectation plus the tail-sum formula for $T$.
(b)(ii) Example with Bernoulli and Negative Binomial
Claim.
Let $(X_k)$ be i.i.d. Bernoulli$(p)$, and define \(T = \inf\Big\{n\ge 1: \sum_{k=1}^n X_k = 2\Big\}.\) Then:
- $T$ has a Negative Binomial distribution with parameters $(r=2,p)$,
- $E[T] = \dfrac{2}{p}$.
Proof.
Distribution of $T$.
To have $T=n$, we need:
- exactly 1 success in the first $n-1$ trials, and
- a success on trial $n$.
Thus \(P(T=n) = \binom{n-1}{1} p^2 (1-p)^{n-2},\quad n=2,3,\dots\) which is the pmf of a Negative Binomial$(r=2,p)$: number of trials to obtain 2 successes.
Expectation.
For a Negative Binomial$(r,p)$,
\(E[T] = \frac{r}{p}.\)
So here
\(E[T] = \frac{2}{p}.\)
Alternatively, use Wald’s identity:
- $S_T = \sum_{k=1}^T X_k = 2$ by definition of $T$.
- $E[S_T] = E[2] = 2$.
But also from (b)(i), \(E[S_T] = \mu E[T], \quad \mu = E[X_1] = p.\) So \(2 = p\,E[T] \quad\Rightarrow\quad E[T] = \frac{2}{p}.\)
Conclusion.
The stopping time “number of trials to get 2 successes” is Negative Binomial, with mean $2/p$. Wald’s identity gives a quick way to compute this mean without summing the series explicitly.
Key Takeaways
- “Number of trials to get $r$ successes” $\Rightarrow$ Negative Binomial distribution.
- Wald’s identity can be used in reverse: if you know the sum at the stopping time, you can solve for $E[T]$.
- This is a classic “counting trials until success” example, useful to keep in mind for prelims.
Part (c)
Now we drop the explicit assumption $E[T]<\infty$. Instead, we assume:
-
$E X_1 <\infty$, - $\frac{S_n}{n} \to \mu$ a.s. (so $S_n$ behaves like $\mu n$ for large $n$).
The idea is to approximate $T$ by bounded stopping times $T^{(m)} = T\wedge m$, prove Wald’s identity for each $T^{(m)}$, then pass to the limit.
(c)(i)
Claim.
For each $m\ge 1$, \(E[S_{T^{(m)}}] = \mu\,E[T^{(m)}].\)
Proof.
For fixed $m$, $T^{(m)} = T\wedge m$ is bounded by $m$, so in particular \(E[T^{(m)}] \le m < \infty.\) We can therefore apply part (b)(i) to the stopping time $T^{(m)}$ instead of $T$.
Using the representation from (a)(ii) with $T^{(m)}$, \(S_{T^{(m)}} = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T^{(m)}\ge k\}} = \sum_{k=1}^m X_k\,\mathbf{1}_{\{T\ge k\}},\) since $T^{(m)}\ge k$ iff both $k\le m$ and $T\ge k$.
Repeating the same argument as in (b)(i), \(E[S_{T^{(m)}}] = \sum_{k=1}^m E[X_k 1_{\{T\ge k\}}] = \sum_{k=1}^m E[X_k] P(T\ge k) = \mu \sum_{k=1}^m P(T\ge k) = \mu E[T^{(m)}].\)
Conclusion.
Even when we are not sure about $E[T]$, Wald’s identity holds for every bounded truncation $T^{(m)}$.
Key Takeaways
- Truncation is a standard trick: replace $T$ by $T^{(m)}$ to get finiteness automatically.
- All previous arguments for bounded stopping times go through unchanged.
- This sets up a limit argument to recover results for the original $T$.
(c)(ii)
Claim.
Under the given assumptions, one can still conclude \(E[S_T] = \mu\,E[T]\) whenever both sides are finite (or both are $+\infty$).
Proof.
From (c)(i), for each $m$, \(E[S_{T^{(m)}}] = \mu E[T^{(m)}].\)
Note that $T^{(m)} \uparrow T$ as $m\to\infty$. Also \(S_{T^{(m)}} = \sum_{k=1}^{T^{(m)}} X_k \uparrow \sum_{k=1}^T X_k = S_T\) pointwise on the event ${T<\infty}$.
The precise justification of passing the limit inside $E[\cdot]$ depends on finiteness assumptions:
-
if $E[T]<\infty$ and $E X_1 <\infty$, then one can dominate $ S_{T^{(m)}} $ by something integrable (e.g. use the growth of $S_n/n\to\mu$), - or use monotone convergence in the nonnegative case.
Assuming the necessary integrability so that we are allowed to interchange limit and expectation, we get: \(\lim_{m\to\infty} E[S_{T^{(m)}}] = E[S_T], \quad \lim_{m\to\infty} E[T^{(m)}] = E[T].\)
Taking limits in \(E[S_{T^{(m)}}] = \mu E[T^{(m)}]\) yields \(E[S_T] = \mu E[T].\)
Conclusion.
By truncating the stopping time, applying Wald’s identity to each truncation, and carefully passing to the limit, we recover Wald’s identity for the original stopping time under weaker assumptions.
Key Takeaways
- When full conditions are unclear, truncate first, prove results for the truncated object, then take limits.
- Bounded stopping times are much easier to handle.
- This is a standard pattern:
- Use $T^{(m)} = T\wedge m$.
- Prove statement for each $T^{(m)}$.
- Pass to the limit $m\to\infty$ using monotone/dominated convergence.
Global Key Takeaways for Question 5
- This entire problem is a structured derivation of Wald’s identity: \(E\Big[\sum_{k=1}^T X_k\Big] = E[X_1] E[T],\) under appropriate assumptions.
- Core tools:
- Stopping-time representation:
$S_T = \sum_k X_k 1_{{T\ge k}}$. - Tail-sum formula for nonnegative integer-valued random variables: $E[T] = \sum_k P(T\ge k)$.
- Independence of $(X_k)$ and $T$ to factor expectations.
- Borel–Cantelli / truncations if you need to control bad events or unbounded stopping times.
- Stopping-time representation:
- Philosophically:
“Sum up to a random time = sum over all times, weighted by whether we have not stopped yet.”
This is a very high-yield pattern for prelims: if you see a sum up to a stopping time and independence, think Wald’s identity and the indicator decomposition from part (a).
Comments