2024.Q5 – Wald’s Identity, Stopping Times, and Independence

2024 Probability Prelim Exam (PDF)

Problem Statement (verbatim-ish)

Let $(\mathcal{F}n){n\ge 0}$ be a sequence of $\sigma$-algebras such that
$\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots.$

Let $T:\Omega\to{0,1,2,\dots,\infty}$, $T<\infty$ a.s., be a stopping time with respect to $(\mathcal{F}_n)$.

Let $(X_n)_{n\ge 1}$ be a sequence of random variables such that

$X_n$ is $\mathcal{F}_n$-measurable,
$E X_n <\infty$ and $E[X_n]=\mu$ for all $n$,
the sequence $(X_n)_{n\ge 1}$ and the stopping time $T$ are independent.

Define the partial sums $S_n = \sum_{k=1}^n X_k, \quad n\ge 1,\qquad S_0 = 0.$

(a)

Let $n\ge 0$. Prove:

(i)
$S_{(n+1)\wedge T} - S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}.$

(ii)
$S_T = \sum_{n=0}^{\infty} \big( S_{(n+1)\wedge T} - S_{n\wedge T}\big) = \sum_{n=0}^{\infty} X_{n+1}\,\mathbf{1}_{\{T>n\}} = \sum_{k=1}^{\infty} X_{k}\,\mathbf{1}_{\{T\ge k\}}.$

(b) Assume now that $E[T]<\infty$.

(i) Show (using part (a)) that
$E[S_T] = \mu\,E[T].$ (Here the two sides may be finite or $+\infty$.)
Hint: start from $E[S_T] = \sum_{k=1}^{\infty} E\big( X_k \,\mathbf{1}_{\{T\ge k\}}\big).$

(ii) As an example, let $X_k$ be i.i.d. Bernoulli$(p)$, $P(X_k=1)=p$, $P(X_k=0)=1-p$, and let
$T = \inf\left\{n\ge 1: \sum_{k=1}^n X_k = 2\right\},$ the number of trials needed to get 2 successes.

Compute $E[T]$. What is the distribution of $T$?

(c) We now drop the assumption $E[T]<\infty$. Instead assume that $E|X_1|<\infty$ and that

$\frac{S_n}{n} \xrightarrow{a.s.} \mu.$

(i) For each integer $m\ge 1$, define the truncated stopping time $T^{(m)} := T\wedge m.$ Show that $E[S_{T^{(m)}}] = \mu\,E[T^{(m)}].$

(ii) Use (i) to prove that
$E[S_T] = \mu\,E[T]$ still holds (under appropriate finiteness assumptions). Hint: show first that $T^{(m)}\uparrow T$, then use part (a)(ii) for the bounded stopping time $T^{(m)}$.

Part (a)

(a)(i)

Claim.

For every $n\ge 0$, $S_{(n+1)\wedge T} - S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}\quad\text{a.s.}$

Proof.

This is exactly the “marginal difference” of the sum between steps $n$ and $n+1$, under a stopping time.

Recall $S_n = \sum_{k=1}^n X_k.$

We split into two cases.

Case 1: $T \le n$.
Then $(n+1)\wedge T = T$ and $n\wedge T = T$, so $S_{(n+1)\wedge T} - S_{n\wedge T} = S_T - S_T = 0.$ Also $\mathbf{1}_{{T>n}} = 0$, so the right-hand side is $X_{n+1}\,\mathbf{1}_{\{T>n\}} = 0.$
Case 2: $T > n$.
Then $n\wedge T = n$ and $(n+1)\wedge T = n+1$, hence $S_{(n+1)\wedge T} - S_{n\wedge T} = S_{n+1} - S_n = X_{n+1}.$ In this case $\mathbf{1}{{T>n}}=1$, so RHS is $X{n+1}$.

In both cases, the identity holds. We can encode the two cases with indicators: $S_{(n+1)\wedge T}-S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}.$

Conclusion.

The increment of the stopped sum from step $n$ to $n+1$ is exactly the next increment $X_{n+1}$, times the indicator that we have not yet stopped at time $n$.

Key Takeaways

Think of this as the incremental step of a summation under a stopping rule.
Stopping time logic: either we are already stopped (increment 0) or not (increment is $X_{n+1}$).
Indicators are a clean way to encode case-by-case reasoning.

(a)(ii)

Claim.

We have the telescoping representation $S_T = \sum_{n=0}^\infty \big(S_{(n+1)\wedge T} - S_{n\wedge T}\big) = \sum_{n=0}^\infty X_{n+1}\,\mathbf{1}_{\{T>n\}} = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T\ge k\}}.$

Proof.

First write the telescoping sum: $\sum_{n=0}^{N-1} \big(S_{(n+1)\wedge T} - S_{n\wedge T}\big) = S_{N\wedge T} - S_{0\wedge T} = S_{N\wedge T}.$

Now let $N\to\infty$. Since $T<\infty$ a.s., we have $N\wedge T \uparrow T$ as $N\to\infty$, hence $S_{N\wedge T} \to S_T\quad\text{a.s.}$ On the other hand, using (a)(i), $S_{(n+1)\wedge T} - S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}.$

So, $S_T = \sum_{n=0}^\infty X_{n+1}\,\mathbf{1}_{\{T>n\}}.$

Reindex with $k = n+1$: $S_T = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T\ge k\}}.$ (Notice ${T>n} = {T\ge n+1}$.)

This is the classical telescoping sum picture: writing out the first several terms for large and small $T$, all the middle differences cancel and only the “block” up to $T$ remains.

Conclusion.

The sum up to the stopping time $T$ is the infinite sum of “incremental steps” of the stopped process, which is also the sum of $X_k$ multiplied by the indicator that we have not yet stopped before time $k$.

Key Takeaways

Telescoping sums under stopping times give very clean representations like $S_T = \sum_k X_k 1_{\{T\ge k\}}.$
This representation is the starting point for Wald’s identity: expectations become sums of expectations of indicators.
Translating “what happens between two consecutive sums” into indicator language is extremely powerful.

Part (b)

(b)(i)

Claim.

Assume $E[T]<\infty$ and $E[X_k]=\mu$ for all $k$. Then $E[S_T] = \mu\,E[T].$

(This may be finite or $+\infty$; both sides match.)

Proof.

Start from the representation in (a)(ii): $S_T = \sum_{k=1}^\infty X_k \mathbf{1}_{\{T\ge k\}}.$

Since $T<\infty$ a.s., only finitely many indicators are nonzero a.s., so the sum is a.s. finite. Using Fubini/monotone convergence for nonnegative terms (or dominated convergence under integrability), we can interchange sum and expectation: $E[S_T] = E\left[\sum_{k=1}^\infty X_k \mathbf{1}_{\{T\ge k\}}\right] = \sum_{k=1}^\infty E[X_k \mathbf{1}_{\{T\ge k\}}].$

Because $(X_k)$ and $T$ are independent, $E[X_k \mathbf{1}_{\{T\ge k\}}] = E[X_k] \, P(T\ge k) = \mu\,P(T\ge k).$

Thus $E[S_T] = \sum_{k=1}^\infty \mu P(T\ge k) = \mu \sum_{k=1}^\infty P(T\ge k).$

But for a nonnegative integer-valued random variable $T$, $E[T] = \sum_{k=1}^\infty P(T\ge k).$

Therefore $E[S_T] = \mu\,E[T].$

Conclusion.

Under $E[T]<\infty$ and independence, the expected sum up to a stopping time is the mean times the expected stopping time: this is Wald’s identity.

Key Takeaways

The key formula is $E[T] = \sum_{k\ge1} P(T\ge k),$ which is just expectation via tail sums for nonnegative integer-valued $T$.
Independence lets you split $E[X_k\,1_{\{T\ge k\}}] = E[X_k]P(T\ge k).$
Wald’s identity is nothing but linearity of expectation plus the tail-sum formula for $T$.

(b)(ii) Example with Bernoulli and Negative Binomial

Claim.

Let $(X_k)$ be i.i.d. Bernoulli$(p)$, and define $T = \inf\Big\{n\ge 1: \sum_{k=1}^n X_k = 2\Big\}.$ Then:

$T$ has a Negative Binomial distribution with parameters $(r=2,p)$,
$E[T] = \dfrac{2}{p}$.

Proof.

Distribution of $T$.
To have $T=n$, we need:

exactly 1 success in the first $n-1$ trials, and
a success on trial $n$.

Thus $P(T=n) = \binom{n-1}{1} p^2 (1-p)^{n-2},\quad n=2,3,\dots$ which is the pmf of a Negative Binomial$(r=2,p)$: number of trials to obtain 2 successes.

Expectation.
For a Negative Binomial$(r,p)$, $E[T] = \frac{r}{p}.$ So here $E[T] = \frac{2}{p}.$

Alternatively, use Wald’s identity:

$S_T = \sum_{k=1}^T X_k = 2$ by definition of $T$.
$E[S_T] = E[2] = 2$.

But also from (b)(i), $E[S_T] = \mu E[T], \quad \mu = E[X_1] = p.$ So $2 = p\,E[T] \quad\Rightarrow\quad E[T] = \frac{2}{p}.$

Conclusion.

The stopping time “number of trials to get 2 successes” is Negative Binomial, with mean $2/p$. Wald’s identity gives a quick way to compute this mean without summing the series explicitly.

Key Takeaways

“Number of trials to get $r$ successes” $\Rightarrow$ Negative Binomial distribution.
Wald’s identity can be used in reverse: if you know the sum at the stopping time, you can solve for $E[T]$.
This is a classic “counting trials until success” example, useful to keep in mind for prelims.

Part (c)

Now we drop the explicit assumption $E[T]<\infty$. Instead, we assume:

$E X_1 <\infty$,
$\frac{S_n}{n} \to \mu$ a.s. (so $S_n$ behaves like $\mu n$ for large $n$).

The idea is to approximate $T$ by bounded stopping times $T^{(m)} = T\wedge m$, prove Wald’s identity for each $T^{(m)}$, then pass to the limit.

(c)(i)

Claim.

For each $m\ge 1$, $E[S_{T^{(m)}}] = \mu\,E[T^{(m)}].$

Proof.

For fixed $m$, $T^{(m)} = T\wedge m$ is bounded by $m$, so in particular $E[T^{(m)}] \le m < \infty.$ We can therefore apply part (b)(i) to the stopping time $T^{(m)}$ instead of $T$.

Using the representation from (a)(ii) with $T^{(m)}$, $S_{T^{(m)}} = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T^{(m)}\ge k\}} = \sum_{k=1}^m X_k\,\mathbf{1}_{\{T\ge k\}},$ since $T^{(m)}\ge k$ iff both $k\le m$ and $T\ge k$.

Repeating the same argument as in (b)(i), $E[S_{T^{(m)}}] = \sum_{k=1}^m E[X_k 1_{\{T\ge k\}}] = \sum_{k=1}^m E[X_k] P(T\ge k) = \mu \sum_{k=1}^m P(T\ge k) = \mu E[T^{(m)}].$

Conclusion.

Even when we are not sure about $E[T]$, Wald’s identity holds for every bounded truncation $T^{(m)}$.

Key Takeaways

Truncation is a standard trick: replace $T$ by $T^{(m)}$ to get finiteness automatically.
All previous arguments for bounded stopping times go through unchanged.
This sets up a limit argument to recover results for the original $T$.

(c)(ii)

Claim.

Under the given assumptions, one can still conclude $E[S_T] = \mu\,E[T]$ whenever both sides are finite (or both are $+\infty$).

Proof.

From (c)(i), for each $m$, $E[S_{T^{(m)}}] = \mu E[T^{(m)}].$

Note that $T^{(m)} \uparrow T$ as $m\to\infty$. Also $S_{T^{(m)}} = \sum_{k=1}^{T^{(m)}} X_k \uparrow \sum_{k=1}^T X_k = S_T$ pointwise on the event ${T<\infty}$.

The precise justification of passing the limit inside $E[\cdot]$ depends on finiteness assumptions:

if $E[T]<\infty$ and $E

X_1

<\infty$, then one can dominate $

S_{T^{(m)}}

$ by something integrable (e.g. use the growth of $S_n/n\to\mu$),

or use monotone convergence in the nonnegative case.

Assuming the necessary integrability so that we are allowed to interchange limit and expectation, we get: $\lim_{m\to\infty} E[S_{T^{(m)}}] = E[S_T], \quad \lim_{m\to\infty} E[T^{(m)}] = E[T].$

Taking limits in $E[S_{T^{(m)}}] = \mu E[T^{(m)}]$ yields $E[S_T] = \mu E[T].$

Conclusion.

By truncating the stopping time, applying Wald’s identity to each truncation, and carefully passing to the limit, we recover Wald’s identity for the original stopping time under weaker assumptions.

Key Takeaways

When full conditions are unclear, truncate first, prove results for the truncated object, then take limits.
Bounded stopping times are much easier to handle.
This is a standard pattern:
- Use $T^{(m)} = T\wedge m$.
- Prove statement for each $T^{(m)}$.
- Pass to the limit $m\to\infty$ using monotone/dominated convergence.

Global Key Takeaways for Question 5

This entire problem is a structured derivation of Wald’s identity: $E\Big[\sum_{k=1}^T X_k\Big] = E[X_1] E[T],$ under appropriate assumptions.
Core tools:
- Stopping-time representation:
  $S_T = \sum_k X_k 1_{{T\ge k}}$.
- Tail-sum formula for nonnegative integer-valued random variables: $E[T] = \sum_k P(T\ge k)$.
- Independence of $(X_k)$ and $T$ to factor expectations.
- Borel–Cantelli / truncations if you need to control bad events or unbounded stopping times.
Philosophically:
“Sum up to a random time = sum over all times, weighted by whether we have not stopped yet.”

This is a very high-yield pattern for prelims: if you see a sum up to a stopping time and independence, think Wald’s identity and the indicator decomposition from part (a).

Edit this page on GitHub

2024.Q5 – Wald’s Identity, Stopping Times, and Independence

Problem Statement (verbatim-ish)

(a)

(b) Assume now that $E[T]<\infty$.

(c) We now drop the assumption $E[T]<\infty$. Instead assume that $E|X_1|<\infty$ and that

Part (a)

(a)(i)

Claim.

Proof.

Conclusion.

Key Takeaways

(a)(ii)

Claim.

Proof.

Conclusion.

Key Takeaways

Part (b)

(b)(i)

Claim.

Proof.

Conclusion.

Key Takeaways

(b)(ii) Example with Bernoulli and Negative Binomial

Claim.

Proof.

Conclusion.

Key Takeaways

Part (c)

(c)(i)

Claim.

Proof.

Conclusion.

Key Takeaways

(c)(ii)

Claim.

Proof.

Conclusion.

Key Takeaways

Global Key Takeaways for Question 5

Comments