2024.Q5 – Wald’s Identity, Stopping Times, and Independence

2024 Probability Prelim Exam (PDF)

Problem Statement (verbatim-ish)

Let $(\mathcal{F}n){n\ge 0}$ be a sequence of $\sigma$-algebras such that
\(\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots.\)

Let $T:\Omega\to{0,1,2,\dots,\infty}$, $T<\infty$ a.s., be a stopping time with respect to $(\mathcal{F}_n)$.

Let $(X_n)_{n\ge 1}$ be a sequence of random variables such that

  • $X_n$ is $\mathcal{F}_n$-measurable,
  • $E X_n <\infty$ and $E[X_n]=\mu$ for all $n$,
  • the sequence $(X_n)_{n\ge 1}$ and the stopping time $T$ are independent.

Define the partial sums \(S_n = \sum_{k=1}^n X_k, \quad n\ge 1,\qquad S_0 = 0.\)


(a)

Let $n\ge 0$. Prove:

(i)
\(S_{(n+1)\wedge T} - S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}.\)

(ii)
\(S_T = \sum_{n=0}^{\infty} \big( S_{(n+1)\wedge T} - S_{n\wedge T}\big) = \sum_{n=0}^{\infty} X_{n+1}\,\mathbf{1}_{\{T>n\}} = \sum_{k=1}^{\infty} X_{k}\,\mathbf{1}_{\{T\ge k\}}.\)


(b) Assume now that $E[T]<\infty$.

(i) Show (using part (a)) that
\(E[S_T] = \mu\,E[T].\) (Here the two sides may be finite or $+\infty$.)
Hint: start from \(E[S_T] = \sum_{k=1}^{\infty} E\big( X_k \,\mathbf{1}_{\{T\ge k\}}\big).\)

(ii) As an example, let $X_k$ be i.i.d. Bernoulli$(p)$, $P(X_k=1)=p$, $P(X_k=0)=1-p$, and let
\(T = \inf\left\{n\ge 1: \sum_{k=1}^n X_k = 2\right\},\) the number of trials needed to get 2 successes.

Compute $E[T]$. What is the distribution of $T$?


(c) We now drop the assumption $E[T]<\infty$. Instead assume that $E|X_1|<\infty$ and that

\(\frac{S_n}{n} \xrightarrow{a.s.} \mu.\)

(i) For each integer $m\ge 1$, define the truncated stopping time \(T^{(m)} := T\wedge m.\) Show that \(E[S_{T^{(m)}}] = \mu\,E[T^{(m)}].\)

(ii) Use (i) to prove that
\(E[S_T] = \mu\,E[T]\) still holds (under appropriate finiteness assumptions). Hint: show first that $T^{(m)}\uparrow T$, then use part (a)(ii) for the bounded stopping time $T^{(m)}$.


Part (a)

(a)(i)

Claim.

For every $n\ge 0$, \(S_{(n+1)\wedge T} - S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}\quad\text{a.s.}\)

Proof.

This is exactly the “marginal difference” of the sum between steps $n$ and $n+1$, under a stopping time.

Recall \(S_n = \sum_{k=1}^n X_k.\)

We split into two cases.

  • Case 1: $T \le n$.
    Then $(n+1)\wedge T = T$ and $n\wedge T = T$, so \(S_{(n+1)\wedge T} - S_{n\wedge T} = S_T - S_T = 0.\) Also $\mathbf{1}_{{T>n}} = 0$, so the right-hand side is \(X_{n+1}\,\mathbf{1}_{\{T>n\}} = 0.\)

  • Case 2: $T > n$.
    Then $n\wedge T = n$ and $(n+1)\wedge T = n+1$, hence \(S_{(n+1)\wedge T} - S_{n\wedge T} = S_{n+1} - S_n = X_{n+1}.\) In this case $\mathbf{1}{{T>n}}=1$, so RHS is $X{n+1}$.

In both cases, the identity holds. We can encode the two cases with indicators: \(S_{(n+1)\wedge T}-S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}.\)

Conclusion.

The increment of the stopped sum from step $n$ to $n+1$ is exactly the next increment $X_{n+1}$, times the indicator that we have not yet stopped at time $n$.

Key Takeaways

  • Think of this as the incremental step of a summation under a stopping rule.
  • Stopping time logic: either we are already stopped (increment 0) or not (increment is $X_{n+1}$).
  • Indicators are a clean way to encode case-by-case reasoning.

(a)(ii)

Claim.

We have the telescoping representation \(S_T = \sum_{n=0}^\infty \big(S_{(n+1)\wedge T} - S_{n\wedge T}\big) = \sum_{n=0}^\infty X_{n+1}\,\mathbf{1}_{\{T>n\}} = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T\ge k\}}.\)

Proof.

First write the telescoping sum: \(\sum_{n=0}^{N-1} \big(S_{(n+1)\wedge T} - S_{n\wedge T}\big) = S_{N\wedge T} - S_{0\wedge T} = S_{N\wedge T}.\)

Now let $N\to\infty$. Since $T<\infty$ a.s., we have $N\wedge T \uparrow T$ as $N\to\infty$, hence \(S_{N\wedge T} \to S_T\quad\text{a.s.}\) On the other hand, using (a)(i), \(S_{(n+1)\wedge T} - S_{n\wedge T} = X_{n+1}\,\mathbf{1}_{\{T>n\}}.\)

So, \(S_T = \sum_{n=0}^\infty X_{n+1}\,\mathbf{1}_{\{T>n\}}.\)

Reindex with $k = n+1$: \(S_T = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T\ge k\}}.\) (Notice ${T>n} = {T\ge n+1}$.)

This is the classical telescoping sum picture: writing out the first several terms for large and small $T$, all the middle differences cancel and only the “block” up to $T$ remains.

Conclusion.

The sum up to the stopping time $T$ is the infinite sum of “incremental steps” of the stopped process, which is also the sum of $X_k$ multiplied by the indicator that we have not yet stopped before time $k$.

Key Takeaways

  • Telescoping sums under stopping times give very clean representations like \(S_T = \sum_k X_k 1_{\{T\ge k\}}.\)
  • This representation is the starting point for Wald’s identity: expectations become sums of expectations of indicators.
  • Translating “what happens between two consecutive sums” into indicator language is extremely powerful.

Part (b)

(b)(i)

Claim.

Assume $E[T]<\infty$ and $E[X_k]=\mu$ for all $k$. Then \(E[S_T] = \mu\,E[T].\)

(This may be finite or $+\infty$; both sides match.)

Proof.

Start from the representation in (a)(ii): \(S_T = \sum_{k=1}^\infty X_k \mathbf{1}_{\{T\ge k\}}.\)

Since $T<\infty$ a.s., only finitely many indicators are nonzero a.s., so the sum is a.s. finite. Using Fubini/monotone convergence for nonnegative terms (or dominated convergence under integrability), we can interchange sum and expectation: \(E[S_T] = E\left[\sum_{k=1}^\infty X_k \mathbf{1}_{\{T\ge k\}}\right] = \sum_{k=1}^\infty E[X_k \mathbf{1}_{\{T\ge k\}}].\)

Because $(X_k)$ and $T$ are independent, \(E[X_k \mathbf{1}_{\{T\ge k\}}] = E[X_k] \, P(T\ge k) = \mu\,P(T\ge k).\)

Thus \(E[S_T] = \sum_{k=1}^\infty \mu P(T\ge k) = \mu \sum_{k=1}^\infty P(T\ge k).\)

But for a nonnegative integer-valued random variable $T$, \(E[T] = \sum_{k=1}^\infty P(T\ge k).\)

Therefore \(E[S_T] = \mu\,E[T].\)

Conclusion.

Under $E[T]<\infty$ and independence, the expected sum up to a stopping time is the mean times the expected stopping time: this is Wald’s identity.

Key Takeaways

  • The key formula is \(E[T] = \sum_{k\ge1} P(T\ge k),\) which is just expectation via tail sums for nonnegative integer-valued $T$.
  • Independence lets you split \(E[X_k\,1_{\{T\ge k\}}] = E[X_k]P(T\ge k).\)
  • Wald’s identity is nothing but linearity of expectation plus the tail-sum formula for $T$.

(b)(ii) Example with Bernoulli and Negative Binomial

Claim.

Let $(X_k)$ be i.i.d. Bernoulli$(p)$, and define \(T = \inf\Big\{n\ge 1: \sum_{k=1}^n X_k = 2\Big\}.\) Then:

  1. $T$ has a Negative Binomial distribution with parameters $(r=2,p)$,
  2. $E[T] = \dfrac{2}{p}$.

Proof.

Distribution of $T$.
To have $T=n$, we need:

  • exactly 1 success in the first $n-1$ trials, and
  • a success on trial $n$.

Thus \(P(T=n) = \binom{n-1}{1} p^2 (1-p)^{n-2},\quad n=2,3,\dots\) which is the pmf of a Negative Binomial$(r=2,p)$: number of trials to obtain 2 successes.

Expectation.
For a Negative Binomial$(r,p)$, \(E[T] = \frac{r}{p}.\) So here \(E[T] = \frac{2}{p}.\)

Alternatively, use Wald’s identity:

  • $S_T = \sum_{k=1}^T X_k = 2$ by definition of $T$.
  • $E[S_T] = E[2] = 2$.

But also from (b)(i), \(E[S_T] = \mu E[T], \quad \mu = E[X_1] = p.\) So \(2 = p\,E[T] \quad\Rightarrow\quad E[T] = \frac{2}{p}.\)

Conclusion.

The stopping time “number of trials to get 2 successes” is Negative Binomial, with mean $2/p$. Wald’s identity gives a quick way to compute this mean without summing the series explicitly.

Key Takeaways

  • “Number of trials to get $r$ successes” $\Rightarrow$ Negative Binomial distribution.
  • Wald’s identity can be used in reverse: if you know the sum at the stopping time, you can solve for $E[T]$.
  • This is a classic “counting trials until success” example, useful to keep in mind for prelims.

Part (c)

Now we drop the explicit assumption $E[T]<\infty$. Instead, we assume:

  • $E X_1 <\infty$,
  • $\frac{S_n}{n} \to \mu$ a.s. (so $S_n$ behaves like $\mu n$ for large $n$).

The idea is to approximate $T$ by bounded stopping times $T^{(m)} = T\wedge m$, prove Wald’s identity for each $T^{(m)}$, then pass to the limit.


(c)(i)

Claim.

For each $m\ge 1$, \(E[S_{T^{(m)}}] = \mu\,E[T^{(m)}].\)

Proof.

For fixed $m$, $T^{(m)} = T\wedge m$ is bounded by $m$, so in particular \(E[T^{(m)}] \le m < \infty.\) We can therefore apply part (b)(i) to the stopping time $T^{(m)}$ instead of $T$.

Using the representation from (a)(ii) with $T^{(m)}$, \(S_{T^{(m)}} = \sum_{k=1}^\infty X_k\,\mathbf{1}_{\{T^{(m)}\ge k\}} = \sum_{k=1}^m X_k\,\mathbf{1}_{\{T\ge k\}},\) since $T^{(m)}\ge k$ iff both $k\le m$ and $T\ge k$.

Repeating the same argument as in (b)(i), \(E[S_{T^{(m)}}] = \sum_{k=1}^m E[X_k 1_{\{T\ge k\}}] = \sum_{k=1}^m E[X_k] P(T\ge k) = \mu \sum_{k=1}^m P(T\ge k) = \mu E[T^{(m)}].\)

Conclusion.

Even when we are not sure about $E[T]$, Wald’s identity holds for every bounded truncation $T^{(m)}$.

Key Takeaways

  • Truncation is a standard trick: replace $T$ by $T^{(m)}$ to get finiteness automatically.
  • All previous arguments for bounded stopping times go through unchanged.
  • This sets up a limit argument to recover results for the original $T$.

(c)(ii)

Claim.

Under the given assumptions, one can still conclude \(E[S_T] = \mu\,E[T]\) whenever both sides are finite (or both are $+\infty$).

Proof.

From (c)(i), for each $m$, \(E[S_{T^{(m)}}] = \mu E[T^{(m)}].\)

Note that $T^{(m)} \uparrow T$ as $m\to\infty$. Also \(S_{T^{(m)}} = \sum_{k=1}^{T^{(m)}} X_k \uparrow \sum_{k=1}^T X_k = S_T\) pointwise on the event ${T<\infty}$.

The precise justification of passing the limit inside $E[\cdot]$ depends on finiteness assumptions:

  • if $E[T]<\infty$ and $E X_1 <\infty$, then one can dominate $ S_{T^{(m)}} $ by something integrable (e.g. use the growth of $S_n/n\to\mu$),
  • or use monotone convergence in the nonnegative case.

Assuming the necessary integrability so that we are allowed to interchange limit and expectation, we get: \(\lim_{m\to\infty} E[S_{T^{(m)}}] = E[S_T], \quad \lim_{m\to\infty} E[T^{(m)}] = E[T].\)

Taking limits in \(E[S_{T^{(m)}}] = \mu E[T^{(m)}]\) yields \(E[S_T] = \mu E[T].\)

Conclusion.

By truncating the stopping time, applying Wald’s identity to each truncation, and carefully passing to the limit, we recover Wald’s identity for the original stopping time under weaker assumptions.

Key Takeaways

  • When full conditions are unclear, truncate first, prove results for the truncated object, then take limits.
  • Bounded stopping times are much easier to handle.
  • This is a standard pattern:
    • Use $T^{(m)} = T\wedge m$.
    • Prove statement for each $T^{(m)}$.
    • Pass to the limit $m\to\infty$ using monotone/dominated convergence.

Global Key Takeaways for Question 5

  • This entire problem is a structured derivation of Wald’s identity: \(E\Big[\sum_{k=1}^T X_k\Big] = E[X_1] E[T],\) under appropriate assumptions.
  • Core tools:
    • Stopping-time representation:
      $S_T = \sum_k X_k 1_{{T\ge k}}$.
    • Tail-sum formula for nonnegative integer-valued random variables: $E[T] = \sum_k P(T\ge k)$.
    • Independence of $(X_k)$ and $T$ to factor expectations.
    • Borel–Cantelli / truncations if you need to control bad events or unbounded stopping times.
  • Philosophically:
    “Sum up to a random time = sum over all times, weighted by whether we have not stopped yet.”

This is a very high-yield pattern for prelims: if you see a sum up to a stopping time and independence, think Wald’s identity and the indicator decomposition from part (a).

Comments