2024.Q4 – Lindeberg Condition, CLT, and Variance Asymptotics

2024 Probability Prelim Exam (PDF)

Problem Statement (verbatim)

Let $\{(X_k, Y_k)\}_{k=1,2,\dots}$ be a sequence of pairs of random variables. Denote
$S_n = \sum_{k=1}^n X_k,\quad T_n = \sum_{k=1}^n Y_k,\quad n = 1,2,\dots$

a. Assume that $\sum_{k=1}^\infty P(X_k \ne Y_k) < \infty$, and let $a_n \to \infty$. Prove that if $\frac{T_n}{a_n}$ converges in distribution to $W$, then $\frac{S_n}{a_n}$ converges in distribution to $W$ as well.

b. From now on assume that $\{X_k\}_{k=1,2,\dots}$ are independent and $X_1 = 0,\quad X_k = \begin{cases} \pm 1 & \text{with probability } \frac{1}{2} - \frac{1}{2k^2},\\ \pm k & \text{with probability } \frac{1}{2k^2}, \end{cases} \quad k = 2,3,\dots$ Let $Y_0 = 0$, $Y_k = X_k\cdot \mathbf{1}_{\{X_k=\pm 1\}}$, $k = 1,2,\dots$. Prove that $\frac{\mathrm{Var}(S_n)}{2n} \longrightarrow 1 \quad\text{and}\quad \frac{\mathrm{Var}(T_n)}{n} \longrightarrow 1.$

c. (i) Does the triangular array $\Big\{\tfrac{X_k}{\sqrt{2n}}\Big\}_{k=1,\dots,n,\;n=1,2,\dots}$ satisfy Lindeberg condition? What about the triangular array $\Big\{\tfrac{Y_k}{\sqrt{n}}\Big\}_{k=1,\dots,n,\;n=1,2,\dots}?$

(ii) Prove that $\frac{S_n}{\sqrt{n}}$ converges in distribution to $N(0,1)$.

Part (a)

Claim.

Assume $\sum_{k=1}^\infty P(X_k\ne Y_k) < \infty$ and $a_n\to\infty$.
If $\dfrac{T_n}{a_n} \xrightarrow{d} W$, then $\dfrac{S_n}{a_n} \xrightarrow{d} W$ as well.

Proof.

Let $A_k = \{X_k \ne Y_k\}.$ Since $\sum_{k=1}^\infty P(A_k) < \infty,$ by the first Borel–Cantelli lemma we have $P(A_k \text{ i.o.}) = 0.$ So, with probability 1, there exists a (random) index $K(\omega)$ such that
$X_k(\omega) = Y_k(\omega)$ for all $k \ge K(\omega)$.

Define the (almost surely finite) random variable $D(\omega) = \sum_{k=1}^\infty \big(X_k(\omega) - Y_k(\omega)\big).$ This is a finite sum because all but finitely many terms are zero a.s.

For each finite $n$, $S_n - T_n = \sum_{k=1}^n (X_k - Y_k).$ For $n \ge K(\omega)$, all terms with index $k > n$ vanish, so $S_n(\omega) - T_n(\omega) = \sum_{k=1}^\infty (X_k(\omega) - Y_k(\omega)) = D(\omega).$ Thus a.s. we have $S_n - T_n = \begin{cases} \text{some partial sum}, & n < K(\omega),\\ D, & n \ge K(\omega). \end{cases}$ In particular, $S_n - T_n \to D$ a.s., and $|S_n - T_n| \le M := \sum_{k=1}^\infty |X_k - Y_k| < \infty \quad\text{a.s. for all }n,$ where $M$ is a.s. finite because it has only finitely many nonzero terms.

Since $a_n\to\infty$, $\frac{|S_n - T_n|}{a_n} \le \frac{M}{a_n} \xrightarrow[n\to\infty]{} 0 \quad\text{a.s.}$ Hence $\frac{S_n - T_n}{a_n} \xrightarrow{P} 0.$

Now use Slutsky’s theorem:

$\displaystyle \frac{T_n}{a_n} \xrightarrow{d} W$ by assumption,
$\displaystyle \frac{S_n - T_n}{a_n} \xrightarrow{P} 0.$

Thus $\frac{S_n}{a_n} = \frac{T_n}{a_n} + \frac{S_n - T_n}{a_n} \xrightarrow{d} W + 0 = W.$
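The eventual-equality mechanism in this proof can be seen on a toy path. The sketch below is only an illustration: the helper names `difference_path` and `last_disagreement` and the sample sequences are hypothetical, not part of the exam. It computes the running difference $S_n - T_n$ and the last index at which $X_k \ne Y_k$.

```python
from itertools import accumulate


def difference_path(xs, ys):
    """Return the running differences S_n - T_n for n = 1, ..., len(xs)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return list(accumulate(x - y for x, y in zip(xs, ys)))


def last_disagreement(xs, ys):
    """Return the largest 1-based index k with X_k != Y_k, or 0 if there is none."""
    last = 0
    for k, (x, y) in enumerate(zip(xs, ys), start=1):
        if x != y:
            last = k
    return last


# Hypothetical sample path: X and Y disagree only at k = 3 and k = 5.
xs = [0, 1, -3, 1, 5, -1, 1, 1]
ys = [0, 1, 0, 1, 0, -1, 1, 1]

path = difference_path(xs, ys)
K = last_disagreement(xs, ys)
print(path)  # running values of S_n - T_n
print(K)     # from this index on, the difference no longer changes
```

Once the path has frozen, dividing it by any $a_n \to \infty$ sends it to 0, which is the step that feeds Slutsky's theorem.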

Conclusion.

Because $\sum P(X_k\ne Y_k)<\infty$, the difference $S_n - T_n$ is eventually constant a.s., and the normalized difference tends to 0. By Slutsky, $S_n/a_n$ has the same limit in distribution as $T_n/a_n$.

Key Takeaways

Borel–Cantelli (1): if $\sum P(A_k)<\infty$, then $A_k$ occurs only finitely many times a.s.
That gives “eventual equality” between two processes, so their difference is eventually equal to a single a.s. finite random variable.
Dividing a bounded (or finite a.s.) random variable by $a_n\to\infty$ gives convergence to 0, at least in probability.
Slutsky’s theorem: small perturbations (in probability) do not affect the distributional limit.

Part (b)

Claim.

With $S_n = \sum_{k=1}^n X_k$ and $T_n = \sum_{k=1}^n Y_k$, $\frac{\mathrm{Var}(S_n)}{2n} \longrightarrow 1 \quad\text{and}\quad \frac{\mathrm{Var}(T_n)}{n} \longrightarrow 1.$

Proof.

The $X_k$ are independent, so $\mathrm{Var}(S_n) = \sum_{k=1}^n \mathrm{Var}(X_k).$

For $k=1$, we are given $X_1=0$, so $\mathrm{Var}(X_1)=0$.
For $k\ge2$, we have $X_k = \begin{cases} \pm 1 & \text{with prob } \frac{1}{2} - \frac{1}{2k^2},\\ \pm k & \text{with prob } \frac{1}{2k^2}. \end{cases}$ The distribution is symmetric, so $E[X_k]=0$.

Then $$ E[X_k^2] = 1^2 \cdot 2\left(\frac{1}{2} - \frac{1}{2k^2}\right) + k^2 \cdot 2\left(\frac{1}{2k^2}\right) = \left(1 - \frac{1}{k^2}\right) + 1 = 2 - \frac{1}{k^2}. $$ Hence $\mathrm{Var}(X_k)=E[X_k^2]=2-\frac1{k^2}$.
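As a sanity check on this moment computation, the law of $X_k$ can be written down exactly with rational arithmetic. This is a minimal sketch; the function names `pmf_x` and `moment` are illustrative choices, not notation from the exam.

```python
from fractions import Fraction


def pmf_x(k):
    """Exact pmf of X_k as a dict {value: probability}, for k >= 2."""
    if k < 2:
        raise ValueError("the two-scale law is only defined for k >= 2")
    small = Fraction(1, 2) - Fraction(1, 2 * k * k)  # P(X_k = 1) = P(X_k = -1)
    big = Fraction(1, 2 * k * k)                     # P(X_k = k) = P(X_k = -k)
    return {1: small, -1: small, k: big, -k: big}


def moment(pmf, power):
    """Exact E[X^power] for a finite pmf."""
    return sum(p * value**power for value, p in pmf.items())


for k in (2, 3, 10):
    law = pmf_x(k)
    assert sum(law.values()) == 1  # the four probabilities sum to one
    # Columns: k, E[X_k], E[X_k^2], and the claimed value 2 - 1/k^2.
    print(k, moment(law, 1), moment(law, 2), 2 - Fraction(1, k * k))
```

Using `Fraction` avoids floating-point noise, so the last two printed columns can be compared for exact equality.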

So $\mathrm{Var}(S_n) = \sum_{k=2}^n \left(2 - \frac{1}{k^2}\right) = 2(n-1) - \sum_{k=2}^n \frac{1}{k^2}.$ Rewriting, $\mathrm{Var}(S_n) = 2n - 2 - \sum_{k=2}^n \frac{1}{k^2} = 2n - 1 - \sum_{k=1}^n \frac{1}{k^2}.$

Now $\lim_{n\to\infty} \sum_{k=1}^n \frac{1}{k^2} = \frac{\pi^2}{6} < \infty,$ so $$ \lim_{n\to\infty} \frac{\mathrm{Var}(S_n)}{2n} = \lim_{n\to\infty} \left( 1 - \frac{1}{2n} - \frac{1}{2n} \sum_{k=1}^n \frac{1}{k^2} \right) = 1. $$

Next, consider $T_n = \sum_{k=1}^n Y_k$, where $Y_k = X_k \cdot \mathbf{1}_{\{X_k=\pm1\}}.$ Then:

$Y_1 = X_1\cdot \mathbf{1}_{\{X_1=\pm1\}}=0$, so $\mathrm{Var}(Y_1)=0$.
For $k\ge2$, $Y_k$ takes values $\pm1$ with total probability $1 - 1/k^2$, and is 0 otherwise. Again symmetry gives $E[Y_k]=0$, and $E[Y_k^2] = 1^2 \cdot 2\left(\frac{1}{2} - \frac{1}{2k^2}\right) = 1 - \frac{1}{k^2}.$ So $\mathrm{Var}(Y_k)=1 - \frac{1}{k^2}$.

Then $\mathrm{Var}(T_n) = \sum_{k=2}^n \left(1 - \frac{1}{k^2}\right) = (n-1) - \sum_{k=2}^n \frac{1}{k^2} = n - \sum_{k=1}^n \frac{1}{k^2}.$

Thus $\frac{\mathrm{Var}(T_n)}{n} = 1 - \frac{1}{n}\sum_{k=1}^n \frac{1}{k^2} \longrightarrow 1.$
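The two closed forms and the limiting ratios can be checked numerically. The following is a small sketch under the assumption that summing the per-term variances directly is an acceptable cross-check; `var_s`, `var_t`, and `basel_partial` are hypothetical helper names.

```python
from fractions import Fraction


def var_s(n):
    """Exact Var(S_n) = sum_{k=2}^n (2 - 1/k^2)."""
    return sum((2 - Fraction(1, k * k) for k in range(2, n + 1)), Fraction(0))


def var_t(n):
    """Exact Var(T_n) = sum_{k=2}^n (1 - 1/k^2)."""
    return sum((1 - Fraction(1, k * k) for k in range(2, n + 1)), Fraction(0))


def basel_partial(n):
    """Exact partial sum sum_{k=1}^n 1/k^2."""
    return sum((Fraction(1, k * k) for k in range(1, n + 1)), Fraction(0))


for n in (10, 100, 1000):
    # Closed forms derived in the proof.
    assert var_s(n) == 2 * n - 1 - basel_partial(n)
    assert var_t(n) == n - basel_partial(n)
    # Both ratios should approach 1 as n grows.
    print(n, float(var_s(n) / (2 * n)), float(var_t(n) / n))
```

The gap between each ratio and 1 shrinks like a constant over $n$, consistent with the $O(1)$ correction from $\sum 1/k^2$.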

Conclusion.

The “big jumps” in $X_k$ ($\pm k$) happen rarely enough that the variance is asymptotically linear in $n$: $\mathrm{Var}(S_n)\sim 2n$, $\mathrm{Var}(T_n)\sim n$.

Key Takeaways

For sums of independent variables, variance is additive.
Symmetry makes $E[X_k]=0$ and simplifies variance to just $E[X_k^2]$.
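Rare but large values (here $\pm k$ with total probability $1/k^2$) still contribute $k^2\cdot\frac{1}{k^2}=1$ to each $\mathrm{Var}(X_k)$, which is why $\mathrm{Var}(S_n)\sim 2n$ while $\mathrm{Var}(T_n)\sim n$.
The series $\sum 1/k^2$ converges (to $\pi^2/6$), so it appears only as an $O(1)$ correction.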

Part (c)(i)

We consider two triangular arrays:

$\xi_{n,k} = \dfrac{X_k}{\sqrt{2n}},\quad k=1,\dots,n,$
$\eta_{n,k} = \dfrac{Y_k}{\sqrt{n}},\quad k=1,\dots,n.$

Let $s_n^2 := \sum_{k=1}^n \mathrm{Var}(\xi_{n,k}), \quad t_n^2 := \sum_{k=1}^n \mathrm{Var}(\eta_{n,k}).$

From part (b), $s_n^2 = \frac{\mathrm{Var}(S_n)}{2n} \to 1, \quad t_n^2 = \frac{\mathrm{Var}(T_n)}{n} \to 1.$

Recall Lindeberg’s condition:
For a triangular array $\{\zeta_{n,k}\}$ with total variance $v_n^2=\sum_k\mathrm{Var}(\zeta_{n,k})$, $\forall \varepsilon>0:\quad \frac{1}{v_n^2} \sum_{k=1}^n E\big[\zeta_{n,k}^2\mathbf{1}_{\{|\zeta_{n,k}|>\varepsilon\}}\big] \longrightarrow 0.$
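For arrays of finitely supported, mean-zero variables, the quantity in Lindeberg's condition can be evaluated directly. The sketch below assumes each variable is given as a dict `{value: probability}`; the function `lindeberg_ratio` and the toy array `rademacher_row` are hypothetical illustrations, not objects from the exam.

```python
import math


def lindeberg_ratio(row, eps):
    """Lindeberg ratio for one row of a triangular array of mean-zero variables.

    `row` is a list of finite pmfs, each a dict {value: probability}.
    Returns (1 / v_n^2) * sum_k E[zeta^2 ; |zeta| > eps].
    """
    total_var = sum(p * z * z for pmf in row for z, p in pmf.items())
    if total_var == 0:
        raise ValueError("total variance must be positive")
    truncated = sum(p * z * z for pmf in row for z, p in pmf.items() if abs(z) > eps)
    return truncated / total_var


def rademacher_row(n):
    """Row n of a toy array: n copies of +-1/sqrt(n), each sign with probability 1/2."""
    z = 1 / math.sqrt(n)
    return [{z: 0.5, -z: 0.5} for _ in range(n)]


for n in (4, 25, 400):
    print(n, lindeberg_ratio(rademacher_row(n), eps=0.1))
```

For the toy array the ratio drops to 0 as soon as $1/\sqrt{n} \le \varepsilon$, which is the behaviour the condition asks for in the limit.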

Claim (1).

The $X_k$-array $\xi_{n,k} = X_k/\sqrt{2n}$ does not satisfy Lindeberg’s condition.

Proof.

For large $n$, the total variance $s_n^2\to1$, so we can ignore the normalization factor (it stays bounded away from 0 and ∞).

Fix $\varepsilon>0$. We examine $\sum_{k=1}^n E\left[\xi_{n,k}^2 \mathbf{1}_{\{|\xi_{n,k}|>\varepsilon\}}\right] = \sum_{k=1}^n \frac{1}{2n} E\left[X_k^2 \mathbf{1}_{\{|X_k| > \varepsilon\sqrt{2n}\}}\right].$

For each $k\ge2$, the “big” values of $X_k$ are $\pm k$, occurring with total probability $1/k^2$; otherwise, $|X_k| = 1$. For large $n$, the condition $|X_k| > \varepsilon\sqrt{2n}$ will be satisfied precisely when:

$X_k = \pm k$ and
$k > \varepsilon\sqrt{2n}$.

Thus, for large $n$, $E\left[X_k^2 \mathbf{1}_{\{|X_k| > \varepsilon\sqrt{2n}\}}\right] = k^2\cdot P(|X_k|=k,\;k>\varepsilon\sqrt{2n}) = k^2\cdot \frac{1}{k^2}\mathbf{1}_{\{k>\varepsilon\sqrt{2n}\}} = \mathbf{1}_{\{k>\varepsilon\sqrt{2n}\}}.$

Hence, for all $n$ large enough that $1 \le \varepsilon\sqrt{2n} \le n$, $\sum_{k=1}^n E\left[\xi_{n,k}^2 \mathbf{1}_{\{|\xi_{n,k}|>\varepsilon\}}\right] = \frac{1}{2n}\sum_{\varepsilon\sqrt{2n} < k \le n} 1 = \frac{1}{2n} \big(n - \lfloor \varepsilon\sqrt{2n}\rfloor\big) \longrightarrow \frac12 \ne 0.$

Dividing by $s_n^2\to1$ does not change this limit. Therefore the Lindeberg expression does not go to 0, so the Lindeberg condition fails for $\{\xi_{n,k}\}$.
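The failure can also be observed numerically. The following sketch evaluates the Lindeberg ratio for the row $X_k/\sqrt{2n}$ using the two-scale law of $X_k$; the function name `lindeberg_ratio_x` and the choice $\varepsilon = 0.5$ are illustrative assumptions.

```python
import math


def lindeberg_ratio_x(n, eps):
    """Lindeberg ratio for the row X_k / sqrt(2n), k = 1, ..., n (with X_1 = 0)."""
    if n < 2:
        raise ValueError("need n >= 2 so that the total variance is positive")
    threshold = eps * math.sqrt(2 * n)
    truncated = 0.0
    total = 0.0
    for k in range(2, n + 1):
        small_part = 1.0 - 1.0 / (k * k)  # E[X_k^2 ; |X_k| = 1]
        big_part = 1.0                    # E[X_k^2 ; |X_k| = k] = k^2 * (1 / k^2)
        total += small_part + big_part
        if 1 > threshold:
            truncated += small_part
        if k > threshold:
            truncated += big_part
    # The common factor 1 / (2n) cancels between numerator and denominator.
    return truncated / total


for n in (100, 10_000, 1_000_000):
    print(n, round(lindeberg_ratio_x(n, eps=0.5), 4))
```

The printed values should stay close to $1/2$ rather than decay, in line with the limit computed in the proof.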

Conclusion (1).

The triangular array $\{X_k/\sqrt{2n}\}$ fails Lindeberg’s condition because the rare “big jumps” of size $k$ contribute a non-vanishing amount to the Lindeberg sum.

Claim (2).

The $Y_k$-array $\eta_{n,k} = Y_k/\sqrt{n}$ does satisfy Lindeberg’s condition.

Proof.

Recall $Y_k \in \{-1,0,1\}$. Therefore, $|\eta_{n,k}| = \frac{|Y_k|}{\sqrt{n}} \le \frac{1}{\sqrt{n}}.$

Fix $\varepsilon>0$. For all sufficiently large $n$, $\frac{1}{\sqrt{n}} < \varepsilon \quad\Rightarrow\quad |\eta_{n,k}| \le \varepsilon\quad\text{for all }k=1,\dots,n.$ This means the indicator $\mathbf{1}_{\{|\eta_{n,k}|>\varepsilon\}}$ is identically zero for all large $n$. Thus $\sum_{k=1}^n E\left[\eta_{n,k}^2 \mathbf{1}_{\{|\eta_{n,k}|>\varepsilon\}}\right] = 0$ for large enough $n$, and after dividing by $t_n^2\to1$ we still get 0.

So the Lindeberg condition holds for $\{\eta_{n,k}\}$.
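The same kind of check for the truncated array shows the ratio switching off completely. This is a minimal sketch; `lindeberg_ratio_y` is a hypothetical helper that uses only the facts $E[Y_k^2] = 1 - 1/k^2$ and $|Y_k| \le 1$ established above.

```python
import math


def lindeberg_ratio_y(n, eps):
    """Lindeberg ratio for the row Y_k / sqrt(n), k = 1, ..., n (with Y_1 = 0)."""
    if n < 2:
        raise ValueError("need n >= 2 so that the total variance is positive")
    # E[Y_k^2] = 1 - 1/k^2 for k >= 2; the factor 1/n cancels in the ratio.
    total = sum(1.0 - 1.0 / (k * k) for k in range(2, n + 1))
    # Every nonzero value of |Y_k| / sqrt(n) equals 1 / sqrt(n), so either all
    # of the second-moment mass exceeds eps or none of it does.
    exceeds = 1.0 / math.sqrt(n) > eps
    truncated = total if exceeds else 0.0
    return truncated / total


for n in (9, 15, 16, 100):
    print(n, lindeberg_ratio_y(n, eps=0.25))
```

With $\varepsilon = 0.25$ the ratio is 1 while $1/\sqrt{n} > 0.25$ and exactly 0 from $n = 16$ on.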

Conclusion (2).

The triangular array $\{Y_k/\sqrt{n}\}$ satisfies Lindeberg’s condition because the individual normalized terms become uniformly small as $n\to\infty$.

Key Takeaways

The Lindeberg condition detects whether rare large jumps are negligible under the chosen normalization.
For $X_k/\sqrt{2n}$, the rare $\pm k$ jumps carry about half of the total variance, which is enough to violate Lindeberg.
For $Y_k/\sqrt{n}$, the values are always in $[-1,1]$, so after dividing by $\sqrt{n}$ they are uniformly tiny, and the Lindeberg condition is automatically satisfied.

Part (c)(ii)

Claim.

$\frac{S_n}{\sqrt{n}} \;\xrightarrow{d}\; N(0,1).$

(The normalization is by $\sqrt{n}$, matching $\mathrm{Var}(T_n)\sim n$ rather than $\mathrm{Var}(S_n)\sim 2n$.)

Proof.

We already know from part (b) that $\mathrm{Var}(T_n) \sim n,$ and from part (c)(i) that the triangular array $\eta_{n,k} = \frac{Y_k}{\sqrt{n}}$ satisfies Lindeberg’s condition, with total variance $t_n^2 \to 1$.

By the Lindeberg–Feller CLT for triangular arrays, $\frac{T_n}{\sqrt{n}} \xrightarrow{d} N(0,1).$

Now write $S_n = T_n + D_n,\quad\text{where}\quad D_n := \sum_{k=2}^n X_k \mathbf{1}_{\{|X_k|=k\}}$ collects the “big jump” contributions.

Note $P(|X_k|=k) = \frac{1}{k^2},$ so $\sum_{k=2}^\infty P(|X_k|=k) = \sum_{k=2}^\infty \frac{1}{k^2} < \infty.$ By the first Borel–Cantelli lemma, $P(|X_k|=k \text{ i.o.}) = 0.$ Thus only finitely many of the events $\{|X_k|=k\}$ occur a.s., and the series $D_\infty := \sum_{k=2}^\infty X_k \mathbf{1}_{\{|X_k|=k\}}$ converges a.s. to a finite (random) limit. In particular, $D_n\to D_\infty$ a.s., hence $\frac{D_n}{\sqrt{n}} \xrightarrow{P} 0.$

Now $\frac{S_n}{\sqrt{n}} = \frac{T_n}{\sqrt{n}} + \frac{D_n}{\sqrt{n}}.$

We already have $\frac{T_n}{\sqrt{n}} \xrightarrow{d} N(0,1)$ and $\frac{D_n}{\sqrt{n}} \xrightarrow{P} 0$. By Slutsky’s theorem, $\frac{S_n}{\sqrt{n}} \xrightarrow{d} N(0,1).$
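A quick Monte Carlo experiment gives a feel for this limit. The sketch below is an illustration under assumptions: the sampler `sample_x`, the sample sizes, and the seed are arbitrary choices, and the check only compares the simulated mass of $[-1.96, 1.96]$ with the standard normal value of about 0.95.

```python
import math
import random


def sample_x(k, rng):
    """Draw X_k: 0 for k = 1; otherwise +-k with total probability 1/k^2, else +-1."""
    if k == 1:
        return 0
    magnitude = k if rng.random() < 1.0 / (k * k) else 1
    sign = 1 if rng.random() < 0.5 else -1
    return sign * magnitude


def normalized_sum(n, rng):
    """Return one draw of S_n / sqrt(n)."""
    return sum(sample_x(k, rng) for k in range(1, n + 1)) / math.sqrt(n)


def coverage(n, reps, seed=0):
    """Fraction of simulated S_n / sqrt(n) values landing in [-1.96, 1.96]."""
    rng = random.Random(seed)
    hits = sum(abs(normalized_sum(n, rng)) <= 1.96 for _ in range(reps))
    return hits / reps


# For an exact N(0, 1) variable this fraction would be about 0.95.
print(coverage(n=400, reps=1000))
```

Agreement at moderate $n$ is only rough: a jump at an index $k \ge \sqrt{n}$ still shifts $S_n/\sqrt{n}$ by at least 1, and such a jump has probability of order $1/\sqrt{n}$.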

Conclusion.

Even though $X_k$ occasionally takes large values ±k, those large jumps occur so rarely that after normalization by $\sqrt{n}$, they vanish in probability. The main contribution is from the “±1” part captured by $T_n$, and thus $S_n/\sqrt{n}$ also converges in distribution to $N(0,1)$.

Key Takeaways

Strategy:
1. Isolate the “nice” part $T_n$ that satisfies Lindeberg and has variance $\sim n$.
2. Show the “bad” part $D_n$ contributes only a negligible term after normalization.
Rare but large values can be handled with Borel–Cantelli and then Slutsky.
This is a classic pattern:
- decompose $S_n = \text{nice part} + \text{rare spikes}$,
- prove CLT for the nice part,
- show rare spikes are negligible.

Global Key Takeaways for Question 4

Part (a): Using Borel–Cantelli to show eventual equality of two processes, then Slutsky to transfer limits.
Part (b): Computing variances explicitly to see linear growth, with a convergent correction term.
Part (c)(i): Understanding Lindeberg’s condition as “big jumps vanish in aggregate” under normalization.
Part (c)(ii): Combining Lindeberg–Feller CLT, Borel–Cantelli, and Slutsky to handle heavy tails in a controlled way.

This problem is a great template for:

how to compare two sums,
how to separate “nice” and “rare spike” parts of a sequence,
and how to assemble multiple probabilistic tools into one coherent CLT proof.
