2025-04-28 (Edited: 2025-08-15)

Probability

Table of Contents

1. James Stein Estimator

Let \(y = \theta + \sigma \epsilon\), where \(\epsilon \sim \mathcal{N}(0, \mathbf{I})\) is \(d\)-dimensional, so \(y \sim \mathcal{N}(\theta, \sigma^2\mathbf{I})\).

The James–Stein estimator achieves lower total mean squared error than the maximum likelihood estimator \(\hat{\theta}_{\text{MLE}} = y\), for every \(\theta\), when \(d \geq 3\):

\[ \hat{\theta}_{\text{JS}} = \left(1 - \frac{(d-2)\sigma^2}{\|y\|^2}\right) y \]
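A quick simulation makes the dominance visible. This is a sketch assuming numpy; the dimension, noise level, trial count, and the randomly drawn \(\theta\) are arbitrary choices, not anything fixed by the theory:

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n_trials = 10, 1.0, 2000
theta = rng.normal(size=d)  # some fixed true mean vector (arbitrary choice)

mse_mle, mse_js = 0.0, 0.0
for _ in range(n_trials):
    y = theta + sigma * rng.normal(size=d)          # y ~ N(theta, sigma^2 I)
    shrink = 1 - (d - 2) * sigma**2 / np.dot(y, y)  # James-Stein factor
    mse_mle += np.sum((y - theta) ** 2)             # MLE just uses y
    mse_js += np.sum((shrink * y - theta) ** 2)

mse_mle /= n_trials
mse_js /= n_trials
print(mse_mle, mse_js)  # for d >= 3, the JS average error is strictly smaller
```

The shrinkage factor pulls \(y\) toward the origin; the gain is largest when \(\|\theta\|\) is small, but the JS risk is below \(d\sigma^2\) for every \(\theta\).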

2. Central Limit Theorem

The Fisher–Tippett–Gnedenko theorem is the extreme-values analogue of the CLT: if the properly normalized maximum of an i.i.d. sample converges in distribution, the limit must be Gumbel, Fréchet, or Weibull—unified as the Generalized Extreme Value distribution. Unlike the CLT, whose assumptions (in my experience) rarely hold in practice, this result is extremely general; it underpins methods like wavelet thresholding for signal denoising, and it is easy to demonstrate with a quick simulation. [Source]
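Here is one such quick simulation. For i.i.d. Exp(1) samples the maximum, centered by \(\log n\), converges to the standard Gumbel distribution, whose mean is the Euler–Mascheroni constant \(\gamma \approx 0.5772\) and whose variance is \(\pi^2/6 \approx 1.645\). A sketch assuming numpy; the sample size and trial count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 1000, 5000
# Maximum of n i.i.d. Exp(1) draws, centered by log(n), per trial:
maxima = rng.exponential(size=(trials, n)).max(axis=1) - np.log(n)
# Limit is the standard Gumbel: mean ~ 0.5772, variance ~ pi^2/6 ~ 1.645.
print(maxima.mean(), maxima.var())
```

Exponentials are a convenient choice because no rescaling is needed, only centering; other light-tailed distributions (e.g. Gaussian) also land in the Gumbel domain of attraction but converge more slowly.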

For the normal distribution there is the 68–95–99.7 rule: there is 68% probability that X is within \(\sigma\) of the mean, 95% probability that it is within \(2\sigma\), and 99.7% probability that it is within \(3\sigma\).
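The rule follows directly from the normal CDF, since \(P(|X - \mu| < k\sigma) = \operatorname{erf}(k/\sqrt{2})\). A stdlib-only check:

```python
from math import erf, sqrt

# P(|X - mu| < k*sigma) for a normal variable equals erf(k / sqrt(2)).
coverage = {k: erf(k / sqrt(2)) for k in (1, 2, 3)}
print(coverage)  # ~ {1: 0.6827, 2: 0.9545, 3: 0.9973}
```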

2.1. Cantelli's Inequality

The probability mass more than \(k\) standard deviations above the mean is at most:

\begin{align*} P(X > \mu + k \sigma) \leq \frac 1 { 1 + k^2} \end{align*}

This is a tighter bound than Chebyshev's Inequality for the one-sided tail, since it only bounds the mass on one side. Both inequalities apply to any distribution with finite variance.

The above gives the 50 - 20 - 10 rule: at most 50%, 20%, and 10% of the mass lies above \(\mu + \sigma\), \(\mu + 2\sigma\), and \(\mu + 3\sigma\) respectively.
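Because Cantelli's bound \(P(X > \mu + k\sigma) \leq 1/(1+k^2)\) is distribution-free, it can be checked on any distribution with finite variance. A sketch assuming numpy, using a skewed Exp(1) sample (an arbitrary choice with \(\mu = \sigma = 1\)):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(size=200_000)  # skewed distribution; mu = sigma = 1
mu, sigma = x.mean(), x.std()
# Empirical upper-tail mass beyond mu + k*sigma, versus Cantelli's bound:
tails = {k: (x > mu + k * sigma).mean() for k in (1, 2, 3)}
for k, tail in tails.items():
    print(k, tail, 1 / (1 + k**2))
```

For Exp(1) the true tails (\(e^{-(1+k)}\)) sit well below the bounds, which is typical: Cantelli is worst-case over all distributions, so it is rarely tight for any particular one.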

2.2. Chebyshev's Inequality

\begin{align*} P(|X - \mu | \geq k \sigma) \leq \frac 1 { k^2} \end{align*}

The above gives the 0 - 75 - 89 Rule: at least 0%, 75%, and 89% of the mass lies within \(\sigma\), \(2\sigma\), and \(3\sigma\) of the mean.
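Chebyshev's bound \(P(|X - \mu| \geq k\sigma) \leq 1/k^2\) can be checked the same way. A sketch assuming numpy, using a heavy-tailed Student-t sample with 4 degrees of freedom (an arbitrary choice that still has finite variance):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_t(df=4, size=200_000)  # heavy tails, but finite variance
mu, sigma = x.mean(), x.std()
# Empirical mass at least k standard deviations from the mean, vs 1/k^2:
outside = {k: (np.abs(x - mu) >= k * sigma).mean() for k in (1, 2, 3)}
for k, frac in outside.items():
    print(k, frac, 1 / k**2)
```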


Backlinks


You can send your feedback and queries here