2025-04-28 (Edited: 2025-08-15)

Probability

Table of Contents

1. Definitions

We need a Probability Space (a Measure Space with total measure 1) to rigorously define probability. For discrete distributions we can still get away without using measure theory, but for continuous distributions a lack of rigor can get us into trouble.

Exclusive Events:

Iff, \(E_1 \cap E_2 = \emptyset\)

1.1. Independence

Independent Events:

Iff, \(P(E_1 \cap E_2) = P(E_1) \times P(E_2)\)

Independence is a property we assume based on the problem. It is not derived from other properties.

Independence is a stronger condition than zero covariance:

  • Independent variables have zero covariance.
  • But zero covariance only implies there is no linear dependence; a nonlinear dependence may remain.
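A minimal sketch of the distinction (the distribution below is my own choice of example, not from the text): take \(X\) uniform on \(\{-1, 0, 1\}\) and \(Y = X^2\). Their covariance is zero, yet they are clearly dependent.

```python
# X uniform on {-1, 0, 1}; Y = X^2 (example values are mine).
xs = [-1, 0, 1]
px = {x: 1 / 3 for x in xs}

EX = sum(x * px[x] for x in xs)           # E[X] = 0
EY = sum(x**2 * px[x] for x in xs)        # E[Y] = 2/3
EXY = sum(x * x**2 * px[x] for x in xs)   # E[XY] = E[X^3] = 0
cov = EXY - EX * EY                       # 0 -> uncorrelated

# ...but X and Y are dependent:
p_joint = px[0]            # P(X=0, Y=0) = P(X=0) = 1/3
p_prod = px[0] * px[0]     # P(X=0) * P(Y=0) = 1/9 != 1/3
```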

Mutual Independence:

A set of events \(E = \{E_1, E_2, \dots, E_k\}\) is mutually independent iff,

\[ \forall S \subset E, \ \ P \left( \bigcap_{E_i \in S} E_i \right) = \prod_{E_i \in S} P(E_i) \]

Mutual independence is a stronger condition than pairwise independence: it is possible to have pairwise independent events that are not mutually independent.

For an example, see page 48 of Probability, Random Variables and Stochastic Processes by Athanasios Papoulis [#]
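A Bernstein-style construction (my own analogous example, not the one from the cited page) can be checked by enumeration: toss two fair coins, let A = first toss is H, B = second toss is H, C = both tosses agree.

```python
from itertools import product

# Two fair coin tosses; all four outcomes equally likely.
outcomes = list(product("HT", repeat=2))

def P(event):
    """Probability of an event under the uniform measure on outcomes."""
    return sum(1 for w in outcomes if event(w)) / len(outcomes)

A = lambda w: w[0] == "H"        # first toss heads
B = lambda w: w[1] == "H"        # second toss heads
C = lambda w: w[0] == w[1]       # tosses agree

# Pairwise independent: P(E and F) = P(E) P(F) for each pair.
pairwise = all(
    abs(P(lambda w, e=e, f=f: e(w) and f(w)) - P(e) * P(f)) < 1e-12
    for e, f in [(A, B), (A, C), (B, C)]
)
# ...but not mutually independent:
triple = P(lambda w: A(w) and B(w) and C(w))  # P(A and B and C) = 1/4
prod = P(A) * P(B) * P(C)                     # 1/8
```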

1.2. Random Variable

A random variable \(X\) is a function \(X: \Omega \to \mathbb{R}\). A random variable is neither random nor a variable, hence a misnomer. It is a deterministic mapping from the sample space to the real line.
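As a tiny illustration (the coin-toss setup is mine): for a single coin toss the mapping is just a fixed function on \(\Omega\); the randomness lives in which outcome occurs, not in \(X\).

```python
# Sample space for one coin toss; X maps each outcome to a real number.
omega = ["H", "T"]
X = {"H": 1.0, "T": 0.0}   # X: Omega -> R, a deterministic mapping

# P(X = 1) is just the probability of the event {w in Omega : X(w) = 1}.
preimage = [w for w in omega if X[w] == 1.0]
```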

2. Frequentist vs Bayesian Views of Probability

Frequentists view probability as the long-run frequencies of outcomes of random, repeatable events.

The Bayesian view of probability is that of quantification of uncertainty. If numerical values are used to represent degrees of belief, then a set of common-sense axioms leads exactly to the rules of probability. Thus probability theory can be regarded as an extension of Boolean logic to situations involving uncertainty, and such numbers used for uncertainty can be referred to as (Bayesian) probabilities. [#]

In other words, probability is a way to express the uncertainty of statements and an extension of logic to deal with uncertainty.

Probability theory was developed to analyze frequencies of events. But when events are not repeatable, we can still use probability theory to represent a degree of belief. The former is called frequentist probability and the latter Bayesian probability. It turns out that both follow the same rules/formulas of probability theory.

3. Conditional Probability

  • \(x \perp y\) - Denotes independence
  • \(x \perp y | z\) - Denotes conditional independence

3.1. Chain Rule of Conditional Probabilities

Also known as the product rule of conditional probabilities.

A joint probability can be expressed as a product of conditional probabilities.

\begin{align*} P(x^{(1)}, x^{(2)}, \dots, x^{(n)}) = P(x^{(1)}) \prod_{i=2}^{n} P(x^{(i)} \mid x^{(1)}, \dots, x^{(i-1)}) \end{align*}
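The factorization can be verified numerically on a small joint distribution (the pmf below is an arbitrary example of my own):

```python
# An arbitrary joint pmf over three binary variables (weights are mine).
weights = {
    (0, 0, 0): 1, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 1,
    (1, 0, 0): 2, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 4,
}
Z = sum(weights.values())
joint = {k: v / Z for k, v in weights.items()}

def marginal(prefix):
    """P(x1..xk = prefix), summing out the remaining variables."""
    k = len(prefix)
    return sum(p for xs, p in joint.items() if xs[:k] == prefix)

# Chain rule: P(x1, x2, x3) = P(x1) * P(x2|x1) * P(x3|x1,x2),
# where each conditional is a ratio of marginals.
x = (1, 0, 1)
chain = (marginal(x[:1])
         * marginal(x[:2]) / marginal(x[:1])
         * marginal(x[:3]) / marginal(x[:2]))
```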

4. Transformation of Variables

If \(x\) and \(y\) are continuous random variables and \(g\) is an invertible, continuous, differentiable function, then \(y = g(x)\) doesn't imply \(p_y(y) = p_x(g^{-1}(y))\); instead it is:

\begin{align*} p_y(y) = p_x(g^{-1}(y)) \left| \frac{\partial x}{\partial y} \right| \end{align*}

For multivariate functions we take the absolute value of the determinant of the Jacobian matrix.
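A Monte Carlo sketch of the one-dimensional formula (the choice \(x \sim \mathcal{N}(0,1)\), \(g(x) = e^x\) is my own example): here \(g^{-1}(y) = \log y\) and \(|\partial x / \partial y| = 1/y\), so \(p_y(y) = p_x(\log y)/y\), the lognormal density.

```python
import math
import random

def p_x(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p_y(y):
    """Density of y = exp(x) via change of variables: p_x(log y) * |1/y|."""
    return p_x(math.log(y)) / y

# Monte Carlo check: empirical density of y near y0 vs. the formula.
random.seed(0)
n, y0, h = 200_000, 1.0, 0.05
count = sum(1 for _ in range(n)
            if abs(math.exp(random.gauss(0, 1)) - y0) < h)
estimate = count / n / (2 * h)   # fraction in bin / bin width
```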

5. James-Stein Estimator

Let \(y = \theta + \sigma \epsilon\), where \(\epsilon \sim \mathcal{N}(0, \mathbf{I})\) and \(y, \theta \in \mathbb{R}^d\).

The James-Stein estimator gives lower MSE than the maximum likelihood estimator \(\hat{\theta}_{\text{MLE}} = y\) (for \(d \geq 3\)):

\[ \hat{\theta}_{\text{JS}} = \left(1 - \frac{(d-2)\sigma^2}{\|y\|^2}\right) y \]
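A Monte Carlo sketch of the dominance claim (the particular \(\theta\), \(d\), and trial count are my own choices):

```python
import random

# y ~ N(theta, sigma^2 I); compare MSE of the MLE (y itself) with
# the James-Stein estimator, averaged over many simulated draws.
random.seed(0)
d, sigma, trials = 10, 1.0, 5000
theta = [0.5] * d   # some fixed true mean vector

mse_mle = mse_js = 0.0
for _ in range(trials):
    y = [t + sigma * random.gauss(0, 1) for t in theta]
    norm2 = sum(v * v for v in y)
    shrink = 1 - (d - 2) * sigma**2 / norm2   # JS shrinkage factor
    js = [shrink * v for v in y]
    mse_mle += sum((v - t) ** 2 for v, t in zip(y, theta)) / trials
    mse_js += sum((v - t) ** 2 for v, t in zip(js, theta)) / trials
```

With these settings the MLE's MSE is close to \(d \sigma^2\), while the James-Stein estimate comes out strictly lower.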

