Probability
1. Definitions
We need a probability space, i.e. a measure space with total measure 1, to rigorously define probability.
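For completeness, the standard definition (the notation below is the conventional one, not spelled out in the original note): a probability space is a triple \((\Omega, \mathcal{F}, P)\), where \(\Omega\) is the sample space, \(\mathcal{F}\) is a \(\sigma\)-algebra of events, and \(P: \mathcal{F} \to [0, 1]\) is a measure satisfying
\[ P(\Omega) = 1, \qquad P \left( \bigcup_{i=1}^{\infty} E_i \right) = \sum_{i=1}^{\infty} P(E_i) \ \text{ for pairwise disjoint } E_i \]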
Exclusive Events:
Iff \(E_1 \cap E_2 = \emptyset\)
Independent Events:
Iff \(P(E_1 \cap E_2) = P(E_1) \times P(E_2)\)
Independence is a modeling assumption we make based on the problem; it does not follow from other properties of the events.
Mutual Independence:
A set of events \(E = \{E_1, E_2, \dots, E_k\}\) is mutually independent iff
\[ \forall S \subseteq E, \ \ P \left( \bigcap_{E_i \in S} E_i \right) = \prod_{E_i \in S} P(E_i) \]
Mutual independence is a stronger condition than pairwise independence: events can be pairwise independent without being mutually independent. Classic example: toss two fair coins and let \(A\) = first toss is heads, \(B\) = second toss is heads, \(C\) = both tosses agree; each pair is independent, yet \(P(A \cap B \cap C) = 1/4 \neq 1/8 = P(A)\,P(B)\,P(C)\). The sketch below verifies this.
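A minimal sketch of the check (the two-coin construction is the standard textbook example, not from the original note):

```python
from itertools import product

# Sample space: two fair coin tosses; each outcome has probability 1/4.
omega = list(product("HT", repeat=2))
p = {w: 0.25 for w in omega}

# A = first toss heads, B = second toss heads, C = both tosses agree.
A = {w for w in omega if w[0] == "H"}
B = {w for w in omega if w[1] == "H"}
C = {w for w in omega if w[0] == w[1]}

def prob(event):
    return sum(p[w] for w in event)

# Pairwise independence holds for every pair.
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(X & Y) == prob(X) * prob(Y)

# Mutual independence fails: P(A ∩ B ∩ C) = 1/4, but the product is 1/8.
print(prob(A & B & C), prob(A) * prob(B) * prob(C))  # 0.25 0.125
```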
Random Variable:
A random variable \(X\) is a function \(X: \Omega \to \mathbb{R}\). A random variable is neither random nor a variable, so the name is a misnomer: it is a deterministic mapping from the sample space to the real line.
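A minimal sketch of this view (the two-coin sample space and the heads-counting map are illustrative assumptions):

```python
from itertools import product

# Sample space of two coin tosses; X maps each outcome to its number
# of heads. The mapping itself is deterministic -- randomness lives in
# which outcome of omega occurs, not in X.
omega = list(product("HT", repeat=2))
X = {w: w.count("H") for w in omega}  # X: Omega -> R

print(X[("H", "T")])  # 1
```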
2. Frequentist vs Bayesian Views of Probability
Frequentists view probability as the long-run frequency of outcomes of random, repeatable events.
The Bayesian view of probability is as a quantification of uncertainty. If numerical values are used to represent degrees of belief, then a small set of common-sense axioms leads exactly to the rules of probability. Probability theory can thus be regarded as an extension of Boolean logic to situations involving uncertainty, and such numbers used to quantify uncertainty can be referred to as (Bayesian) probabilities. [#]
4. James-Stein Estimator
Let \(y = \theta + \sigma \epsilon\), where \(\epsilon \sim \mathcal{N}(0, \mathbf{I})\) is \(d\)-dimensional, so \(y \sim \mathcal{N}(\theta, \sigma^2 \mathbf{I})\).
For \(d \geq 3\), the James-Stein estimator achieves lower MSE than the maximum likelihood estimator \(\hat{\theta}_{\text{MLE}} = y\):
\[ \hat{\theta}_{\text{JS}} = \left(1 - \frac{(d-2)\sigma^2}{\|y\|^2}\right) y \]
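A Monte Carlo sketch of this dominance (the choice of \(d\), \(\sigma\), \(\theta\), and the trial count below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n_trials = 10, 1.0, 100_000

theta = np.ones(d)  # arbitrary true mean vector (assumed for the demo)
y = theta + sigma * rng.standard_normal((n_trials, d))

# James-Stein shrinks y toward the origin by a data-dependent factor.
shrink = 1.0 - (d - 2) * sigma**2 / np.sum(y**2, axis=1, keepdims=True)
theta_js = shrink * y

mse_mle = np.mean(np.sum((y - theta) ** 2, axis=1))        # ~ d * sigma^2
mse_js = np.mean(np.sum((theta_js - theta) ** 2, axis=1))  # strictly smaller
print(f"MLE MSE: {mse_mle:.3f}   JS MSE: {mse_js:.3f}")
```

Note this is the original (non-positive-part) estimator; the shrinkage factor can go negative when \(\|y\|^2 < (d-2)\sigma^2\).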