# Sample space
A sample space $\Omega$ is the set of all possible outcomes of an experiment. An event $A$ is a subset of $\Omega$, and its complement $A^{c}$ is the set of outcomes in $\Omega$ that are not in $A$.
# Axioms of probability
A function $\mathbb{P}$ that assigns a real number $\mathbb{P}(A)$ to each event $A$ is a probability function or probability measure if it satisfies the following three axioms,
1. $\mathbb{P}(A) \ge 0$
2. $\mathbb{P}(\Omega)=1$, where $\Omega$ is the entire sample space
3. If events $A_{1}, A_{2},\ldots$ are disjoint, $\mathbb{P}\left(\bigcup_{i=1}^{\infty}A_i\right)=\sum_{i=1}^{\infty}\mathbb{P}(A_i)$
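As a quick sanity check of the three axioms, here is a minimal Python sketch using a hypothetical fair six-sided die, where each outcome is assigned probability $1/6$:

```python
# Minimal sketch: checking the three axioms on a hypothetical fair die.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
P = {outcome: Fraction(1, 6) for outcome in omega}  # assumed uniform probabilities

def prob(event):
    """P(A) = sum of the probabilities of the outcomes in A."""
    return sum(P[w] for w in event)

assert all(prob({w}) >= 0 for w in omega)        # Axiom 1: non-negativity
assert prob(omega) == 1                          # Axiom 2: P(Omega) = 1
assert prob({1} | {2}) == prob({1}) + prob({2})  # Axiom 3 for the disjoint events {1}, {2}
```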
# Counting theory
Choosing $k$ out of $n$ can be counted as $C_n^k=\frac{n!}{(n-k)!\,k!}=C_n^{n-k}$, with $C_n^0=C_n^n=1$.
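A short sketch with Python's standard library (arbitrary example values $n=10$, $k=3$) confirming the closed form and the symmetry $C_n^k=C_n^{n-k}$:

```python
# Sketch: C(n, k) = n! / ((n - k)! k!) and the symmetry C(n, k) = C(n, n - k).
from math import comb, factorial

n, k = 10, 3  # arbitrary example values
assert comb(n, k) == factorial(n) // (factorial(n - k) * factorial(k))
assert comb(n, k) == comb(n, n - k)
assert comb(n, 0) == comb(n, n) == 1
print(comb(n, k))  # 120
```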
# Geometric sequence
Let us take a look at the geometric series, defined for $r \neq 1$ as $\sum\limits_{n=0}^{N}ar^n=a+ar+ar^2+\cdots+ar^N=\frac{a(1-r^{N+1})}{1-r}$
When $-1<r<1$ and $N \to \infty$, the series converges: $\sum\limits_{n=0}^{\infty}ar^n = a/(1-r)$.
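A small numeric sketch (with arbitrary values $a=2$, $r=0.5$) showing the partial sums approaching $a/(1-r)$:

```python
# Sketch: partial sums of a geometric series approach a / (1 - r) when |r| < 1.
a, r = 2.0, 0.5  # arbitrary example values
for N in (5, 10, 50):
    partial = sum(a * r**n for n in range(N + 1))
    print(N, partial, a * (1 - r**(N + 1)) / (1 - r))  # the two values agree
print("limit:", a / (1 - r))  # 4.0
```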
# Conditional probability
If $P(B)>0$, the conditional probability of $A$ given $B$ is defined as $P(A\,|\,B)= \frac{P(AB)}{P(B)}$. It is simply the fraction of times both $A$ and $B$ occur among the outcomes in which $B$ occurs.
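As a hypothetical illustration with two fair dice, take $A$ = "the sum is 8" and $B$ = "the first die shows 3"; then $P(A\,|\,B)=P(AB)/P(B)=(1/36)/(1/6)=1/6$, which a simulation sketch recovers:

```python
# Sketch: estimating P(A | B) by simulation for two fair dice (hypothetical example).
import random

random.seed(0)
n, count_B, count_AB = 100_000, 0, 0
for _ in range(n):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    if d1 == 3:              # B occurs
        count_B += 1
        if d1 + d2 == 8:     # A also occurs
            count_AB += 1
print(count_AB / count_B)    # close to 1/6 ~ 0.1667
```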
# The law of total probability
Let $A_{1},A_{2},\ldots, A_{k}$ be a partition of the sample space $\Omega$. Then $P(B)=\sum\limits_{i=1}^{k}P(B \,|\, A_i)P(A_i)$.
# Bayes' theorem
One of the most important theorems in stats. Let $A_1,A_{2}, \ldots, A_k$ be a partition of $\Omega$ such that $\mathbb{P}(A_i)\gt 0$ for each $i$. If $\mathbb{P}(B) \gt 0$ then, for each $i=1, \ldots, k$, $\mathbb{P}(A_{i}\,|\,B)= \frac{\mathbb{P}(B \,|\, A_i)\mathbb{P}(A_i)}{\sum\limits_{j=1}^{k}\mathbb{P}(B\,|\,A_j)\mathbb{P}(A_j)}$ We call $\mathbb{P}(A_i)$ the prior and $\mathbb{P}(A_i\,|\,B)$ the posterior. To see this, write $\mathbb{P}(A_i\,|\,B)=\mathbb{P}(A_iB)/\mathbb{P}(B)$, expand the numerator with the definition of conditional probability, and expand the denominator with the law of total probability.
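A minimal sketch with hypothetical numbers: a condition with prior $\mathbb{P}(A_1)=0.01$, a test with $\mathbb{P}(B\,|\,A_1)=0.95$ and false-positive rate $\mathbb{P}(B\,|\,A_2)=0.05$, where $A_2$ = "no condition" and $B$ = "positive test". The denominator is exactly the law of total probability from the previous section:

```python
# Sketch of Bayes' theorem with hypothetical numbers.
priors = [0.01, 0.99]        # P(A1), P(A2): a partition of Omega
likelihoods = [0.95, 0.05]   # P(B | A1), P(B | A2)

# Law of total probability: P(B) = sum_j P(B | Aj) P(Aj)
p_B = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / p_B for l, p in zip(likelihoods, priors)]
print(p_B)         # ~ 0.059
print(posteriors)  # posterior P(A1 | B) ~ 0.161, despite the 95% sensitivity
```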
# Random variable
We abbreviate random variable as RV. The probability mass function (PMF) is for a discrete RV, while the probability density function (PDF) is for a continuous RV; unlike a PMF, a PDF value is not a probability and can exceed $1$. A valid PMF must sum to $1$, and a valid PDF must integrate to $1$, over the whole support.
## Quantile function
The inverse cumulative distribution function (CDF) is also called the quantile function,
* $F^{-1}(q)=\inf\{x: F(x)\gt q\}, \,\,\,\, 0\leq q\leq 1$
* It represents the smallest value $x$ at which the CDF exceeds $q$.
* 1st quartile $F^{-1}(1/4)$, 2nd quartile or median $F^{-1}(1/2)$, 3rd quartile $F^{-1}(3/4)$
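For a concrete check, the quartiles above for a standard normal, computed with scipy (an assumed dependency):

```python
# Sketch: quartiles of a standard normal from the quantile function (inverse CDF).
from scipy.stats import norm

for q in (0.25, 0.5, 0.75):
    print(q, norm.ppf(q))
# ppf(0.5) = 0.0; ppf(0.25) ~ -0.674 and ppf(0.75) ~ 0.674 by symmetry
```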
## Some important discrete RVs
- Point mass distribution: $X\sim \delta_a$, $P(X=a)=1$, and $0$ otherwise.
- Discrete uniform distribution: $f(x)=\begin{cases}1/k, & \text{for } x=1,2,\ldots,k \\ 0, & \text{otherwise}\end{cases}$
- Bernoulli distribution with $0\leq p \leq 1$, and $x=0 \text{ or } 1$: $f(x)=\begin{cases} p, & x=1 \\ 1-p, & x=0 \end{cases}$
- Binomial distribution ($n$ trials, with $p$ as in Bernoulli): $f(x)= \begin{pmatrix} n \\ x \end{pmatrix}p^x(1-p)^{n-x},\,\,\, x=0,1,\ldots,n$
- Geometric distribution (probability that the first success occurs after $k$ failures): $P(X=k)=p(1-p)^k,\,\,\, k=0,1,2,\ldots$
- Poisson distribution with rate $\lambda>0$: $f(x)=e^{-\lambda}\frac{\lambda^x}{x!},\,\,\, x=0,1,2,\ldots$
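A short sketch evaluating a few of the PMFs above with scipy (arbitrary parameter values; note that scipy's geometric convention is shifted relative to the form written above):

```python
# Sketch: a few of the PMFs above, evaluated with scipy.stats for illustration.
from scipy.stats import bernoulli, binom, geom, poisson

p, n, lam = 0.3, 10, 2.0      # arbitrary example parameters
print(bernoulli.pmf(1, p))    # p = 0.3
print(binom.pmf(4, n, p))     # C(10, 4) p^4 (1 - p)^6
print(poisson.pmf(3, lam))    # e^{-2} 2^3 / 3!
# scipy's geom has support k = 1, 2, ... with pmf(k) = p (1 - p)^(k - 1),
# i.e. a shifted version of the "failures before first success" form above.
print(geom.pmf(3, p))         # 0.3 * 0.7**2
```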
## Some important continuous RVs
- Uniform distribution: $f(x)=\begin{cases}\frac{1}{b-a}, & a\leq x \leq b \\ 0, & \text{otherwise} \end{cases}$ and $F(x)=\begin{cases}0, & x\lt a \\ \frac{x-a}{b-a}, & a \leq x \leq b \\ 1, & x \gt b\end{cases}$
- Normal distribution: see [[Selected probability distributions]]
- Exponential distribution: $X\sim \text{Exp}(\beta)$ with $f(x, \beta)=\frac{1}{\beta}e^{-x/\beta},\,\,\, x>0,\ \beta>0$; used to model lifetimes of electronic components and waiting times between rare events
- Now, take a look at the Gamma function: $\Gamma(a)=\int_0^\infty y^{a-1}e^{-y}dy$. If $a$ is a positive integer, $\Gamma(a)=(a-1)!$.
- Gamma distribution: $f(x,\alpha, \beta)=\frac{1}{\beta^\alpha\Gamma(\alpha)}x^{\alpha-1}e^{-x/\beta}$
- Student-t with degrees of freedom $\nu$: $f(x)=\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})}\frac{1}{(1+\frac{x^2}{\nu})^{(\nu+1)/2}}$
- Cauchy distribution: when $\nu=1$, student-t reduces to Cauchy distribution, which is $f(x)=\frac{1}{\pi(1+x^2)}$
- $\chi^2$ distribution with $p$ degrees of freedom: $f(x)=\frac{1}{\Gamma(p/2)2^{p/2}}x^{(p/2)-1}e^{-x/2}, \,\, x\gt 0$. If $z_1,\ldots,z_p \sim N(0,1)$ are independent, then $\sum_{i=1}^{p} z_i^2\sim \chi_p^2$
- The marginal distribution of each component of a multinomial random vector is a binomial distribution
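Two quick numeric checks of the facts above, using numpy/scipy (assumed dependencies): $\Gamma(a)=(a-1)!$ for integer $a$, and a sum of squared standard normals behaving like $\chi^2_p$:

```python
# Sketch: Gamma(5) = 4!, and sums of squared N(0, 1) draws look chi-square.
import numpy as np
from math import factorial
from scipy.special import gamma

assert np.isclose(gamma(5), factorial(4))   # Gamma(5) = 4! = 24

rng = np.random.default_rng(0)
p = 3
z = rng.standard_normal((100_000, p))
samples = (z**2).sum(axis=1)                # should follow chi^2_p
print(samples.mean(), samples.var())        # chi^2_p has mean p and variance 2p
```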
## Moments
The $k$-th moment of $X$ is defined as $\mathbb{E}(X^k)$, assuming that $\mathbb{E}(|X|^k)\lt \infty$
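For example, a Monte Carlo sketch of the first few moments of a standard normal, whose exact values are $0, 1, 0, 3$ for $k=1,\ldots,4$:

```python
# Sketch: estimating E(X^k) by sampling from a standard normal.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)
for k in (1, 2, 3, 4):
    print(k, (x**k).mean())   # roughly 0, 1, 0, 3
```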
## Kullback-Leibler distance
Given PDFs $f$ and $g$, the Kullback-Leibler distance (or relative entropy) is defined as $D_{KL}(f,g)=\int f(x)\log\left(\frac{f(x)}{g(x)}\right)dx$
Note this is not a distance in the formal sense, as it is not symmetric. One can show that $D_{KL}(f,g)\geq 0$, with $D_{KL}(f,g)=0$ if and only if $f=g$.
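A numerical sketch of both points, computing $D_{KL}$ between two normal densities on a grid (the particular means and standard deviations are arbitrary):

```python
# Sketch: KL divergence between two normal PDFs by numerical integration on a grid.
import numpy as np
from scipy.stats import norm

x = np.linspace(-20.0, 20.0, 200_001)
dx = x[1] - x[0]
f = norm.pdf(x, loc=0.0, scale=1.0)
g = norm.pdf(x, loc=1.0, scale=2.0)

def kl(p, q):
    """Riemann-sum approximation of the integral of p * log(p / q) dx."""
    return np.sum(p * np.log(p / q)) * dx

print(kl(f, g), kl(g, f))  # both nonnegative, and clearly unequal (asymmetric)
```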