Assume we have a coin with probability q of obtaining a head (a Bernoulli experiment). We regard q as a random variable to be estimated from an observed dataset, combined with our prior belief about the values q could take.
Initially, q follows our prior distribution. It represents our knowledge about q before we start flipping the coin. We will update this distribution after each trial to obtain a posterior distribution.
For example, assume that q is a uniform random variable on the interval [0, 1]: for all q, $P(q)=1$.
We now run an experiment: flip our coin n times. We denote by $x_i$ the outcome of the i-th flip (0 for tail, 1 for head).
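As a concrete setup, such a dataset can be simulated (a sketch; the true value `q_true` and the seed are arbitrary assumptions, not part of the method):

```python
import numpy as np

rng = np.random.default_rng(42)      # fixed seed, chosen arbitrarily
q_true = 0.6                         # hypothetical true head probability
n = 10
x = rng.binomial(1, q_true, size=n)  # x_i: 1 for head, 0 for tail
h = x.sum()                          # number of observed heads
print(x, h)
```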
What is the probability distribution of q knowing the observations $x_i$?
Bayes' formula allows us to compute the posterior distribution analytically.
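Concretely, writing $h = \sum_i x_i$ for the number of heads among the $n$ flips, Bayes' formula with the uniform prior $P(q)=1$ gives:

$$P(q \mid \{x_i\}) = \frac{P(\{x_i\} \mid q) \, P(q)}{\int_0^1 P(\{x_i\} \mid q') \, P(q') \, dq'} = (n+1) \binom{n}{h} q^h (1-q)^{n-h}$$

The factor $(n+1)$ normalizes the binomial likelihood so that the posterior integrates to 1 over $[0, 1]$.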
%matplotlib inline
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
posterior = lambda n, h, q: (n+1) * st.binom(n, q).pmf(h)
mu = 1.0
sigma = 0.1
q = np.linspace(0., 2.0 * mu, 1000)
f = lambda x: st.norm(mu, sigma).pdf(x)  # norm lives in scipy.stats
fq = f(q)
plt.plot(q, fq, '-k')
plt.xlabel('x parameter')
plt.ylabel('Normal(1, 0.1)')
plt.ylim(0, fq.max()+1);
n = 10
h = 6
q = np.linspace(0., 1., 1000)
d = posterior(n, h, q)
plt.plot(q, d, '-k')
plt.xlabel('q parameter')
plt.ylabel('Posterior distribution')
plt.ylim(0, d.max()+1);
If we run the experiment on more data, the posterior distribution of q becomes more peaked:
n = 10000
h = 6120
q = np.linspace(0., 1., 1000)
d = posterior(n, h, q)
plt.plot(q, d, '-k')
plt.xlabel('q parameter')
plt.ylabel('Posterior distribution')
plt.ylim(0, d.max()+1);
Given the posterior distribution over the model parameter $q$, we want to predict the outcome of an experiment. What are the odds of obtaining heads given what we know about $q$?
The answer is given by the predictive posterior distribution: the average over all possible models (that is, all possible values of $q$), weighted by their probability under the posterior distribution (this is called marginalization):
$$p(x = 1 | D) = \int_{0}^{1} p(x | q) \, p(q | D) \, dq$$
import scipy.integrate as integrate
from scipy.stats import bernoulli
# quad returns a (value, error estimate) pair; keep the value
pred_posterior = lambda x, n, h: integrate.quad(lambda q: posterior(n, h, q) * bernoulli(q).pmf(x), 0, 1)[0]
pred_posterior(1, 10, 6)
pred_posterior(1, 10000, 6000)
pred_posterior(1, 10000, 6001)
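As a sanity check, this integral has a closed form under the uniform prior, known as Laplace's rule of succession: $p(x = 1 \mid D) = (h+1)/(n+2)$. A sketch comparing the numerical integral with the closed form:

```python
import scipy.integrate as integrate
import scipy.stats as st

# Posterior under a uniform prior (same formula as above).
posterior = lambda n, h, q: (n + 1) * st.binom(n, q).pmf(h)

def pred_posterior(x, n, h):
    # quad returns (value, error estimate); keep the value
    val, _ = integrate.quad(
        lambda q: posterior(n, h, q) * st.bernoulli(q).pmf(x), 0, 1)
    return val

n, h = 10, 6
print(pred_posterior(1, n, h), (h + 1) / (n + 2))  # both close to 0.5833
```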