Coin flip power analysis
Deciding if a coin is fair
We can have a statistical model for a single flip of the coin: $(\{0,1\}, \{\mathrm{Ber}(p)\}_{p \in (0,1)})$, meaning the result is either 0 (tails) or 1 (heads) and the distribution is just a Bernoulli with probability $p$ of 1 and $(1-p)$ of 0.
Then the null and alternative hypotheses are
- $H_0$: the coin is fair, with probability 50% of heads and 50% of tails: $p = 0.5$, i.e. $\Theta_0 = \{0.5\}$
- $H_1$: the coin is not fair, with $p \ne 0.5$, i.e. $\Theta_1 = (0,1) \setminus \{0.5\}$ (all probabilities on the open interval, excluding 0.5)
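To make the model concrete, here is a minimal simulation sketch (mine, not part of the original analysis); the seed and the bias p_true are arbitrary illustration values.

import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for reproducibility
p_true = 0.55                            # hypothetical bias, purely illustrative
n = 100                                  # number of flips
flips = rng.binomial(1, p_true, size=n)  # each flip ~ Ber(p_true): 1 = heads, 0 = tails
print(f'Observed fraction of heads: {flips.mean():.2f}')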
Test statistic
We use the test statistic
$$T_n = \sqrt{n}\,\frac{|\bar{X}_n - 1/2|}{\sqrt{1/2\,(1 - 1/2)}} = 2\sqrt{n}\,|\bar{X}_n - 1/2|$$

Why do we choose this as our test statistic? Well, if $X_j \sim \mathrm{Ber}(p)$, then
$$\mathrm{var}(\bar{X}_n) = \mathrm{var}\left(\frac{X_1 + \dots + X_n}{n}\right) = \frac{1}{n^2}\,\mathrm{var}(X_1 + \dots + X_n) = \frac{n}{n^2}\,\mathrm{var}(X_1) = \frac{1}{n}\,p(1-p)$$

where the first step follows from $\mathrm{var}(aX) = a^2\,\mathrm{var}(X)$, and the second step follows from the fact that the $X_j \overset{\text{iid}}{\sim} \mathrm{Ber}(p)$, i.e. they are all assumed independent, identically distributed random variables with the same mean and variance, plus additivity of variance for independent random variables: $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$ if $X, Y$ are independent. Finally we use the fact that the variance of a Bernoulli is $p(1-p)$, as can be easily demonstrated.
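As a quick sanity check (my own sketch, not part of the original derivation), we can verify $\mathrm{var}(\bar{X}_n) \approx p(1-p)/n$ by simulation; the values of p, n and the repetition count below are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 50, 200_000                     # arbitrary illustration values
sample_means = rng.binomial(n, p, size=reps) / n  # each entry is one realisation of the sample mean
print(f'Simulated variance:   {sample_means.var():.6f}')
print(f'Theoretical p(1-p)/n: {p * (1 - p) / n:.6f}')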
Variance of a Bernoulli
$$\mathrm{var}[X] = E(X^2) - (E(X))^2$$

The expectation is just

$$E(X) = p \times 1 + (1-p) \times 0 = p$$

The second moment is

$$E(X^2) = p \times 1^2 + (1-p) \times 0^2 = p$$

Therefore

$$\mathrm{var}[X] = p - p^2 = p(1-p)$$

CLT
The CLT tells us that the suitably standardized sum will be approximately normally distributed as $n \to \infty$. If we want a standard normal $\sim \mathcal{N}(0,1)$ then we need to subtract the mean ($p$) and then divide by the standard deviation. It's easy to verify that the variance of this demeaned and rescaled random variable is 1 (see Bertsekas & Tsitsiklis, *Introduction to Probability*, ch. 5, p. 274, for example).
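A small empirical check of this (again my own sketch, with arbitrary p, n and repetition count): the demeaned, rescaled sample mean should look approximately standard normal.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, n, reps = 0.3, 1000, 100_000
x_bar = rng.binomial(n, p, size=reps) / n
z = np.sqrt(n) * (x_bar - p) / np.sqrt(p * (1 - p))   # demeaned and rescaled sample means
print(f'mean ~ {z.mean():.3f}, std ~ {z.std():.3f}')  # should be close to 0 and 1
print(f'P(z > 1.96) ~ {(z > 1.96).mean():.3f} vs {1 - norm.cdf(1.96):.3f}')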
If we assume the coin is fair, $p = 0.5$, then as $n \to \infty$ we'd expect
$$2\sqrt{n}\,(\bar{X}_n - 1/2) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0,1)$$

Test
We design a test
$$\psi_\alpha = \mathbb{1}(T_n > c_\alpha)$$

i.e. $\psi_\alpha = 1$ when we reject the null hypothesis, and $\psi_\alpha = 0$ when we fail to reject it (we never "accept" it as such, only either reject or fail to reject).
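As a sketch, the test can be written as a small function (the name psi and the explicit threshold argument are my own choices; how to pick $c_\alpha$ is discussed below).

import numpy as np

def psi(flips, c):
    """Sketch of the test: return 1 (reject H0: p = 1/2) if T_n > c, else 0.

    flips -- array of 0/1 outcomes
    c     -- rejection threshold c_alpha
    """
    n = len(flips)
    t_n = 2 * np.sqrt(n) * abs(np.mean(flips) - 0.5)
    return int(t_n > c)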
Errors
Type 1 error
This happens when we reject the null and we shouldn't have; in other words, if we have $\theta \in \Theta_0$ (in our case $p = 1/2$) and $\psi_\alpha = 1$, or: the coin is fair, but we reject it being so. The "level" of the test, $\alpha$, is the largest probability of rejecting the null hypothesis over $\Theta_0$:
$$\lim_{n \to \infty} P_{1/2}(\psi = 1) \le \alpha$$

If $p = 1/2$, then by the CLT, if we choose $C = q_{\alpha/2}$, the probability of a type 1 error goes as

$$P(\psi = 1) = P\left(2\sqrt{n}\,|\bar{X}_n - 1/2| > q_{\alpha/2}\right) \to \alpha$$

where $q_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal. We can therefore limit our type 1 error and get a test of level $\alpha$ if we choose $C$ like this.
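A quick Monte Carlo sanity check of this (my own sketch; the seed, n and repetition count are arbitrary): for a fair coin, the rejection frequency with $C = q_{\alpha/2}$ should be close to $\alpha$.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
alpha, n, reps = 0.05, 500, 50_000
q = norm.ppf(1 - alpha / 2)
x_bar = rng.binomial(n, 0.5, size=reps) / n      # sample means under the null (fair coin)
rejections = 2 * np.sqrt(n) * np.abs(x_bar - 0.5) > q
print(f'Empirical type 1 error: {rejections.mean():.3f} (target {alpha})')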
Type 2 error
$$\beta_{\psi_n} : \Theta_1 \to \mathbb{R}, \qquad \theta \mapsto P_\theta(\psi_n = 0)$$

In words, this is saying: if the alternative hypothesis is true ($\theta \in \Theta_1$, or in our case $p \ne 1/2$, the coin is not fair), what is the probability that we fail to reject the null hypothesis, and therefore commit a type 2 error? E.g. the coin has bias 0.6 but we fail to reject the null hypothesis that the coin is fair.
Well, we reject the null if

$$2\sqrt{n}\,|\bar{X}_n - 1/2| > q_{\alpha/2} \iff |\bar{X}_n - 1/2| > \frac{q_{\alpha/2}}{2\sqrt{n}}$$

which means our rejection region is

$$\bar{X}_n > \frac{1}{2} + \frac{q_{\alpha/2}}{2\sqrt{n}}$$

or

$$\bar{X}_n < \frac{1}{2} - \frac{q_{\alpha/2}}{2\sqrt{n}}$$

Python
from scipy.stats import norm
import numpy as np
# Get the q alpha/2
alpha = 0.05 # level
p = 0.5 # fair coin
n = 100 # how many flips
q = norm.ppf(1-alpha/2)
print(f'The value of q is {q:.2f} when alpha is {alpha}')
lower_r = 0.5-q/(2*np.sqrt(n))
upper_r = 0.5+q/(2*np.sqrt(n))
print(f'The rejection region is less than {lower_r:.2f} or above {upper_r:.2f}')
print(f'{(1-alpha)*100} % of the time we get a count of heads out of {n} flips between {n*lower_r:.2f} and {n*upper_r:.2f} if n={n}')
Running this script tells me that if $n = 100$ and $\alpha = 0.05$, with a fair coin we'd reject the null hypothesis (erroneously, a type 1 error) if we had fewer than ~40 heads or more than ~60 heads. Our rejection region, $R_{\psi_n}$, is therefore fewer than 40 or more than 60 heads.
What is the probability that we reject the null hypothesis when p ≠ 0.5?
This is known as the "statistical power" of the test.
The power of the test is formally defined as
$$\pi_{\psi_n} = \inf_{\theta \in \Theta_1}\left(1 - \beta_{\psi_n}(\theta)\right)$$

Over the space of parameters defining the alternative hypothesis (where we should be rejecting the null), what is our smallest chance of rejecting the null, i.e. our smallest chance of scientific discovery?
For the coin, I want to plot the probability of rejecting the null over $p \ne 0.5$.
We reject on the right with probability

$$P\left(\bar{X}_n > \frac{1}{2} + \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(\bar{X}_n - p > \frac{1}{2} - p + \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(2\sqrt{n}\,(\bar{X}_n - p) > 2\sqrt{n}\,(1/2 - p) + q_{\alpha/2}\right)$$

where we are now considering the true parameter being $p \ne 1/2$.
Now $2\sqrt{n}\,(\bar{X}_n - p) \sim \mathcal{N}(0,1)$ (approximately) if the true parameter is $p$. So the probability goes as

$$1 - \Phi\left(q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

Notice that if $p > 1/2$ this decreases the argument of $\Phi$, decreasing the CDF and increasing the probability of rejection. But if $p = 1/2$ there'd be no change.
Similarly, we reject on the left with probability

$$P\left(\bar{X}_n < \frac{1}{2} - \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(\bar{X}_n - p < \frac{1}{2} - p - \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(2\sqrt{n}\,(\bar{X}_n - p) < 2\sqrt{n}\,(1/2 - p) - q_{\alpha/2}\right)$$

where again we are considering the true parameter being $p \ne 1/2$.
Again $2\sqrt{n}\,(\bar{X}_n - p) \sim \mathcal{N}(0,1)$ (approximately) if the true parameter is $p$. So the probability goes as

$$\Phi\left(-q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

The total probability of rejection is therefore

$$1 - \Phi\left(q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right) + \Phi\left(-q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

which is just one minus the probability of failing to reject.
def prob_reject(n, p, alpha=0.05):
    """Calculates the probability of rejecting the null hypothesis.

    Arguments:
    n -- number of flips
    p -- actual probability for heads
    alpha -- level of the test (default 0.05)
    """
    # quantile for the two-sided test at level alpha
    q = norm.ppf(1 - alpha / 2)
    # shift of the test statistic when the true parameter is p rather than 1/2
    adjustment = 2 * np.sqrt(n) * (0.5 - p)
    return (1 - norm.cdf(q + adjustment)) + norm.cdf(-q + adjustment)
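A quick check of prob_reject at a few illustrative values (the biases are arbitrary; for p = 0.5 the result should simply be the level alpha):

print(f'P(reject | p=0.5, n=100):  {prob_reject(100, 0.5):.3f}')   # ~ 0.05, the level
print(f'P(reject | p=0.6, n=100):  {prob_reject(100, 0.6):.3f}')
print(f'P(reject | p=0.6, n=1000): {prob_reject(1000, 0.6):.3f}')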
Let’s plot this
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
ax.set_title("Probability of rejection vs coin bias for n = 100")

def plot_power(n, ax, **kwargs):
    """Plots power vs actual heads probability.

    Arguments:
    n -- number of flips
    ax -- matplotlib axes
    kwargs -- keyword arguments for plotting
    """
    p_values = np.linspace(0, 1, 1000)
    ax.plot(p_values, prob_reject(n, p_values), **kwargs)
    ax.set_xticks(np.arange(0, 1.1, 0.1))
    ax.set_yticks(np.arange(0, 1.1, 0.1))
    ax.set_ybound((0, 1.05))
    ax.set_ylabel("Prob of rejection")
    ax.set_xlabel("Actual Heads Probability")

fig, ax = plt.subplots(1, 1)

def plot_powers(ax):
    ax.set_title("Prob of rejection vs coin bias")
    plot_power(100, ax, color='r', label="flips = 100")
    plot_power(250, ax, color='b', label="flips = 250")
    plot_power(1000, ax, color='g', label="flips = 1000")
    plot_power(3000, ax, color='y', label="flips = 3000")
    ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size': 16})

plot_powers(ax)
Let’s zoom in
fig, ax = plt.subplots(1, 1)
plot_powers(ax)
ax.annotate('Power > 0.9', xy=(0.4758, 0.95), size=16)
ax.fill_between([0, 1.0], [0.9, 0.9], [1.0, 1.0], alpha=0.2)
ax.set_ybound((0.85, 1.01))
ax.set_xbound((0.3, 0.7))
ax.set_xticks(np.arange(0.30,0.70,0.04));
If we only flipped 100 times, then in order for our test to have 90% power the bias of the coin would need to be $p < 0.34$ or $p > 0.66$.

If however we flipped 1000 times, then we get the same power if the bias is $p < 0.45$ or $p > 0.55$.

In other words: with 100 flips, we can only distinguish a very biased coin, where $p < 0.34$ or $p > 0.66$, from a fair coin 90% of the time. With 10 times more flips (1000), we can distinguish a less biased coin, where $p < 0.45$ or $p > 0.55$, from a fair coin 90% of the time.
We need a big sample size if we are going to detect small deviations from a fair coin with reasonable power.
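One way to make this concrete (my own sketch, reusing prob_reject above) is to scan for the smallest n whose approximate power reaches a target, for a few hypothetical biases; the target power and the biases below are arbitrary examples.

def min_flips(p, target_power=0.9, alpha=0.05, n_max=100_000):
    """Smallest n (found by scanning) whose CLT-approximate power reaches target_power."""
    for n in range(10, n_max):
        if prob_reject(n, p, alpha) >= target_power:
            return n
    return None

for bias in (0.6, 0.55, 0.52):
    print(f'bias {bias}: roughly {min_flips(bias)} flips needed for 90% power')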
Exact analysis given flips follow a binomial distribution
from scipy.stats import binom

def binom_coin_reject(n, p, alpha=0.05):
    """Probability of rejecting the null, using the exact binomial
    distribution instead of the CLT and standard normals.
    """
    # acceptance interval for p = 0.5 with n flips
    lower, upper = binom.interval(1 - alpha, n, 0.5)
    return 1 - (binom.cdf(upper, n, p) - binom.cdf(lower - 1, n, p))
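Before plotting, a quick spot check (mine) of the exact binomial calculation against the CLT approximation at a couple of illustrative biases:

for bias in (0.5, 0.6):
    print(f'p = {bias}: CLT {prob_reject(100, bias):.3f}, '
          f'binomial {binom_coin_reject(100, bias):.3f}')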
fig, ax = plt.subplots(1, 1)
ax.set_title("Binomial prob of reject vs coin bias for n = 100")

def plot_binomial_power(n, ax, **kwargs):
    """Plots the exact (binomial) probability of rejection vs actual heads probability."""
    p_values = np.linspace(0, 1, 1000)
    ax.plot(p_values, binom_coin_reject(n, p_values), **kwargs)
    ax.set_xticks(np.arange(0, 1.1, 0.1))
    ax.set_yticks(np.arange(0, 1.1, 0.1))
    ax.set_ybound((0, 1.05))
    ax.set_ylabel("Power")
    ax.set_xlabel("Actual Heads Probability")

fig, ax = plt.subplots(1, 1)

def plot_comparison(ax):
    ax.set_title("Prob of rejection vs coin bias. Binomial vs CLT")
    plot_power(100, ax, color='r', label="flips = 100")
    plot_binomial_power(100, ax, color='b', label="flips = 100 (binom)")
    ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size': 16})

plot_comparison(ax)
Zooming in
fig, ax = plt.subplots(1, 1)
plot_comparison(ax)
ax.set_ybound((0.01, 0.3))
ax.set_xbound((0.3, 0.7))
ax.set_xticks(np.arange(0.30,0.70,0.04));
If we did the same thing but with n=1000, the exact binomial curve and the CLT approximation would be much closer.
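A sketch of what that comparison could look like, reusing the plotting helpers above (the resulting figure isn't reproduced here):

fig, ax = plt.subplots(1, 1)
ax.set_title("Prob of rejection vs coin bias. Binomial vs CLT, n = 1000")
plot_power(1000, ax, color='r', label="flips = 1000")
plot_binomial_power(1000, ax, color='b', label="flips = 1000 (binom)")
ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size': 16})
ax.set_ybound((0.01, 0.3))
ax.set_xbound((0.3, 0.7))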