Coin flip power analysis
Deciding if a coin is fair
We can have a statistical model for a single flip of the coin: $(\{0,1\}, \{\mathrm{Ber}(p)\}_{p \in (0,1)})$, meaning the result is either 0 (tails) or 1 (heads) and the distribution is just a Bernoulli with probability $p$ of 1 and $(1-p)$ of 0.
Then the null and alternative hypotheses are
- $H_0$: the coin is fair, with probability 50% of heads and 50% of tails: $p = 0.5$, i.e. $\Theta_0 = \{0.5\}$
- $H_1$: the coin is not fair, with $p \ne 0.5$, i.e. $\Theta_1 = (0,1) \setminus \{0.5\}$ (all probabilities on the open interval, excluding 0.5)
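To make the model concrete, here is a minimal simulation sketch (mine, not part of the original analysis); the seed and the bias p_true are arbitrary illustration values.

import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for reproducibility
p_true = 0.55                            # hypothetical bias, purely illustrative
n = 100                                  # number of flips
flips = rng.binomial(1, p_true, size=n)  # each flip ~ Ber(p_true): 1 = heads, 0 = tails
print(f'Observed fraction of heads: {flips.mean():.2f}')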
Test statistic
We use the test statistic
$$T_n = \sqrt{n}\,\frac{|\bar{X}_n - 1/2|}{\sqrt{1/2\,(1 - 1/2)}} = 2\sqrt{n}\,|\bar{X}_n - 1/2|$$

Why do we choose this as our test statistic? Well, if $X_j \sim \mathrm{Ber}(p)$, then
$$\mathrm{var}(\bar{X}_n) = \mathrm{var}\left(\frac{X_1 + \dots + X_n}{n}\right) = \frac{1}{n^2}\,\mathrm{var}(X_1 + \dots + X_n) = \frac{n}{n^2}\,\mathrm{var}(X_1) = \frac{1}{n}\,p(1-p)$$

where the first step follows from $\mathrm{var}(aX) = a^2\,\mathrm{var}(X)$, and the second step follows from the fact that the $X_j \overset{\text{iid}}{\sim} \mathrm{Ber}(p)$, i.e. they are all assumed independent, identically distributed random variables with the same mean and variance, plus additivity of variance for independent random variables: $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$ if $X, Y$ are independent. Finally we use the fact that the variance of a Bernoulli is $p(1-p)$, as can be easily demonstrated.
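As a quick sanity check (my own sketch, not part of the original derivation), we can verify $\mathrm{var}(\bar{X}_n) \approx p(1-p)/n$ by simulation; the values of p, n and the repetition count below are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 50, 200_000                     # arbitrary illustration values
sample_means = rng.binomial(n, p, size=reps) / n  # each entry is one realisation of the sample mean
print(f'Simulated variance:   {sample_means.var():.6f}')
print(f'Theoretical p(1-p)/n: {p * (1 - p) / n:.6f}')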
Variance of a Bernoulli
$$\mathrm{var}[X] = E(X^2) - (E(X))^2$$

The expectation is just

$$E(X) = p \times 1 + (1-p) \times 0 = p$$

The second moment is

$$E(X^2) = p \times 1^2 + (1-p) \times 0^2 = p$$

Therefore

$$\mathrm{var}[X] = p - p^2 = p(1-p)$$

CLT
The CLT tells us that the suitably standardized sum will be approximately normally distributed as $n \to \infty$. If we want a standard normal $\sim \mathcal{N}(0,1)$ then we need to subtract the mean ($p$) and then divide by the standard deviation. It's easy to verify that the variance of this demeaned and rescaled random variable is 1 (see Bertsekas & Tsitsiklis, *Introduction to Probability*, ch. 5, p. 274, for example).
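A small empirical check of this (again my own sketch, with arbitrary p, n and repetition count): the demeaned, rescaled sample mean should look approximately standard normal.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, n, reps = 0.3, 1000, 100_000
x_bar = rng.binomial(n, p, size=reps) / n
z = np.sqrt(n) * (x_bar - p) / np.sqrt(p * (1 - p))   # demeaned and rescaled sample means
print(f'mean ~ {z.mean():.3f}, std ~ {z.std():.3f}')  # should be close to 0 and 1
print(f'P(z > 1.96) ~ {(z > 1.96).mean():.3f} vs {1 - norm.cdf(1.96):.3f}')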
If we assume the coin is fair, $p = 0.5$, then as $n \to \infty$ we'd expect
$$2\sqrt{n}\,(\bar{X}_n - 1/2) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0,1)$$

Test
We design a test
$$\psi_\alpha = \mathbb{1}(T_n > c_\alpha)$$

i.e. $\psi_\alpha = 1$ when we reject the null hypothesis, and $\psi_\alpha = 0$ when we fail to reject it (we never "accept" it as such, only either reject or fail to reject).
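As a sketch, the test can be written as a small function (the name psi and the explicit threshold argument are my own choices; how to pick $c_\alpha$ is discussed below).

import numpy as np

def psi(flips, c):
    """Sketch of the test: return 1 (reject H0: p = 1/2) if T_n > c, else 0.

    flips -- array of 0/1 outcomes
    c     -- rejection threshold c_alpha
    """
    n = len(flips)
    t_n = 2 * np.sqrt(n) * abs(np.mean(flips) - 0.5)
    return int(t_n > c)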
Errors
Type 1 error
This happens when we reject the null and we shouldn't have; in other words, if we have $\theta \in \Theta_0$ (in our case $p = 1/2$) and $\psi_\alpha = 1$, or: the coin is fair, but we reject it being so. The "level" of the test, $\alpha$, is the largest probability of rejecting the null hypothesis over $\Theta_0$:
$$\lim_{n \to \infty} P_{1/2}(\psi = 1) \le \alpha$$

If $p = 1/2$, then by the CLT, if we choose $C = q_{\alpha/2}$, the probability of a type 1 error goes as

$$P(\psi = 1) = P\left(2\sqrt{n}\,|\bar{X}_n - 1/2| > q_{\alpha/2}\right) \to \alpha$$

where $q_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal. We can therefore limit our type 1 error and get a test of level $\alpha$ if we choose $C$ like this.
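A quick Monte Carlo sanity check of this (my own sketch; the seed, n and repetition count are arbitrary): for a fair coin, the rejection frequency with $C = q_{\alpha/2}$ should be close to $\alpha$.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
alpha, n, reps = 0.05, 500, 50_000
q = norm.ppf(1 - alpha / 2)
x_bar = rng.binomial(n, 0.5, size=reps) / n      # sample means under the null (fair coin)
rejections = 2 * np.sqrt(n) * np.abs(x_bar - 0.5) > q
print(f'Empirical type 1 error: {rejections.mean():.3f} (target {alpha})')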
Type 2 error
$$\beta_{\psi_n} : \Theta_1 \to \mathbb{R}, \qquad \theta \mapsto P_\theta(\psi_n = 0)$$

In words, this is saying: if the alternative hypothesis is true ($\theta \in \Theta_1$, or in our case $p \ne 1/2$, the coin is not fair), what is the probability that we fail to reject the null hypothesis, and therefore commit a type 2 error? E.g. the coin has bias 0.6 but we fail to reject the null hypothesis that the coin is fair.
Well, we reject the null if

$$2\sqrt{n}\,|\bar{X}_n - 1/2| > q_{\alpha/2} \iff |\bar{X}_n - 1/2| > \frac{q_{\alpha/2}}{2\sqrt{n}}$$

which means our rejection region is

$$\bar{X}_n > \frac{1}{2} + \frac{q_{\alpha/2}}{2\sqrt{n}}$$

or

$$\bar{X}_n < \frac{1}{2} - \frac{q_{\alpha/2}}{2\sqrt{n}}$$

Python
from scipy.stats import norm
import numpy as np
# Get the q alpha/2
alpha = 0.05 # level
p = 0.5 # fair coin
n = 100 # how many flips
q = norm.ppf(1-alpha/2)
print(f'The value of q is {q:.2f} when alpha is {alpha}')
lower_r = 0.5-q/(2*np.sqrt(n))
upper_r = 0.5+q/(2*np.sqrt(n))
print(f'The rejection region is less than {lower_r:.2f} or above {upper_r:.2f}')
print(f'{(1-alpha)*100} % of the time we get a count of heads out of {n} flips between {n*lower_r:.2f} and {n*upper_r:.2f} if n={n}')
Running this script tells me that if $n = 100$ and $\alpha = 0.05$, with a fair coin we'd reject the null hypothesis (erroneously, a type 1 error) if we had fewer than ~40 heads or more than ~60 heads. Our rejection region, $R_{\psi_n}$, is therefore fewer than 40 or more than 60 heads.
What is the probability that we reject the null hypothesis when p ≠ 0.5?
This is known as the "statistical power" of the test.
The power of the test is formally defined as
$$\pi_{\psi_n} = \inf_{\theta \in \Theta_1}\left(1 - \beta_{\psi_n}(\theta)\right)$$

Over the space of parameters defining the alternative hypothesis (where we should be rejecting the null), what is our smallest chance of rejecting the null, i.e. our smallest chance of scientific discovery?
For the coin, I want to plot the probability of rejecting the null over $p \ne 0.5$.
We reject on the right with probability

$$P\left(\bar{X}_n > \frac{1}{2} + \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(\bar{X}_n - p > \frac{1}{2} - p + \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(2\sqrt{n}\,(\bar{X}_n - p) > 2\sqrt{n}\,(1/2 - p) + q_{\alpha/2}\right)$$

where we are now considering the true parameter being $p \ne 1/2$.
Now $2\sqrt{n}\,(\bar{X}_n - p) \sim \mathcal{N}(0,1)$ (approximately) if the true parameter is $p$. So the probability goes as

$$1 - \Phi\left(q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

Notice that if $p > 1/2$ this decreases the argument of $\Phi$, decreasing the CDF and increasing the probability of rejection. But if $p = 1/2$ there'd be no change.
Similarly, we reject on the left with probability

$$P\left(\bar{X}_n < \frac{1}{2} - \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(\bar{X}_n - p < \frac{1}{2} - p - \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(2\sqrt{n}\,(\bar{X}_n - p) < 2\sqrt{n}\,(1/2 - p) - q_{\alpha/2}\right)$$

where again we are considering the true parameter being $p \ne 1/2$.
Again $2\sqrt{n}\,(\bar{X}_n - p) \sim \mathcal{N}(0,1)$ (approximately) if the true parameter is $p$. So the probability goes as

$$\Phi\left(-q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

The total probability of rejection is therefore

$$1 - \Phi\left(q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right) + \Phi\left(-q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

which is just one minus the probability of failing to reject.
def prob_reject(n, p, alpha=0.05):
    """Calculates the probability of rejecting the null hypothesis.

    Arguments:
    n -- number of flips
    p -- actual probability for heads
    alpha -- level of the test (default 0.05)
    """
    # quantile for the two-sided test at level alpha
    q = norm.ppf(1 - alpha / 2)
    # shift of the test statistic when the true parameter is p rather than 1/2
    adjustment = 2 * np.sqrt(n) * (0.5 - p)
    return (1 - norm.cdf(q + adjustment)) + norm.cdf(-q + adjustment)
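A quick check of prob_reject at a few illustrative values (the biases are arbitrary; for p = 0.5 the result should simply be the level alpha):

print(f'P(reject | p=0.5, n=100):  {prob_reject(100, 0.5):.3f}')   # ~ 0.05, the level
print(f'P(reject | p=0.6, n=100):  {prob_reject(100, 0.6):.3f}')
print(f'P(reject | p=0.6, n=1000): {prob_reject(1000, 0.6):.3f}')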
Let’s plot this
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
ax.set_title("Probability of rejection vs coin bias for n = 100")

def plot_power(n, ax, **kwargs):
    """Plots power vs actual heads probability.

    Arguments:
    n -- number of flips
    ax -- matplotlib axes
    kwargs -- keyword arguments for plotting
    """
    p_values = np.linspace(0, 1, 1000)
    ax.plot(p_values, prob_reject(n, p_values), **kwargs)
    ax.set_xticks(np.arange(0, 1.1, 0.1))
    ax.set_yticks(np.arange(0, 1.1, 0.1))
    ax.set_ybound((0, 1.05))
    ax.set_ylabel("Prob of rejection")
    ax.set_xlabel("Actual Heads Probability")

fig, ax = plt.subplots(1, 1)

def plot_powers(ax):
    ax.set_title("Prob of rejection vs coin bias")
    plot_power(100, ax, color='r', label="flips = 100")
    plot_power(250, ax, color='b', label="flips = 250")
    plot_power(1000, ax, color='g', label="flips = 1000")
    plot_power(3000, ax, color='y', label="flips = 3000")
    ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size': 16})

plot_powers(ax)
Let’s zoom in
fig, ax = plt.subplots(1, 1)
plot_powers(ax)
ax.annotate('Power > 0.9', xy=(0.4758, 0.95), size=16)
ax.fill_between([0, 1.0], [0.9, 0.9], [1.0, 1.0], alpha=0.2)
ax.set_ybound((0.85, 1.01))
ax.set_xbound((0.3, 0.7))
ax.set_xticks(np.arange(0.30,0.70,0.04));
If we only flipped 100 times, then in order for our test to have 90% power the bias of the coin would need to be $p < 0.34$ or $p > 0.66$.

If however we flipped 1000 times, then we get the same power if the bias is $p < 0.45$ or $p > 0.55$.

In other words: with 100 flips, we can only distinguish a very biased coin, where $p < 0.34$ or $p > 0.66$, from a fair coin 90% of the time. With 10 times more flips (1000), we can distinguish a less biased coin, where $p < 0.45$ or $p > 0.55$, from a fair coin 90% of the time.
We need a big sample size if we are going to detect small deviations from a fair coin with reasonable power.
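One way to make this concrete (my own sketch, reusing prob_reject above) is to scan for the smallest n whose approximate power reaches a target, for a few hypothetical biases; the target power and the biases below are arbitrary examples.

def min_flips(p, target_power=0.9, alpha=0.05, n_max=100_000):
    """Smallest n (found by scanning) whose CLT-approximate power reaches target_power."""
    for n in range(10, n_max):
        if prob_reject(n, p, alpha) >= target_power:
            return n
    return None

for bias in (0.6, 0.55, 0.52):
    print(f'bias {bias}: roughly {min_flips(bias)} flips needed for 90% power')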
Exact analysis given flips follow a binomial distribution
from scipy.stats import binom

def binom_coin_reject(n, p, alpha=0.05):
    """Probability of rejecting the null, using the exact binomial
    distribution instead of the CLT and standard normals.
    """
    # acceptance interval for p = 0.5 with n flips
    lower, upper = binom.interval(1 - alpha, n, 0.5)
    return 1 - (binom.cdf(upper, n, p) - binom.cdf(lower - 1, n, p))
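Before plotting, a quick spot check (mine) of the exact binomial calculation against the CLT approximation at a couple of illustrative biases:

for bias in (0.5, 0.6):
    print(f'p = {bias}: CLT {prob_reject(100, bias):.3f}, '
          f'binomial {binom_coin_reject(100, bias):.3f}')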
fig, ax = plt.subplots(1, 1)
ax.set_title("Binomial prob of reject vs coin bias for n = 100")

def plot_binomial_power(n, ax, **kwargs):
    """Plots the exact (binomial) probability of rejection vs actual heads probability."""
    p_values = np.linspace(0, 1, 1000)
    ax.plot(p_values, binom_coin_reject(n, p_values), **kwargs)
    ax.set_xticks(np.arange(0, 1.1, 0.1))
    ax.set_yticks(np.arange(0, 1.1, 0.1))
    ax.set_ybound((0, 1.05))
    ax.set_ylabel("Power")
    ax.set_xlabel("Actual Heads Probability")

fig, ax = plt.subplots(1, 1)

def plot_comparison(ax):
    ax.set_title("Prob of rejection vs coin bias. Binomial vs CLT")
    plot_power(100, ax, color='r', label="flips = 100")
    plot_binomial_power(100, ax, color='b', label="flips = 100 (binom)")
    ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size': 16})

plot_comparison(ax)
Zooming in
fig, ax = plt.subplots(1, 1)
plot_comparison(ax)
ax.set_ybound((0.01, 0.3))
ax.set_xbound((0.3, 0.7))
ax.set_xticks(np.arange(0.30,0.70,0.04));
If we did the same thing but with n=1000, the exact binomial curve and the CLT approximation would be much closer.
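A sketch of what that comparison could look like, reusing the plotting helpers above (the resulting figure isn't reproduced here):

fig, ax = plt.subplots(1, 1)
ax.set_title("Prob of rejection vs coin bias. Binomial vs CLT, n = 1000")
plot_power(1000, ax, color='r', label="flips = 1000")
plot_binomial_power(1000, ax, color='b', label="flips = 1000 (binom)")
ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size': 16})
ax.set_ybound((0.01, 0.3))
ax.set_xbound((0.3, 0.7))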