
Deciding if a coin is fair

We can have a statistical model for a single flip of the coin: $(\{0,1\}, (\mathrm{Ber}(p))_{p \in (0,1)})$, meaning the result is either 0 (tails) or 1 (heads) and the distribution is just a Bernoulli with probability $p$ of 1 and $(1-p)$ of 0.

Then the null and alternative hypotheses are

  • $H_0$: the coin is fair, with probability 50% of heads and 50% of tails: $p = 0.5$, i.e. $\Theta_0 = \{0.5\}$
  • $H_1$: the coin is not fair, with $p \neq 0.5$, i.e. $\Theta_1 = (0,1) \setminus \{0.5\}$ (all probabilities in the open interval $(0,1)$, excluding 0.5)
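
To make this concrete, here is a minimal sketch (my own addition, not part of the original derivation) of simulating flips under the null and under one particular alternative; the sample size, seed, and the alternative bias 0.7 are arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)

# One flip is Ber(p): 1 = heads with probability p, 0 = tails with probability 1 - p
flips_fair = rng.binomial(1, 0.5, size=20)    # a sample under H0 (p = 0.5)
flips_biased = rng.binomial(1, 0.7, size=20)  # a sample under one alternative (p = 0.7)

print(flips_fair.mean(), flips_biased.mean())  # sample proportions of heads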

Test statistic

We use the test statistic

$$T_n = \sqrt{n}\,\frac{|\bar{X}_n - 1/2|}{\sqrt{1/2\,(1 - 1/2)}} = 2\sqrt{n}\,|\bar{X}_n - 1/2|$$

Why do we choose this as our test statistic? Well, if $X_j \sim \mathrm{Ber}(p)$, then

$$\mathrm{var}(\bar{X}_n) = \mathrm{var}\!\left(\frac{X_1 + \dots + X_n}{n}\right) = \frac{1}{n^2}\,\mathrm{var}(X_1 + \dots + X_n) = \frac{n}{n^2}\,\mathrm{var}(X_1) = \frac{1}{n}\,p(1-p)$$

where the first step follows from $\mathrm{var}(aX) = a^2\,\mathrm{var}(X)$, and the second from the fact that the $X_j \overset{iid}{\sim} \mathrm{Ber}(p)$, i.e. they are assumed independent, identically distributed random variables with the same mean and variance, together with additivity of variance for independent random variables: $\mathrm{var}(X+Y) = \mathrm{var}(X) + \mathrm{var}(Y)$ if $X, Y$ are independent. Finally, we use the fact that the variance of a Bernoulli is $p(1-p)$, as is easily demonstrated.

Variance of a Bernoulli

$$\mathrm{var}[X] = E(X^2) - (E(X))^2$$

The expectation is just

$$E(X) = p \times 1 + (1-p) \times 0 = p$$

The second moment is

$$E(X^2) = p \times 1^2 + (1-p) \times 0^2 = p$$

Therefore

$$\mathrm{var}[X] = p - p^2 = p(1-p)$$
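
As a quick numerical sanity check of both facts (a sketch I've added; the particular $p$, $n$, seed and number of simulated experiments are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
p, n, sims = 0.3, 50, 200_000

flips = rng.binomial(1, p, size=(sims, n))  # sims independent experiments of n flips each

# Variance of a single Bernoulli flip vs p(1 - p)
print(flips[:, 0].var(), p * (1 - p))

# Variance of the sample mean (X1 + ... + Xn) / n vs p(1 - p) / n
print(flips.mean(axis=1).var(), p * (1 - p) / n)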

CLT

The CLT tells us that the sum will be approximately normally distributed as $n \to \infty$. If we want a standard normal $N(0,1)$ then we need to subtract the mean ($p$) and then divide by the standard deviation. It's easy to verify that the variance of this demeaned and rescaled random variable is 1 (see Introduction to Probability, Dimitri P. Bertsekas and John N. Tsitsiklis, ch. 5, p. 274, for example).

If we assume the coin is fair, $p = 0.5$, then as $n \to \infty$ we'd expect

$$2\sqrt{n}\,(\bar{X}_n - 1/2) \xrightarrow[n \to \infty]{(d)} N(0,1)$$
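
We can check this convergence with a small simulation (my own sketch; $n$ and the number of replications are arbitrary):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, sims = 500, 100_000

x_bar = rng.binomial(1, 0.5, size=(sims, n)).mean(axis=1)
z = 2 * np.sqrt(n) * (x_bar - 0.5)  # should be approximately N(0, 1)

print(z.mean(), z.std())             # close to 0 and 1
print((z > norm.ppf(0.975)).mean())  # close to 0.025, the upper-tail probability of N(0,1)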

Test

We design a test

$$\psi_\alpha = \mathbb{1}(T_n > c_\alpha)$$

i.e. $\psi_\alpha = 1$ when we reject the null hypothesis, and $\psi_\alpha = 0$ when we fail to reject it (we never “accept” it as such; we only either reject or fail to reject).
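
As an illustration, the whole test fits in a few lines (a sketch of my own; it uses the threshold $c_\alpha = q_{\alpha/2}$ that is justified in the next section, and the coin_test name and example biases are mine):

import numpy as np
from scipy.stats import norm

def coin_test(flips, alpha=0.05):
    """Return (T_n, reject?) for a 0/1 array of coin flips."""
    n = len(flips)
    t_n = 2 * np.sqrt(n) * abs(flips.mean() - 0.5)
    c_alpha = norm.ppf(1 - alpha / 2)  # threshold q_{alpha/2}, justified below
    return t_n, t_n > c_alpha

rng = np.random.default_rng(3)
print(coin_test(rng.binomial(1, 0.5, size=100)))  # fair coin: usually fail to reject
print(coin_test(rng.binomial(1, 0.7, size=100)))  # biased coin: usually reject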

Errors

Type 1 error

This happens when we reject the null and we shouldn't have; in other words, we have $\theta \in \Theta_0$ (in our case $p = 1/2$) and $\psi_\alpha = 1$: the coin is fair, but we reject it being so. The “level” of the test, $\alpha$, is the largest probability of rejecting the null hypothesis over $\Theta_0$:

$$\lim_{n \to \infty} P_{1/2}(\psi = 1) \le \alpha$$

If $p = 1/2$, then by the CLT the probability of a type 1 error behaves as follows if we choose $C = q_{\alpha/2}$:

$$P(\psi = 1) = P\left(2\sqrt{n}\,|\bar{X}_n - 1/2| > q_{\alpha/2}\right) \xrightarrow[n \to \infty]{} \alpha$$

where $q_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal. We can therefore limit our type 1 error and get a level-$\alpha$ test if we choose $C$ like this.
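
A Monte Carlo check (my addition; the number of flips and simulated experiments are arbitrary) that, for a fair coin, the rejection rate is roughly $\alpha$:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
alpha, n, sims = 0.05, 200, 100_000
q = norm.ppf(1 - alpha / 2)

x_bar = rng.binomial(1, 0.5, size=(sims, n)).mean(axis=1)  # fair coin, so H0 is true
t_n = 2 * np.sqrt(n) * np.abs(x_bar - 0.5)

print((t_n > q).mean())  # empirical type 1 error rate, close to alpha = 0.05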

Type 2 error

$$\beta_{\psi_n}: \Theta_1 \to \mathbb{R}, \qquad \theta \mapsto P_\theta(\psi_n = 0)$$

In words, this is saying: if the alternative hypothesis is true ($\theta \in \Theta_1$, or in our case $p \neq 1/2$, coin not fair), what is the probability that we fail to reject the null hypothesis, and therefore commit a type 2 error? E.g. the coin has bias 0.6 but we fail to reject the null hypothesis that the coin is fair.
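
For example, we can estimate by simulation the type 2 error for the $p = 0.6$ coin mentioned above with $n = 100$ flips (a sketch of mine; the number of simulated experiments is arbitrary):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
alpha, n, p, sims = 0.05, 100, 0.6, 100_000
q = norm.ppf(1 - alpha / 2)

x_bar = rng.binomial(1, p, size=(sims, n)).mean(axis=1)  # biased coin, so H1 is true
t_n = 2 * np.sqrt(n) * np.abs(x_bar - 0.5)

print((t_n <= q).mean())  # empirical probability of failing to reject (type 2 error)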

Recall that we reject the null hypothesis if

$$2\sqrt{n}\,|\bar{X}_n - 1/2| > q_{\alpha/2} \iff |\bar{X}_n - 1/2| > \frac{q_{\alpha/2}}{2\sqrt{n}}$$

which means our rejection region is

$$\bar{X}_n > \frac{1}{2} + \frac{q_{\alpha/2}}{2\sqrt{n}}$$

or

$$\bar{X}_n < \frac{1}{2} - \frac{q_{\alpha/2}}{2\sqrt{n}}$$

Python

from scipy.stats import norm
import numpy as np

# Get the q alpha/2
alpha = 0.05 # level
p = 0.5 # fair coin
n = 100 # how many flips
q = norm.ppf(1-alpha/2)
print(f'The value of q is {q:.2f} when alpha is {alpha}')


# Rejection region boundaries for the sample mean under the CLT approximation
lower_r = 0.5 - q/(2*np.sqrt(n))
upper_r = 0.5 + q/(2*np.sqrt(n))

print(f'The rejection region is less than {lower_r:.2f} or above {upper_r:.2f}')
print(f'{(1-alpha)*100} % of the time we get a count of heads out of {n} flips between {n*lower_r:.2f} and {n*upper_r:.2f} if n={n}')

Running this script tells me that if $n = 100$ and $\alpha = 0.05$, with a fair coin we'd reject the null hypothesis (erroneously, a type 1 error) if we had fewer than roughly 40 heads or more than roughly 60 heads. Our rejection region, $R_{\psi_n}$, is therefore below 40 or above 60 heads.

What is the probability that we reject the null hypothesis when $p \neq 0.5$?

This is known as the “statistical power” of the test.

The power of the test is formally defined as

$$\pi_{\psi_n} = \inf_{\theta \in \Theta_1} \left(1 - \beta_{\psi_n}(\theta)\right)$$

Over the space of parameters defining the alternative hypothesis (where we should be rejecting the null), what is our “smallest” chance of rejecting the null, i.e. our smallest chance of scientific discovery?

For the coin, I want to plot the probability of rejecting the null over $p \neq 0.5$.

We reject on the right if

$$P\left(\bar{X}_n > \frac{1}{2} + \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(\bar{X}_n - p > \frac{1}{2} - p + \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(2\sqrt{n}\,(\bar{X}_n - p) > 2\sqrt{n}\,(1/2 - p) + q_{\alpha/2}\right)$$

where here we are now considering the true parameter being $p \neq 1/2$.

$2\sqrt{n}\,(\bar{X}_n - p)$ is approximately $N(0,1)$ if the true parameter is $p$ (strictly speaking its variance is $4p(1-p)$, which equals 1 only at $p = 1/2$, so this is an approximation; the exact binomial analysis below avoids it). So the probability goes as

$$1 - \Phi\left(q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

Notice that if $p > 1/2$ this decreases the argument of $\Phi$, decreasing the CDF and increasing the probability of rejection. But if $p = 1/2$ there'd be no change.

Similarly, reject on the left if

$$P\left(\bar{X}_n < \frac{1}{2} - \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(\bar{X}_n - p < \frac{1}{2} - p - \frac{q_{\alpha/2}}{2\sqrt{n}}\right) = P\left(2\sqrt{n}\,(\bar{X}_n - p) < 2\sqrt{n}\,(1/2 - p) - q_{\alpha/2}\right)$$

where again the true parameter is $p \neq 1/2$.

Using the same normal approximation for $2\sqrt{n}\,(\bar{X}_n - p)$, the probability goes as

$$\Phi\left(-q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

The total probability of rejection is therefore

$$1 - \Phi\left(q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right) + \Phi\left(-q_{\alpha/2} + 2\sqrt{n}\,(1/2 - p)\right)$$

which is just one minus the probability of failing to reject.

def prob_reject(n, p, alpha=0.05):
    """Calculates the probability of rejecting the null hypothesis (CLT approximation)
    Arguments:
    n -- number of flips
    p -- actual probability for heads
    alpha -- level of the test
    """
    # q_{alpha/2}, the 1 - alpha/2 quantile of the standard normal
    q = norm.ppf(1 - alpha/2)
    # 2 sqrt(n) (1/2 - p): how far the rejection thresholds shift under the true p
    adjustment = 2 * np.sqrt(n) * (0.5 - p)

    return (1 - norm.cdf(q + adjustment)) + norm.cdf(-q + adjustment)
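
For example, for the $p = 0.6$ coin and $n = 100$ flips from the type 2 error section, we can compare the formula with a direct simulation (the Monte Carlo cross-check is my own sketch and reuses the imports above):

print(prob_reject(100, 0.6))  # CLT-based probability of rejection

# Cross-check against simulation
rng = np.random.default_rng(6)
x_bar = rng.binomial(1, 0.6, size=(100_000, 100)).mean(axis=1)
t_n = 2 * np.sqrt(100) * np.abs(x_bar - 0.5)
print((t_n > norm.ppf(1 - 0.05 / 2)).mean())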

Let’s plot this

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
ax.set_title("Probability of rejection vs coin bias for n = 100")

def plot_power(n, ax, **kwargs):
    """Plots power vs actual heads
    Arguments:
    n -- number of flips
    ax -- matplotlib axes
    kwargs -- keyword arguments for plotting
    """
    p_values = np.linspace(0,1,1000)
    ax.plot(p_values, prob_reject(n, p_values), **kwargs)
    ax.set_xticks(np.arange(0,1.1, 0.1))
    ax.set_yticks(np.arange(0,1.1, 0.1))
    ax.set_ybound((0, 1.05))
    ax.set_ylabel("Prob of rejection")
    ax.set_xlabel("Actual Heads Probability")
    
fig, ax = plt.subplots(1, 1)

def plot_powers(ax):
    ax.set_title("Prob of rejection vs coin bias")
    plot_power(100, ax, color='r', label="flips = 100")
    plot_power(250, ax, color='b', label="flips = 250")
    plot_power(1000, ax, color='g', label="flips = 1000")
    plot_power(3000, ax, color='y', label="flips = 3000")
    ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size':16})
    
plot_powers(ax)

Probability of rejection vs coin bias for various n

Let’s zoom in

fig, ax = plt.subplots(1, 1)
plot_powers(ax)
ax.annotate('Power > 0.9', xy=(0.4758, 0.95), size=16)
ax.fill_between([0, 1.0], [0.9, 0.9], [1.0, 1.0], alpha=0.2)
ax.set_ybound((0.85, 1.01))
ax.set_xbound((0.3, 0.7))
ax.set_xticks(np.arange(0.30,0.70,0.04));

Zoomed Probability of rejection vs coin bias for various n

If we only flipped 100 times, then in order for our test to have 90% power the bias of the coin would need to be $p < 0.34$ or $p > 0.66$.

If however we flipped 1000 times, then we get the same power if the bias is $p < 0.45$ or $p > 0.55$.

With 100 flips, we can only distinguish a very biased coin, where $p < 0.34$ or $p > 0.66$, from a fair coin 90% of the time. With 10 times more flips (1000), we can distinguish a less biased coin, where $p < 0.45$ or $p > 0.55$, from a fair coin 90% of the time.

We need a big sample size if we are going to detect small deviations from a fair coin with reasonable power.
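
As a rough way to quantify that (my own sketch, built on the CLT-based prob_reject above; the step size of 10 flips is arbitrary), we can search for the smallest n that gives at least 90% power against a given bias:

def min_flips_for_power(p, power=0.9, alpha=0.05):
    """Smallest n (in steps of 10) with rejection probability >= power at bias p.
    Assumes p != 0.5, otherwise the search never terminates."""
    n = 10
    while prob_reject(n, p, alpha) < power:
        n += 10
    return n

print(min_flips_for_power(0.55))  # small deviation from fair: needs many flips
print(min_flips_for_power(0.66))  # large deviation: needs far fewer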

Exact analysis given the flips follow a binomial distribution


from scipy.stats import binom

def binom_coin_reject(n, p, alpha=0.05):
    """Probability of rejection using the exact binomial distribution
    instead of the CLT and standard normal approximation.
    """
    # acceptance (non-rejection) region for the number of heads under p = 0.5
    lower, upper = binom.interval(1 - alpha, n, 0.5)
    # probability, under the true p, of falling outside the acceptance region
    return 1 - (binom.cdf(upper, n, p) - binom.cdf(lower - 1, n, p))
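
For instance (a quick check I've added), we can compare the exact binomial calculation with the CLT approximation at a single point before plotting:

n, p = 100, 0.6
print(binom_coin_reject(n, p))  # exact rejection probability
print(prob_reject(n, p))        # CLT approximation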


fig, ax = plt.subplots(1, 1)
ax.set_title("Binomial prob of reject vs coin bias for n = 100")

def plot_binomial_power(n, ax, **kwargs):
    p_values = np.linspace(0,1,1000)
    ax.plot(p_values, binom_coin_reject(n, p_values), **kwargs)
    ax.set_xticks(np.arange(0,1.1, 0.1))
    ax.set_yticks(np.arange(0,1.1, 0.1))
    ax.set_ybound((0, 1.05))
    ax.set_ylabel("Power")
    ax.set_xlabel("Actual Heads Probability")


fig, ax = plt.subplots(1, 1)

def plot_comparison(ax):
    ax.set_title("Prob of rejection vs coin bias. Binomial vs CLT")
    plot_power(100, ax, color='r', label="flips = 100")
    plot_binomial_power(100, ax, color='b', label="flips = 100 (binom)")
    ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size':16})
    
plot_comparison(ax)

Comparing binomial with CLT

Zooming in

fig, ax = plt.subplots(1, 1)
plot_comparison(ax)
ax.set_ybound((0.01, 0.3))
ax.set_xbound((0.3, 0.7))
ax.set_xticks(np.arange(0.30,0.70,0.04));

Comparing binomial with CLT

If we did the same thing but with $n = 1000$, we get the comparison below.
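
A sketch of how that figure can be generated with the helpers defined above (my addition; only n changes, and the colours and legend placement mirror the earlier plots):

fig, ax = plt.subplots(1, 1)
ax.set_title("Prob of rejection vs coin bias. Binomial vs CLT, n = 1000")
plot_power(1000, ax, color='r', label="flips = 1000")
plot_binomial_power(1000, ax, color='b', label="flips = 1000 (binom)")
ax.legend(bbox_to_anchor=(0.75, 0.6), loc=2, prop={'size':16})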

Comparing binomial with CLT
