Solving the birthday problem using simple counting

Counting trials instead of using a PnC formula
Published

January 10, 2021

This problem and discussion herein was adapted from Computational Thinking course offered by MIT. Link to course page

Question from First Course in Probability (Sheldon Ross):

Given 20 people, what is the probability that, among the 12 months in the year, there are 4 months containing exactly 2 birthdays and 4 containing exactly 3 birthdays? Link to problem

import os 
import time
import numpy as np 

np.random.seed(42)

Problem setup

Rather than using a formula to estimate the probability we can setup an experiment using for-loops to mimic the trials and then count the actual number of cases where the condition is met.

month_number = np.arange(1, 13) # array of months 
number_of_people = 20 #Number of people considered
sample = np.random.choice(month_number, size=number_of_people, replace=True, )

Here sample is the randomized set of 20 birthdays. Each entry is a month. Length of sample is same as number of people in the experiment, 20 in this case.

Highlight cases which satisfy the constraints: 1. 4 months have exactly 2 birthdays 2. 4 months have exactly 3 birthdays

For that first create an array with frequency of birthdays in each month:

frequency = np.zeros(len(month_number)) #This is zeros frequency array
for _entry in sample:
    _index = _entry - 1
    frequency[_index] = frequency[_index] + 1
sample
array([ 7,  4, 11,  8,  5,  7, 10,  3,  7, 11, 11,  8,  5,  4,  8,  8,  3,
        6,  5,  2])
frequency
array([0., 1., 2., 2., 3., 1., 3., 4., 0., 1., 3., 0.])

frequency array shows how many birthday are there in a month (Jan - Dec).

Once we have that, now we have to consider the conditions given in the problem. We can either do this using for/if statement or using numpy.argwhere, where the number of months satisfying the conditions is estimated directly.

len(np.argwhere(frequency == 2)) 
2

Now putting it all together and doing multiple trials.

In this case, I do 1,500,000 trials and count the ratio of successes to the total trials. While this process takes time, it is better than remembering permutation and combination formulae ;P

This is a variant of monte-carlo simulation.

%%time
# putting it all togther 
_number_of_trials = 1_500_000
_success = 0 
  
for _ in range(_number_of_trials):

    sample = np.random.choice(month_number, size=number_of_people, replace=True)
    
    frequency = np.zeros( len(month_number) )
    for _entry in sample:
        if _entry in month_number:
            _index = _entry - 1
            frequency[_index] = frequency[_index] + 1
    
    if ( len(np.argwhere(frequency == 2)) == 4 ) and ( len(np.argwhere(frequency == 3)) == 4 ): 
        _success = _success + 1

print('Probability of success = {}'.format(_success / _number_of_trials))
Probability of success = 0.00106
CPU times: user 3min 8s, sys: 2.08 s, total: 3min 10s
Wall time: 3min 10s

This answer is very close you’d get from using probablity counting and formula!