This problem and discussion herein was adapted from Computational Thinking course offered by MIT. Link to course page
Question from First Course in Probability (Sheldon Ross):
Given 20 people, what is the probability that, among the 12 months in the year, there are 4 months containing exactly 2 birthdays and 4 containing exactly 3 birthdays? Link to problem
import os
import time
import numpy as np
np.random.seed(42)
month_number = np.arange(1, 13) # array of months
number_of_people = 20 #Number of people considered
sample = np.random.choice(month_number, size=number_of_people, replace=True, )
Here sample
is the randomized set of 20 birthdays. Each entry is a month. Length of sample is same as number of people in the experiment, 20 in this case.
Highlight cases which satisfy the constraints:
- 4 months have exactly 2 birthdays
- 4 months have exactly 3 birthdays
For that first create an array with frequency of birthdays in each month:
frequency = np.zeros(len(month_number)) #This is zeros frequency array
for _entry in sample:
_index = _entry - 1
frequency[_index] = frequency[_index] + 1
sample
frequency
frequency
array shows how many birthday are there in a month (Jan - Dec).
Once we have that, now we have to consider the conditions given in the problem. We can either do this using for/if
statement or using numpy.argwhere
, where the number of months satisfying the conditions is estimated directly.
len(np.argwhere(frequency == 2))
Now putting it all together and doing multiple trials.
In this case, I do 1,500,000 trials and count the ratio of successes to the total trials. While this process takes time, it is better than remembering permutation and combination formulae ;P
This is a variant of monte-carlo simulation.
%%time
# putting it all togther
_number_of_trials = 1_500_000
_success = 0
for _ in range(_number_of_trials):
sample = np.random.choice(month_number, size=number_of_people, replace=True)
frequency = np.zeros( len(month_number) )
for _entry in sample:
if _entry in month_number:
_index = _entry - 1
frequency[_index] = frequency[_index] + 1
if ( len(np.argwhere(frequency == 2)) == 4 ) and ( len(np.argwhere(frequency == 3)) == 4 ):
_success = _success + 1
print('Probability of success = {}'.format(_success / _number_of_trials))
This answer is very close you'd get from using probablity counting and formula!