import os
import time
import numpy as np
42) np.random.seed(
This problem and discussion herein was adapted from Computational Thinking course offered by MIT. Link to course page
Question from First Course in Probability (Sheldon Ross):
Given 20 people, what is the probability that, among the 12 months in the year, there are 4 months containing exactly 2 birthdays and 4 containing exactly 3 birthdays? Link to problem
Problem setup
Rather than using a formula to estimate the probability we can setup an experiment using for-loops to mimic the trials and then count the actual number of cases where the condition is met.
= np.arange(1, 13) # array of months
month_number = 20 #Number of people considered
number_of_people = np.random.choice(month_number, size=number_of_people, replace=True, ) sample
Here sample
is the randomized set of 20 birthdays. Each entry is a month. Length of sample is same as number of people in the experiment, 20 in this case.
Highlight cases which satisfy the constraints: 1. 4 months have exactly 2 birthdays 2. 4 months have exactly 3 birthdays
For that first create an array with frequency of birthdays in each month:
= np.zeros(len(month_number)) #This is zeros frequency array
frequency for _entry in sample:
= _entry - 1
_index = frequency[_index] + 1 frequency[_index]
sample
array([ 7, 4, 11, 8, 5, 7, 10, 3, 7, 11, 11, 8, 5, 4, 8, 8, 3,
6, 5, 2])
frequency
array([0., 1., 2., 2., 3., 1., 3., 4., 0., 1., 3., 0.])
frequency
array shows how many birthday are there in a month (Jan - Dec).
Once we have that, now we have to consider the conditions given in the problem. We can either do this using for/if
statement or using numpy.argwhere
, where the number of months satisfying the conditions is estimated directly.
len(np.argwhere(frequency == 2))
2
Now putting it all together and doing multiple trials.
In this case, I do 1,500,000 trials and count the ratio of successes to the total trials. While this process takes time, it is better than remembering permutation and combination formulae ;P
This is a variant of monte-carlo simulation.
%%time
# putting it all togther
= 1_500_000
_number_of_trials = 0
_success
for _ in range(_number_of_trials):
= np.random.choice(month_number, size=number_of_people, replace=True)
sample
= np.zeros( len(month_number) )
frequency for _entry in sample:
if _entry in month_number:
= _entry - 1
_index = frequency[_index] + 1
frequency[_index]
if ( len(np.argwhere(frequency == 2)) == 4 ) and ( len(np.argwhere(frequency == 3)) == 4 ):
= _success + 1
_success
print('Probability of success = {}'.format(_success / _number_of_trials))
Probability of success = 0.00106
CPU times: user 3min 8s, sys: 2.08 s, total: 3min 10s
Wall time: 3min 10s
This answer is very close you’d get from using probablity counting and formula!