This problem and discussion herein was adapted from Computational Thinking course offered by MIT. Link to course page

Question from First Course in Probability (Sheldon Ross):

Given 20 people, what is the probability that, among the 12 months in the year, there are 4 months containing exactly 2 birthdays and 4 containing exactly 3 birthdays? Link to problem

import os
import time
import numpy as np

np.random.seed(42)


### Problem setup

Rather than using a formula to estimate the probability we can setup an experiment using for-loops to mimic the trials and then count the actual number of cases where the condition is met.

month_number = np.arange(1, 13) # array of months
number_of_people = 20 #Number of people considered
sample = np.random.choice(month_number, size=number_of_people, replace=True, )


Here sample is the randomized set of 20 birthdays. Each entry is a month. Length of sample is same as number of people in the experiment, 20 in this case.

Highlight cases which satisfy the constraints:

1. 4 months have exactly 2 birthdays
2. 4 months have exactly 3 birthdays

For that first create an array with frequency of birthdays in each month:

frequency = np.zeros(len(month_number)) #This is zeros frequency array
for _entry in sample:
_index = _entry - 1
frequency[_index] = frequency[_index] + 1

sample

array([ 7,  4, 11,  8,  5,  7, 10,  3,  7, 11, 11,  8,  5,  4,  8,  8,  3,
6,  5,  2])
frequency

array([0., 1., 2., 2., 3., 1., 3., 4., 0., 1., 3., 0.])

frequency array shows how many birthday are there in a month (Jan - Dec).

Once we have that, now we have to consider the conditions given in the problem. We can either do this using for/if statement or using numpy.argwhere, where the number of months satisfying the conditions is estimated directly.

len(np.argwhere(frequency == 2))

2

Now putting it all together and doing multiple trials.

In this case, I do 1,500,000 trials and count the ratio of successes to the total trials. While this process takes time, it is better than remembering permutation and combination formulae ;P

This is a variant of monte-carlo simulation.

%%time
# putting it all togther
_number_of_trials = 1_500_000
_success = 0

for _ in range(_number_of_trials):

sample = np.random.choice(month_number, size=number_of_people, replace=True)

frequency = np.zeros( len(month_number) )
for _entry in sample:
if _entry in month_number:
_index = _entry - 1
frequency[_index] = frequency[_index] + 1

if ( len(np.argwhere(frequency == 2)) == 4 ) and ( len(np.argwhere(frequency == 3)) == 4 ):
_success = _success + 1

print('Probability of success = {}'.format(_success / _number_of_trials))

Probability of success = 0.00106
CPU times: user 3min 8s, sys: 2.08 s, total: 3min 10s
Wall time: 3min 10s


This answer is very close you'd get from using probablity counting and formula!