Model building checklist – Omnis tempus datum

Things to consider when building an ML model

Split the data into train and test sets – look at target prop statistics
Split the training set into cross-validation (CV) folds or into train / validation sets
Train models on the training data:
- Linear models
- Nonlinear models
- Ensemble models
- Decision models
Tune model hyperparameters
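
The splitting steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a recommended implementation: the helper names (`split_train_test`, `target_summary`, `k_fold_indices`), the 80/20 split, the 5 folds, and the synthetic data are all assumptions made for the example.

```python
import random
import statistics

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

def target_summary(rows):
    """Mean and sample standard deviation of the target (last item of each row)."""
    targets = [row[-1] for row in rows]
    return statistics.mean(targets), statistics.stdev(targets)

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; the validation folds partition the data."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

# Synthetic (feature, target) pairs, purely for illustration.
rng = random.Random(42)
data = [(x, 2.0 * x + rng.gauss(0, 1)) for x in range(100)]

train, test = split_train_test(data, test_fraction=0.2, seed=1)

# Compare target statistics: large differences suggest an unrepresentative split.
print("train target mean/std:", target_summary(train))
print("test target mean/std: ", target_summary(test))

# Cross-validation folds are built from the training set only.
folds = list(k_fold_indices(len(train), k=5, seed=1))
```

If the target statistics of the two sets differ noticeably, a stratified or grouped split may be worth considering before any model is trained.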

Train on full training dataset
Check for avenues of data bleed (leakage between train and test data)
Split quality - is the train/validation data representative of test data / real-life data?
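
One simple avenue of data bleed is the same sample appearing in both the training and the test set. The sketch below checks for that case only. The function name `find_leaked_rows`, the identifier-based key, and the sample data are hypothetical; other forms of leakage, such as preprocessing fitted on the full dataset, need separate checks.

```python
def find_leaked_rows(train_rows, test_rows, key=lambda row: row):
    """Return test rows whose key also appears in the training set."""
    train_keys = {key(row) for row in train_rows}
    return [row for row in test_rows if key(row) in train_keys]

# Made-up (sample_id, target) rows for illustration.
train_rows = [("sample_a", 1.2), ("sample_b", 0.4), ("sample_c", 3.1)]
test_rows = [("sample_b", 0.5), ("sample_d", 0.9)]

# Compare on the identifier only, so repeated measurements of the same
# sample count as leakage even when the target values differ.
leaked = find_leaked_rows(train_rows, test_rows, key=lambda row: row[0])
print(leaked)
```

Comparing on an identifier rather than on the whole row is a deliberate choice here: a whole-row comparison would miss `sample_b`, because its two measurements differ.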

What is the source of the data (database, publication, direct experiment)?
How many data points are in the training, validation and test sets?
How were the sets split? Is any bias being introduced based on the type of split?
Are the data, including the data splits used, released in a public forum?
How were the data encoded and preprocessed for the ML algorithm?
How many parameters (p) are used in the model?
How many features (f) are used as input?
Is p much larger than the number of training points and/or is f large?
Which overfitting prevention techniques were used?
Are the hyperparameter configurations, optimization schedule, model files and optimization parameters reported?
Is the model a black box or interpretable?
Is the model a classifier or a regressor?
How much time does a single representative prediction require on a standard machine?
Is the source code released?
How was the method evaluated?
Which performance metrics are reported?
Was a comparison to publicly available methods performed on benchmark datasets?
Do the performance metrics have confidence intervals?
Are the raw evaluation files available?
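
For the question on confidence intervals, one common option is a percentile bootstrap over the test-set predictions. The sketch below is an assumption-laden illustration, not a prescribed method: the choice of RMSE as the metric, the function names, the 1000 resamples, the 95% level, and the synthetic predictions are all made up for the example.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric, resampling test points with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    # Approximate empirical quantiles of the resampled scores.
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Synthetic test-set targets and predictions, purely for illustration.
rng = random.Random(7)
y_true = [rng.gauss(0, 1) for _ in range(200)]
y_pred = [t + rng.gauss(0, 0.5) for t in y_true]

point_estimate = rmse(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, rmse)
print(f"RMSE = {point_estimate:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The same helper works with any metric that takes `(y_true, y_pred)`, so the interval can be reported alongside each performance metric in the evaluation.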