Introduction¶
This introductory page is a big long, but that's because all the below concepts are common to every upcoming topic.
Machine Learning¶
Field of study that enables computers to learn without being explicitly programmed; machine learns how to perform task \(T\) from experience \(E\) with performance measure \(P\).
Machine learning is necessary when it is not possible for us to make rules, ie, easier for the machine to learn the rules on its own
flowchart LR
subgraph Machine Learning
direction LR
i2[Past<br/>Input] & o2[Past<br/>Output] -->
a(( )) -->
r2[Derived<br/>Rules/<br/>Functions]
r2 & ni[New<br/>Input] -->
c(( )) -->
no[New<br/>Output]
end
subgraph Traditional Programming
direction LR
r1[Standard<br/>Rules/<br/>Functions] & i1[New<br/>Input] -->
b(( )) -->
o1[New<br/>Output]
end
Why do we need ML?¶
To perform tasks which are easy for humans, but difficult to generate a computer program for it
Requirements¶
- \(\exists\) pattern
- If \(\not \exists\) pattern and its just noise, it is impossible to model it
- We cannot quantify pattern mathematically
- \(\exists\) data
Guiding Principles¶
Principle | Questions |
---|---|
Relevance | Is the use of ML in a given context solving an appropriate problem |
Representativeness | Is the training data appropriately selected |
Value | - Do the predictions inform human decisions in a meaningful way - Does the machine learning model produce more accurate predictions than alternative methods - Does it explain variation more completely than alternative methods |
Explainability | - Data selection, Model selection, (un)intended consequences - How effectively is use of ML communicated |
Auditability | Can the model's decision process be queried/monitored by external actors |
Equity | The model should benefit/harm one group disproportionately |
Accountability/Responsibility | Are there mechanisms in place to ensure that someone will be responsible for responding to feedback and redressing harms, if necessary? |
Learning Problem¶
Given training examples and hypothesis set of candidate models, generate a hypothesis function using a learning algorithm to estimate an unknown target function
\(P(x)\) quantifies relative importance of \(x\)
Learning model
- Learning algorithm
- Hypothesis set
Stages of Machine Learning¶
Stage 1 - Data Collection
Stage 2 - Feature importance - Feature selection
Stage 3: Theory - Causal Discovery - Causal Theory building
Stage 4: Building - Feature engineering - Model specification
Stage 5: Tuning - Model class/Learning algorithm selection - Hyperparameter tuning
Stage 6 - Model comparison - Model selection
Stage 7: Evaluation - Performance - Robustness
Stage 8: Inference - Model prediction - Model explanation
Model Engineering¶
flowchart LR
subgraph Data Engineering
direction LR
dc[(Data<br/>Collection)] -->
|Raw<br/>Data| di[(Data<br/>Ingestion)] -->
|Indexed<br/>Data| da[(Data<br/>Analysis, Curation)] -->
|Selected<br/>Data| dl[(Data<br/>Labelling)] -->
|Labelled<br/>Data| dv[(Data<br/>Validation)] -->
|Validated<br/>Data| dp[(Data<br/>Preparation)]
end
td[Task<br/>Definition] -->
dc
dp -->
|ML-Ready<br/>Dataset| l
subgraph ML Engineering
direction LR
l[Learning<br/>Type] -->
c[Define<br/>Cost] -->
mo[Optimization] -->
|Trained<br/>Model| me[Evaluate] -->
|KPIs| mv[Model<br/>Validation] -->
|Certified<br/>Model| md[/Deploy/]
od{Outlier<br/>Detection}
ad{Anomaly<br/>Detection}
end
dp --> od --> |Filter| ad
ld[(Live <br/>Data)] --> od
md --> m[Model]
ad --> |Accept| m
ad ----> |Reject| anomaly[/Anomaly/]
m --> pc{Prediction<br/>Confidence}
pc --> |High<br/>Confidence| pred[/Prediction/]
pc --> |Low<br/>Confidence| unsure[/Unsure/]
- Design
- Why am I building?
- Who am I building for?
- What am I building?
- What are the consequences if it fails?
- Development
- What data will be collected to train the model
- Does the dataset follow AI ethics?
- Deployment
- How will model drift be monitored?
- How should preventive security measures be taken?
- How to react to security breaches?
3 Dimensions of Prediction¶
- Point estimate
- Time
- Probabilistic
- Intervals
- Density
- Trajectories/Scenarios
Good Prediction Characteristics¶
- Forecast/Prediction consistency: Forecasts/Predictions should correspond to forecaster’s best judgement on future events, based on the knowledge available at the time of issuing the Forecasts/Predictions
- Forecast/Prediction quality (accuracy): Forecasts/Predictions should describe future events as good as possible, regardless of what these Forecasts/Predictions may be used for
- Forecast/Prediction value: Forecasts/Predictions should bring additional benefits (monetary/others) when used as input to decision-making
Hence, sometimes you may choose the Forecast/Prediction with the better value even if its quality is not the best
Performance vs Parsimony¶
- Parsimonious models are more explainable
- Parsimonious models generalize better
- Small gains with deep models may disappear with dataset shift/non-stationary
Aspects¶
Aspect | Equivalent in Marco Polo game |
---|---|
Loss | Goal |
Model Class | Map |
Optimization | Search |
Data | Sound |
Open-source Tools¶
Scikit-Learn | |
TensorFLow | |
Keras | |
PyTorch | |
MXNet | |
CNTK | |
Caffe | |
PaddlePaddle | |
Weka |
Doesn’t do well for Forecasting¶
Machine Learning cannot provide reliable time-series forecasting, without causal reasoning. This is why AI/ML cannot be blindly trusted for stock price prediction.
Related topics
- Model ends up being a Naive forecaster: just blindly predicts \(\hat y_{t+h} = y_t\)
- Counter-factual simulation: Never-before-seen events, such as
- declining house prices
- Negative oil prices
- Distribution drift
- Turkey problem
In the face of external factors that is not factored into the model, human intervention is required