ML Foundations
55+ essential concepts in machine learning for algorithmic trading
What You'll Learn
Master machine learning fundamentals with 55+ comprehensive flashcards. Learn supervised vs. unsupervised learning, the bias-variance tradeoff, cross-validation, feature engineering, and overfitting prevention, all framed for algorithmic trading.
Key Topics
- Supervised vs unsupervised learning with trading examples
- Bias-variance tradeoff and how to diagnose overfitting/underfitting
- Walk-forward cross-validation for time series (avoiding lookahead bias)
- Feature scaling, selection, and engineering for trading models
- Regression vs classification and when to use each
- Tom Mitchell's T-P-E framework for defining ML problems
Looking for more machine learning resources? Visit the Explore page to browse related decks or use the Create Your Own Deck flow to customize this set.
How to study this deck
Start with a quick skim of the questions, then launch study mode to flip cards until you can answer each prompt without hesitation. Revisit tricky cards using shuffle or reverse order, and schedule a follow-up review within 48 hours to reinforce retention.
Preview: ML Foundations
Question
What is Tom Mitchell's definition of machine learning?
Answer
A computer program is said to learn from experience E with respect to task T and performance measure P if its performance at T, as measured by P, improves with experience E.
Question
In Mitchell's T-P-E framework, what does T stand for and what is it?
Answer
T = Task. The specific action the model performs (e.g., classify stocks as buy/sell/hold, predict tomorrow's price).
Question
In Mitchell's T-P-E framework, what does P stand for and what is it?
Answer
P = Performance measure. How you evaluate success (e.g., portfolio returns, Sharpe ratio, prediction accuracy, RMSE).
Question
In Mitchell's T-P-E framework, what does E stand for and what is it?
Answer
E = Experience. The training data the model learns from (e.g., historical prices, technical indicators, past trades).
Question
What is supervised learning?
Answer
Learning from labeled data where each training example has input features (X) and a known output label (y). The model learns the mapping X → y.
Question
What is unsupervised learning?
Answer
Learning from unlabeled data to find patterns, structure, or relationships without being told the right answer. Examples: clustering, PCA, anomaly detection.
Question
Is clustering stocks by price behavior supervised or unsupervised?
Answer
Unsupervised - you're discovering groups without predefined labels telling you which stocks belong together.
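As a minimal sketch of this idea (synthetic returns, hypothetical stocks, scikit-learn's KMeans), clustering discovers groups with no labels supplied anywhere:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic daily returns for six hypothetical stocks over 250 days:
# three low-volatility names and three high-volatility names.
low_vol = rng.normal(0.0002, 0.005, size=(3, 250))
high_vol = rng.normal(0.0005, 0.02, size=(3, 250))
returns = np.vstack([low_vol, high_vol])

# Summarize each stock's price behavior with two features: mean return and volatility.
feats = np.column_stack([returns.mean(axis=1), returns.std(axis=1)])

# Cluster on behavior alone -- no labels involved.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(feats))
print(labels)  # e.g. [0 0 0 1 1 1] -- groups discovered, not given
```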
Question
Is predicting stock returns using labeled historical data supervised or unsupervised?
Answer
Supervised - you have labels (actual returns) for each training example and are learning to predict future returns.
Question
What is regression?
Answer
Supervised learning where the output is a continuous numerical value (e.g., predicting stock price as $178.42, or return as 2.3%).
Question
What is classification?
Answer
Supervised learning where the output is a discrete category or class (e.g., buy/sell/hold, up/down, high risk/low risk).
Question
Is predicting exact stock price regression or classification?
Answer
Regression - the output is a continuous value (price can be any number).
Question
Is predicting whether a stock will go up or down regression or classification?
Answer
Classification - the output is a discrete category (up or down).
Question
What does HIGH BIAS mean?
Answer
Model is too simple and cannot capture the underlying pattern. Results in underfitting with high training error AND high test error. The model is biased toward overly simple assumptions.
Question
What does HIGH VARIANCE mean?
Answer
Model is too complex and learns noise as if it were signal. Results in overfitting with low training error BUT high test error. The model varies wildly based on training data.
Question
Training error = 25%, Test error = 26%. What's the problem?
Answer
HIGH BIAS (underfitting). Both errors are high and similar. The model is too simple to learn the pattern.
Question
Training error = 2%, Test error = 18%. What's the problem?
Answer
HIGH VARIANCE (overfitting). Large gap between training and test error. The model memorized training data but doesn't generalize.
Question
Training error = 4%, Test error = 5%. What's the status?
Answer
GOOD GENERALIZATION (sweet spot). Both errors are low with small gap. Model has appropriate complexity - low bias and low variance.
Question
What is the bias-variance tradeoff?
Answer
As model complexity increases, bias decreases but variance increases. The goal is to find the balance that minimizes test error - not too simple (high bias) and not too complex (high variance).
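A quick way to see the tradeoff is to sweep model complexity and watch training and test error diverge. A sketch under stated assumptions (synthetic sine data, illustrative polynomial degrees; exact numbers vary with the seed):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

# Noisy samples of a smooth underlying signal.
X = rng.uniform(-3, 3, 120).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 120)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          round(mean_squared_error(y_tr, model.predict(X_tr)), 3),  # keeps falling
          round(mean_squared_error(y_te, model.predict(X_te)), 3))  # typically falls, then rises
```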
Question
How do you reduce HIGH BIAS (underfitting)?
Answer
Add complexity: use more features, increase model complexity (deeper trees, higher polynomial degree), reduce regularization, train longer.
Question
How do you reduce HIGH VARIANCE (overfitting)?
Answer
Reduce complexity: use fewer features, simplify model (prune trees, lower polynomial degree), add regularization, get more training data, use ensembles.
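One of those levers, regularization, is easy to demonstrate. A minimal sketch (synthetic data, illustrative alpha values) using scikit-learn's Ridge; the train/test gap typically narrows as alpha grows:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)

# Many noisy features and few samples: a recipe for high variance.
X = rng.normal(size=(100, 50))
y = 2.0 * X[:, 0] + rng.normal(0, 1.0, 100)  # only feature 0 truly matters
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in (0.01, 1.0, 100.0):  # larger alpha = stronger coefficient shrinkage
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    gap = (mean_squared_error(y_te, model.predict(X_te))
           - mean_squared_error(y_tr, model.predict(X_tr)))
    print(f"alpha={alpha}: train/test gap = {gap:.3f}")
```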
Question
What is training error?
Answer
Error on the data the model learned from. Always decreases (or stays the same) as model complexity increases. Can be misleadingly low with overfitting.
Question
What is test error?
Answer
Error on new, unseen data the model hasn't trained on. This is what we actually care about for real-world performance. Typically follows a U-shaped curve as complexity increases.
Question
Why is standard k-fold cross-validation dangerous for time series?
Answer
It causes lookahead bias - because folds are split randomly, the model can train on future data to predict the past. This violates temporal order and gives unrealistically optimistic performance estimates.
Question
What is lookahead bias?
Answer
When a model uses information from the future (that wouldn't be available in real trading) during training. This inflates backtest performance but causes failure in live trading.
Question
What is walk-forward (forward chaining) cross-validation?
Answer
Time series cross-validation where you always train on past data and test on future data. Training set grows or rolls forward, respecting temporal order and avoiding lookahead bias.
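scikit-learn's TimeSeriesSplit implements this pattern. A minimal sketch with stand-in data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # stand-in for 20 days of features, in time order

for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    # Every training index precedes every test index: no lookahead.
    print("train:", train_idx, "-> test:", test_idx)
```

Each successive fold trains on a longer prefix of the history and tests on the block that immediately follows it.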
Question
What is the main advantage of walk-forward CV over single train/test split?
Answer
Multiple test evaluations across different time periods give more robust performance estimates. Reduces risk of overfitting to one specific test period and shows performance across different market conditions.
Question
Why is a single train/test split risky for time series?
Answer
You get only ONE test score from ONE specific time period. Model might perform well by chance on that period but fail in other market conditions. Risk of overfitting to that particular test period.
Question
What is standardization (z-score normalization)?
Answer
Feature scaling using (X - mean) / std. Results in mean=0, std=1. Preserves distribution shape, handles outliers better. Most common in ML4T.
Question
What is min-max scaling?
Answer
Feature scaling using (X - min) / (max - min). Results in range [0,1]. Sensitive to outliers. Less common in ML4T than standardization.
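A minimal sketch contrasting the two scalers from this card and the previous one (scikit-learn, toy values with one outlier):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# One feature with an outlier, to show how each scaler reacts.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

print(StandardScaler().fit_transform(X).ravel())  # (X - mean) / std: mean 0, std 1
print(MinMaxScaler().fit_transform(X).ravel())    # (X - min) / (max - min): range [0, 1];
                                                  # the outlier squashes everything else near 0
```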
Question
Why do we scale features?
Answer
To put features on similar scales so no single feature dominates due to its magnitude. Critical for distance-based algorithms (KNN) and gradient descent (neural networks, linear regression).
Question
Which algorithms NEED feature scaling?
Answer
Scale-sensitive algorithms: KNN, linear/logistic regression with gradient descent, neural networks, SVM, PCA. These use distances or gradients affected by feature magnitude.
Question
Which algorithms DON'T need feature scaling?
Answer
Scale-invariant algorithms: Decision trees, Random Forests, tree-based models in general, Naive Bayes. These use thresholds or probabilities, not distances.
Question
When scaling features, should you use statistics from the entire dataset?
Answer
NO! Calculate mean/std using ONLY training data, then apply those same parameters to test data. Using test data statistics causes data leakage.
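In scikit-learn terms, that means fit_transform on the training set and plain transform on the test set. A minimal sketch with toy values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[5.0], [6.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean/std learned from training data only
X_test_scaled = scaler.transform(X_test)        # same parameters reused -- never refit on test
print(scaler.mean_, X_test_scaled.ravel())
```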
Question
What is feature selection?
Answer
Choosing which existing features to use. Reduces overfitting, speeds training, improves interpretability. Methods: correlation analysis, feature importance, forward/backward selection, L1 regularization.
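One of the listed methods, L1 regularization, can be sketched with scikit-learn's Lasso (synthetic data; only two of ten features carry signal):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 200)  # only two features matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(lasso.coef_))  # L1 penalty zeroes out irrelevant coefficients, e.g. [0 1]
```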
Question
What is feature engineering?
Answer
Creating new features from existing ones. In ML4T: technical indicators (momentum, Bollinger Bands, moving averages), derived metrics (volatility, returns), crossovers, etc.
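A minimal pandas sketch of such features, computed from a synthetic price series (indicator windows are illustrative, not prescriptive):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))), name="close")

feats = pd.DataFrame({
    "ret_1d": close.pct_change(),                     # daily return
    "mom_10d": close / close.shift(10) - 1,           # 10-day momentum
    "sma_20": close.rolling(20).mean(),               # 20-day moving average
    "bb_pos": (close - close.rolling(20).mean())
              / (2 * close.rolling(20).std()),        # position within Bollinger Bands
})
print(feats.dropna().head())
```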
Question
Is PCA supervised or unsupervised learning?
Answer
Unsupervised - it finds patterns in the features themselves without using any labels. Even if you later use PCA components in supervised learning, PCA itself is unsupervised.
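A short sketch underscoring the point (synthetic returns for hypothetical stocks): PCA's fit sees only the feature matrix, never a label:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
returns = rng.normal(size=(250, 8))  # 250 days of returns for 8 hypothetical stocks

pca = PCA(n_components=3).fit(returns)  # fit uses the features only -- no labels anywhere
print(pca.explained_variance_ratio_)    # variance captured by each component
```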
Question
Is calculating Sharpe ratio machine learning?
Answer
No - it's just statistical computation using a formula. Not ML because there's no learning from data to make predictions or discover patterns.
Question
Training on 2023 data, testing on 2024 data - does this avoid lookahead bias?
Answer
Yes, it respects temporal order. But it's still risky because you get only ONE test period. Walk-forward CV with multiple test periods is more robust.
Question
Model always predicts 0% return. High or low bias?
Answer
HIGH BIAS - the model is too simple (ignores all features). This leads to underfitting with high training and test errors.
Question
Model with 500 features fits training data perfectly but fails on test data. High or low variance?
Answer
HIGH VARIANCE - the model is too complex and overfits. It memorizes training data (including noise) but doesn't generalize to new data.
Question
Why does KNN need feature scaling but Random Forest doesn't?
Answer
KNN uses distance calculations where large-scale features dominate. Random Forest uses threshold-based splits where only the ordering matters, not the magnitude.
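A small experiment makes this concrete. A sketch under stated assumptions (synthetic data; the "volume-like" feature is deliberately large-scale noise): KNN is roughly chance-level until scaled, while Random Forest is indifferent:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)

# Two features on wildly different scales; only the small-scale one is informative.
X = np.column_stack([rng.normal(0, 1, 400),        # e.g. daily return, ~unit scale
                     rng.normal(0, 10000, 400)])   # e.g. volume, huge-scale noise
y = (X[:, 0] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn_raw = KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te)
scaler = StandardScaler().fit(X_tr)
knn_scaled = (KNeighborsClassifier()
              .fit(scaler.transform(X_tr), y_tr)
              .score(scaler.transform(X_te), y_te))
rf_raw = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
print(knn_raw, knn_scaled, rf_raw)  # KNN improves sharply with scaling; RF doesn't need it
```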
Question
Expanding window vs rolling window in walk-forward CV?
Answer
Expanding: training data keeps growing (uses all past). Rolling: fixed-size training window slides forward (uses recent data only). Choose based on whether more data is better or recent data is more relevant.
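Both variants are available through scikit-learn's TimeSeriesSplit, where max_train_size caps the window to get rolling behavior. A minimal sketch with stand-in data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # stand-in for 20 days of features, in time order

for name, cv in [("expanding", TimeSeriesSplit(n_splits=4)),
                 ("rolling", TimeSeriesSplit(n_splits=4, max_train_size=5))]:
    for tr, te in cv.split(X):
        print(f"{name:9s} train: {tr} -> test: {te}")
```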