ML Foundations
55+ essential concepts in machine learning for algorithmic trading
What You'll Learn
Master machine learning fundamentals with 55+ comprehensive flashcards. Learn supervised vs. unsupervised learning, the bias-variance tradeoff, cross-validation, feature engineering, and overfitting prevention, all framed for algorithmic trading.
Key Topics
- Supervised vs unsupervised learning with trading examples
- Bias-variance tradeoff and how to diagnose overfitting/underfitting
- Walk-forward cross-validation for time series (avoiding lookahead bias)
- Feature scaling, selection, and engineering for trading models
- Regression vs classification and when to use each
- Tom Mitchell's T-P-E framework for defining ML problems
Looking for more machine learning resources? Visit the Explore page to browse related decks or use the Create Your Own Deck flow to customize this set.
How to study this deck
Start with a quick skim of the questions, then launch study mode to flip cards until you can answer each prompt without hesitation. Revisit tricky cards using shuffle or reverse order, and schedule a follow-up review within 48 hours to reinforce retention.
Preview: ML Foundations
Question
What is Tom Mitchell's definition of machine learning?
Answer
A computer program is said to learn from experience E with respect to task T and performance measure P if its performance at T, as measured by P, improves with experience E.
Question
In Mitchell's T-P-E framework, what does T stand for and what is it?
Answer
T = Task. The specific action the model performs (e.g., classify stocks as buy/sell/hold, predict tomorrow's price).
Question
In Mitchell's T-P-E framework, what does P stand for and what is it?
Answer
P = Performance measure. How you evaluate success (e.g., portfolio returns, Sharpe ratio, prediction accuracy, RMSE).
Question
In Mitchell's T-P-E framework, what does E stand for and what is it?
Answer
E = Experience. The training data the model learns from (e.g., historical prices, technical indicators, past trades).
Question
What is supervised learning?
Answer
Learning from labeled data where each training example has input features (X) and a known output label (y). The model learns the mapping X → y.
Question
What is unsupervised learning?
Answer
Learning from unlabeled data to find patterns, structure, or relationships without being told the right answer. Examples: clustering, PCA, anomaly detection.
Question
Is clustering stocks by price behavior supervised or unsupervised?
Answer
Unsupervised - you're discovering groups without predefined labels telling you which stocks belong together.
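As a minimal sketch of this idea (synthetic returns, hypothetical stocks, scikit-learn's KMeans), clustering discovers groups with no labels supplied anywhere:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic daily returns for six hypothetical stocks over 250 days:
# three low-volatility names and three high-volatility names.
low_vol = rng.normal(0.0002, 0.005, size=(3, 250))
high_vol = rng.normal(0.0005, 0.02, size=(3, 250))
returns = np.vstack([low_vol, high_vol])

# Summarize each stock's price behavior with two features: mean return and volatility.
feats = np.column_stack([returns.mean(axis=1), returns.std(axis=1)])

# Cluster on behavior alone -- no labels involved.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(feats))
print(labels)  # e.g. [0 0 0 1 1 1] -- groups discovered, not given
```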
Question
Is predicting stock returns using labeled historical data supervised or unsupervised?
Answer
Supervised - you have labels (actual returns) for each training example and are learning to predict future returns.
Question
What is regression?
Answer
Supervised learning where the output is a continuous numerical value (e.g., predicting stock price as $178.42, or return as 2.3%).
Question
What is classification?
Answer
Supervised learning where the output is a discrete category or class (e.g., buy/sell/hold, up/down, high risk/low risk).
Question
Is predicting exact stock price regression or classification?
Answer
Regression - the output is a continuous value (price can be any number).
Question
Is predicting whether a stock will go up or down regression or classification?
Answer
Classification - the output is a discrete category (up or down).
Question
What does HIGH BIAS mean?
Answer
Model is too simple and cannot capture the underlying pattern. Results in underfitting with high training error AND high test error. The model is biased toward overly simple assumptions.
Question
What does HIGH VARIANCE mean?
Answer
Model is too complex and learns noise as if it were signal. Results in overfitting with low training error BUT high test error. The model varies wildly based on training data.
Question
Training error = 25%, Test error = 26%. What's the problem?
Answer
HIGH BIAS (underfitting). Both errors are high and similar. The model is too simple to learn the pattern.
Question
Training error = 2%, Test error = 18%. What's the problem?
Answer
HIGH VARIANCE (overfitting). Large gap between training and test error. The model memorized training data but doesn't generalize.
Question
Training error = 4%, Test error = 5%. What's the status?
Answer
GOOD GENERALIZATION (sweet spot). Both errors are low with small gap. Model has appropriate complexity - low bias and low variance.
Question
What is the bias-variance tradeoff?
Answer
As model complexity increases, bias decreases but variance increases. The goal is to find the balance that minimizes test error - not too simple (high bias) and not too complex (high variance).
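A quick way to see the tradeoff is to sweep model complexity and watch training and test error diverge. A sketch under stated assumptions (synthetic sine data, illustrative polynomial degrees; exact numbers vary with the seed):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

# Noisy samples of a smooth underlying signal.
X = rng.uniform(-3, 3, 120).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 120)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          round(mean_squared_error(y_tr, model.predict(X_tr)), 3),  # keeps falling
          round(mean_squared_error(y_te, model.predict(X_te)), 3))  # typically falls, then rises
```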
Question
How do you reduce HIGH BIAS (underfitting)?
Answer
Add complexity: use more features, increase model complexity (deeper trees, higher polynomial degree), reduce regularization, train longer.
Question
How do you reduce HIGH VARIANCE (overfitting)?
Answer
Reduce complexity: use fewer features, simplify model (prune trees, lower polynomial degree), add regularization, get more training data, use ensembles.
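One of those levers, regularization, is easy to demonstrate. A minimal sketch (synthetic data, illustrative alpha values) using scikit-learn's Ridge; the train/test gap typically narrows as alpha grows:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)

# Many noisy features and few samples: a recipe for high variance.
X = rng.normal(size=(100, 50))
y = 2.0 * X[:, 0] + rng.normal(0, 1.0, 100)  # only feature 0 truly matters
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in (0.01, 1.0, 100.0):  # larger alpha = stronger coefficient shrinkage
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    gap = (mean_squared_error(y_te, model.predict(X_te))
           - mean_squared_error(y_tr, model.predict(X_tr)))
    print(f"alpha={alpha}: train/test gap = {gap:.3f}")
```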
Question
What is training error?
Answer
Error on the data the model learned from. Always decreases (or stays the same) as model complexity increases. Can be misleadingly low with overfitting.
Question
What is test error?
Answer
Error on new, unseen data the model hasn't trained on. This is what we actually care about for real-world performance. Typically follows a U-shaped curve as complexity increases.
Question
Why is standard k-fold cross-validation dangerous for time series?
Answer
It causes lookahead bias - because folds are split randomly, the model can train on future data to predict the past. This violates temporal order and gives unrealistically optimistic performance estimates.
Question
What is lookahead bias?
Answer
When a model uses information from the future (that wouldn't be available in real trading) during training. This inflates backtest performance but causes failure in live trading.
Question
What is walk-forward (forward chaining) cross-validation?
Answer
Time series cross-validation where you always train on past data and test on future data. Training set grows or rolls forward, respecting temporal order and avoiding lookahead bias.
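scikit-learn's TimeSeriesSplit implements this pattern. A minimal sketch with stand-in data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # stand-in for 20 days of features, in time order

for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    # Every training index precedes every test index: no lookahead.
    print("train:", train_idx, "-> test:", test_idx)
```

Each successive fold trains on a longer prefix of the history and tests on the block that immediately follows it.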
Question
What is the main advantage of walk-forward CV over single train/test split?
Answer
Multiple test evaluations across different time periods give more robust performance estimates. Reduces risk of overfitting to one specific test period and shows performance across different market conditions.
Question
Why is a single train/test split risky for time series?
Answer
You get only ONE test score from ONE specific time period. Model might perform well by chance on that period but fail in other market conditions. Risk of overfitting to that particular test period.
Question
What is standardization (z-score normalization)?
Answer
Feature scaling using (X - mean) / std. Results in mean=0, std=1. Preserves distribution shape, handles outliers better. Most common in ML4T.
Question
What is min-max scaling?
Answer
Feature scaling using (X - min) / (max - min). Results in range [0,1]. Sensitive to outliers. Less common in ML4T than standardization.
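A minimal sketch contrasting the two scalers from this card and the previous one (scikit-learn, toy values with one outlier):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# One feature with an outlier, to show how each scaler reacts.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

print(StandardScaler().fit_transform(X).ravel())  # (X - mean) / std: mean 0, std 1
print(MinMaxScaler().fit_transform(X).ravel())    # (X - min) / (max - min): range [0, 1];
                                                  # the outlier squashes everything else near 0
```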
Question
Why do we scale features?
Answer
To put features on similar scales so no single feature dominates due to its magnitude. Critical for distance-based algorithms (KNN) and gradient descent (neural networks, linear regression).
Question
Which algorithms NEED feature scaling?
Answer
Scale-sensitive algorithms: KNN, linear/logistic regression with gradient descent, neural networks, SVM, PCA. These use distances or gradients affected by feature magnitude.
Question
Which algorithms DON'T need feature scaling?
Answer
Scale-invariant algorithms: Decision trees, Random Forests, tree-based models in general, Naive Bayes. These use thresholds or probabilities, not distances.
Question
When scaling features, should you use statistics from the entire dataset?
Answer
NO! Calculate mean/std using ONLY training data, then apply those same parameters to test data. Using test data statistics causes data leakage.
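In scikit-learn terms, that means fit_transform on the training set and plain transform on the test set. A minimal sketch with toy values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[5.0], [6.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean/std learned from training data only
X_test_scaled = scaler.transform(X_test)        # same parameters reused -- never refit on test
print(scaler.mean_, X_test_scaled.ravel())
```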
Question
What is feature selection?
Answer
Choosing which existing features to use. Reduces overfitting, speeds training, improves interpretability. Methods: correlation analysis, feature importance, forward/backward selection, L1 regularization.
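One of the listed methods, L1 regularization, can be sketched with scikit-learn's Lasso (synthetic data; only two of ten features carry signal):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 200)  # only two features matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(lasso.coef_))  # L1 penalty zeroes out irrelevant coefficients, e.g. [0 1]
```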
Question
What is feature engineering?
Answer
Creating new features from existing ones. In ML4T: technical indicators (momentum, Bollinger Bands, moving averages), derived metrics (volatility, returns), crossovers, etc.
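A minimal pandas sketch of such features, computed from a synthetic price series (indicator windows are illustrative, not prescriptive):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))), name="close")

feats = pd.DataFrame({
    "ret_1d": close.pct_change(),                     # daily return
    "mom_10d": close / close.shift(10) - 1,           # 10-day momentum
    "sma_20": close.rolling(20).mean(),               # 20-day moving average
    "bb_pos": (close - close.rolling(20).mean())
              / (2 * close.rolling(20).std()),        # position within Bollinger Bands
})
print(feats.dropna().head())
```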
Question
Is PCA supervised or unsupervised learning?
Answer
Unsupervised - it finds patterns in the features themselves without using any labels. Even if you later use PCA components in supervised learning, PCA itself is unsupervised.
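A short sketch underscoring the point (synthetic returns for hypothetical stocks): PCA's fit sees only the feature matrix, never a label:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
returns = rng.normal(size=(250, 8))  # 250 days of returns for 8 hypothetical stocks

pca = PCA(n_components=3).fit(returns)  # fit uses the features only -- no labels anywhere
print(pca.explained_variance_ratio_)    # variance captured by each component
```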
Question
Is calculating Sharpe ratio machine learning?
Answer
No - it's just statistical computation using a formula. Not ML because there's no learning from data to make predictions or discover patterns.
Question
Training on 2023 data, testing on 2024 data - does this avoid lookahead bias?
Answer
Yes, it respects temporal order. But it's still risky because you get only ONE test period. Walk-forward CV with multiple test periods is more robust.
Question
Model always predicts 0% return. High or low bias?
Answer
HIGH BIAS - the model is too simple (ignores all features). This leads to underfitting with high training and test errors.
Question
Model with 500 features fits training data perfectly but fails on test data. High or low variance?
Answer
HIGH VARIANCE - the model is too complex and overfits. It memorizes training data (including noise) but doesn't generalize to new data.
Question
Why does KNN need feature scaling but Random Forest doesn't?
Answer
KNN uses distance calculations where large-scale features dominate. Random Forest uses threshold-based splits where only the ordering matters, not the magnitude.
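A small experiment makes this concrete. A sketch under stated assumptions (synthetic data; the "volume-like" feature is deliberately large-scale noise): KNN is roughly chance-level until scaled, while Random Forest is indifferent:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)

# Two features on wildly different scales; only the small-scale one is informative.
X = np.column_stack([rng.normal(0, 1, 400),        # e.g. daily return, ~unit scale
                     rng.normal(0, 10000, 400)])   # e.g. volume, huge-scale noise
y = (X[:, 0] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn_raw = KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te)
scaler = StandardScaler().fit(X_tr)
knn_scaled = (KNeighborsClassifier()
              .fit(scaler.transform(X_tr), y_tr)
              .score(scaler.transform(X_te), y_te))
rf_raw = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
print(knn_raw, knn_scaled, rf_raw)  # KNN improves sharply with scaling; RF doesn't need it
```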
Question
Expanding window vs rolling window in walk-forward CV?
Answer
Expanding: training data keeps growing (uses all past). Rolling: fixed-size training window slides forward (uses recent data only). Choose based on whether more data is better or recent data is more relevant.
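Both variants are available through scikit-learn's TimeSeriesSplit, where max_train_size caps the window to get rolling behavior. A minimal sketch with stand-in data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # stand-in for 20 days of features, in time order

for name, cv in [("expanding", TimeSeriesSplit(n_splits=4)),
                 ("rolling", TimeSeriesSplit(n_splits=4, max_train_size=5))]:
    for tr, te in cv.split(X):
        print(f"{name:9s} train: {tr} -> test: {te}")
```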