ML Algorithms
20 essential ML algorithms for trading and data science
What You'll Learn
Master machine learning algorithms with 20 flashcards covering decision trees, leaf_size, bagging, Random Forests, ensemble methods, classification vs regression trees, and overfitting prevention for ML4T applications.
Key Topics
- Decision tree fundamentals and leaf_size parameter
- Classification vs regression trees comparison
- Bagging (bootstrap aggregating) technique
- Random Forests and feature randomization
- Ensemble learning advantages and variance reduction
- Overfitting prevention in tree-based models
Looking for more machine learning resources? Visit the Explore page to browse related decks or use the Create Your Own Deck flow to customize this set.
How to study this deck
Start with a quick skim of the questions, then launch study mode to flip cards until you can answer each prompt without hesitation. Revisit tricky cards using shuffle or reverse order, and schedule a follow-up review within 48 hours to reinforce retention.
Preview: ML Algorithms
Question
What is the primary purpose of leaf_size in decision trees?
Answer
The leaf_size parameter controls overfitting: smaller leaf_size produces deeper, more complex trees that can overfit the training data, while larger leaf_size produces simpler trees that generalize better but may underfit.
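A minimal sketch of this effect, assuming scikit-learn (whose min_samples_leaf plays the same role as ML4T's leaf_size; this is not the course's own learner API):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)  # noisy target

for leaf_size in (1, 5, 50):
    tree = DecisionTreeRegressor(min_samples_leaf=leaf_size).fit(X, y)
    # Smaller leaf_size -> more leaves -> more complex tree -> higher overfitting risk
    print(leaf_size, tree.get_n_leaves())
```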
Question
What is the key difference between decision trees for classification vs regression?
Answer
Classification trees predict discrete classes (categories), while regression trees predict continuous values. The leaf node contains the most common class (mode) for classification or the average value (mean) for regression.
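A tiny NumPy illustration of the leaf rule (the leaf values are hypothetical, not a full tree implementation):

```python
import numpy as np

leaf_values = np.array([1, 1, -1, 1, 0])  # training targets that landed in one leaf

regression_pred = leaf_values.mean()  # regression tree: predict the mean of the leaf
classes, counts = np.unique(leaf_values, return_counts=True)
classification_pred = classes[np.argmax(counts)]  # classification tree: predict the mode
print(regression_pred, classification_pred)
```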
Question
What is bagging (bootstrap aggregating)?
Answer
Bagging creates multiple bootstrap samples (random sampling with replacement) from training data, trains a separate learner on each sample, then aggregates predictions (averaging for regression, voting for classification) to reduce variance.
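A compact bagging sketch, assuming NumPy arrays and scikit-learn trees as base learners (the function and argument names are illustrative, not the ML4T BagLearner API):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_query, n_bags=20, seed=0):
    """Train n_bags trees on bootstrap samples and average their predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # sample rows WITH replacement
        learner = DecisionTreeRegressor(min_samples_leaf=5)
        learner.fit(X_train[idx], y_train[idx])
        preds.append(learner.predict(X_query))
    return np.mean(preds, axis=0)  # average for regression; use a vote for classification
```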
Question
How do Random Forests improve upon simple bagging?
Answer
Random Forests add feature randomization to bagging. At each split, only a random subset of features is considered. This decorrelates the trees, reducing variance further and improving ensemble performance.
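As a configuration sketch, scikit-learn's RandomForestRegressor exposes both ideas directly (the parameter names below are sklearn's, not the course's):

```python
from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor(
    n_estimators=100,      # number of bagged trees
    max_features="sqrt",   # random subset of features considered at each split
    min_samples_leaf=5,    # analogous to leaf_size
    bootstrap=True,        # bootstrap-sample the rows (bagging)
)
# forest.fit(X_train, y_train) then forest.predict(X_test) as usual
```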
Question
What is the main advantage of ensemble learning methods?
Answer
Ensemble methods reduce variance by combining predictions from multiple learners. They are less likely to overfit than single models and generally provide more stable and accurate predictions.
Question
In KNN, what happens to bias and variance as k increases?
Answer
As k increases: bias increases (model becomes simpler, may underfit) and variance decreases (predictions more stable, less sensitive to individual points). Small k: low bias, high variance. Large k: high bias, low variance.
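A quick way to see this, assuming scikit-learn and a synthetic noisy target:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 10, size=(200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

for k in (1, 5, 50):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    # k=1 chases every noisy point (high variance); k=50 over-smooths (high bias)
    print(k, round(knn.score(X, y), 3))  # in-sample R^2 drops as k grows
```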
Question
What is the curse of dimensionality in KNN?
Answer
In high-dimensional spaces, all points become approximately equidistant from each other. This makes distance metrics less meaningful and requires exponentially more data to maintain the same density, degrading KNN performance.
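A small simulation of distance concentration, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(200, d))        # 200 random points in d dimensions
    dists = pdist(X)                      # all pairwise Euclidean distances
    print(d, round(dists.std() / dists.mean(), 3))  # relative spread shrinks as d grows
```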
Question
What distance metric is commonly used in KNN for trading features?
Answer
Euclidean distance is most common: sqrt(sum of squared differences). However, features should be normalized/standardized first since trading indicators have different scales (e.g., RSI 0-100 vs Momentum -1 to +1).
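In NumPy terms (the feature values are illustrative, not real market data):

```python
import numpy as np

a = np.array([55.0, 0.02])   # one day's [RSI, momentum]
b = np.array([30.0, 0.05])   # another day's [RSI, momentum]
dist = np.sqrt(np.sum((a - b) ** 2))  # Euclidean distance
# Without standardization, the RSI difference (25) swamps the momentum difference (0.03)
print(dist)
```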
Question
What is the computational complexity of KNN prediction?
Answer
O(n × d), where n is the number of training samples and d is the dimensionality. KNN is computationally expensive at prediction time because it must compute the distance to every training point; there is no training phase, so all the work happens at query time.
Question
What does linear regression minimize?
Answer
Linear regression minimizes the sum of squared errors (SSE) between predicted values and actual values using least squares estimation. It finds the line (or hyperplane) that best fits the training data.
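A minimal least-squares fit with NumPy (synthetic data, with the intercept added as a column of ones):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

A = np.column_stack([X, np.ones(len(X))])                   # features plus intercept column
coeffs, residuals, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes the SSE
print(coeffs)  # approximately [1.5, -0.5, 0.0]
```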
Question
What is overfitting in linear regression and how is it addressed?
Answer
Overfitting occurs when the model fits training noise instead of true patterns, performing poorly on new data. It's addressed through regularization (Ridge/Lasso), cross-validation, and limiting model complexity.
Question
What is Ridge (L2) regularization?
Answer
Ridge adds a penalty term to the loss function proportional to the square of coefficient magnitudes: Loss = SSE + λ × sum(weights²). This shrinks coefficients toward zero, preventing overfitting. Larger λ means stronger regularization.
Question
What is Lasso (L1) regularization?
Answer
Lasso adds a penalty proportional to the absolute value of coefficients: Loss = SSE + λ × sum(|weights|). Unlike Ridge, Lasso can shrink some coefficients exactly to zero, performing feature selection.
Question
Ridge vs Lasso: When to use which?
Answer
Use Ridge when you believe most features are relevant (shrinks all coefficients). Use Lasso when you suspect many features are irrelevant (performs feature selection by zeroing coefficients). Lasso creates sparse models.
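A small contrast of the two, assuming scikit-learn (alpha is sklearn's name for λ):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=200)  # only feature 0 is truly relevant

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: drives irrelevant coefficients exactly to zero
print(np.sum(ridge.coef_ == 0), np.sum(lasso.coef_ == 0))  # Lasso produces the sparse model
```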
Question
How does bagging reduce variance mathematically?
Answer
If predictors are independent with variance σ², averaging n predictors gives variance σ²/n. Bagging approximates this by creating diverse predictors through bootstrap sampling, though they're not fully independent.
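A tiny Monte Carlo check of the σ²/n claim under the idealized independence assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0
single = rng.normal(0, sigma, size=10_000)                       # errors of one predictor
averaged = rng.normal(0, sigma, size=(10_000, 20)).mean(axis=1)  # average of 20 independent predictors
print(round(single.var(), 3), round(averaged.var(), 3))          # ~1.0 vs ~0.05 (= sigma^2 / 20)
```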
Question
What is a Random Tree learner in the context of Random Forests?
Answer
A Random Tree is a decision tree that uses random feature selection at each split. In an ensemble, multiple Random Trees form a Random Forest. For trading, it can be converted to a classifier by returning the mode (most common prediction) of a leaf instead of the mean.
Question
Why use minimum leaf_size of 5 for Random Forest in trading strategies?
Answer
A minimum leaf_size of 5 prevents overfitting by ensuring each leaf has at least 5 samples. This creates more generalizable trees that perform better on out-of-sample trading data, avoiding fitting to market noise.
Question
How is a regression learner converted to classification?
Answer
Instead of returning the mean of values in the leaf (regression), return the mode (most common value) in the leaf (classification). For trading: classify into discrete actions like Buy/Sell/Hold rather than continuous position sizes.
Question
What is boosting and how does it differ from bagging?
Answer
Boosting is sequential: each new learner focuses on examples the previous learners got wrong by adaptively reweighting training data. Bagging trains learners independently in parallel. Boosting reduces both bias and variance but can overfit.
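A side-by-side sketch, assuming scikit-learn ≥ 1.2 (where the base learner argument is named estimator):

```python
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

base = DecisionTreeRegressor(max_depth=3)
bagged = BaggingRegressor(estimator=base, n_estimators=50)    # parallel, independent learners
boosted = AdaBoostRegressor(estimator=base, n_estimators=50)  # sequential, reweights hard examples
```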
Question
In trading context, why normalize/standardize indicator values?
Answer
Different indicators have different scales (Bollinger %B: 0-1, RSI: 0-100, Momentum: -inf to +inf). Standardization (Standard Score: (x - mean)/std) makes them comparable and prevents indicators with larger ranges from dominating distance calculations in KNN or having outsized influence in regression.
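A standard-score sketch in NumPy (illustrative indicator values):

```python
import numpy as np

rsi = np.array([35.0, 60.0, 80.0, 25.0])
momentum = np.array([-0.02, 0.01, 0.04, -0.05])

def standardize(x):
    return (x - x.mean()) / x.std()  # standard score: (x - mean) / std

features = np.column_stack([standardize(rsi), standardize(momentum)])
print(features.round(2))  # both columns are now on comparable scales
```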
Question
What is the bias-variance tradeoff?
Answer
Total Error = Bias² + Variance + Irreducible Error. Bias: error from wrong assumptions (underfitting). Variance: error from sensitivity to training data (overfitting). Goal: minimize total error by balancing bias and variance.
Question
Why might KNN perform poorly in high-frequency trading scenarios?
Answer
KNN has high computational cost at prediction time (must check all training points) and suffers in high dimensions. HFT requires extremely fast decisions with many features, making KNN's O(n×d) complexity and curse of dimensionality problematic.
Question
What is bootstrap sampling?
Answer
Random sampling WITH replacement from the original dataset to create a new dataset of the same size. About 63% of original samples appear (some multiple times), 37% don't appear. Used in bagging to create diverse training sets.
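A quick NumPy check of the ~63% figure:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
sample = rng.integers(0, n, size=n)    # draw n row indices WITH replacement
print(len(np.unique(sample)) / n)      # ~0.632 of the original rows appear at least once
```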
Question
How does feature randomization in Random Forests decorrelate trees?
Answer
Without feature randomization, all trees would likely split on the same strong features, making them highly correlated. Random feature selection forces trees to use different features, creating diverse predictors whose errors don't correlate, improving ensemble performance.
Question
For a trading strategy, should you use high k or low k in KNN?
Answer
It depends on market regime. Low k captures local patterns (good for changing markets) but may overfit to noise. High k smooths predictions (good for stable trends) but may miss regime changes. Use cross-validation to find optimal k for your specific market and timeframe.
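One way to search for k, assuming scikit-learn; TimeSeriesSplit keeps training folds strictly before validation folds, which matters for trading data (the data below is a random stand-in):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 3))   # stand-in for indicator features
y = rng.normal(size=500)        # stand-in for future returns

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [1, 3, 5, 10, 20, 50]},
    cv=TimeSeriesSplit(n_splits=5),
)
search.fit(X, y)
print(search.best_params_)
```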