ML Algorithms
20 essential ML algorithms for trading and data science
What You'll Learn
Master machine learning algorithms with 20 flashcards covering decision trees, leaf_size, bagging, Random Forests, ensemble methods, classification vs regression trees, and overfitting prevention for ML4T applications.
Key Topics
- Decision tree fundamentals and leaf_size parameter
- Classification vs regression trees comparison
- Bagging (bootstrap aggregating) technique
- Random Forests and feature randomization
- Ensemble learning advantages and variance reduction
- Overfitting prevention in tree-based models
Looking for more machine learning resources? Visit the Explore page to browse related decks or use the Create Your Own Deck flow to customize this set.
How to study this deck
Start with a quick skim of the questions, then launch study mode to flip cards until you can answer each prompt without hesitation. Revisit tricky cards using shuffle or reverse order, and schedule a follow-up review within 48 hours to reinforce retention.
Preview: ML Algorithms
Question
What is the primary purpose of leaf_size in decision trees?
Answer
The leaf_size parameter controls overfitting: smaller leaf_size produces deeper, more complex trees that can overfit the training data, while larger leaf_size produces simpler trees that generalize better but may underfit.
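A minimal sketch of this effect, assuming scikit-learn (whose min_samples_leaf plays the same role as ML4T's leaf_size; this is not the course's own learner API):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)  # noisy target

for leaf_size in (1, 5, 50):
    tree = DecisionTreeRegressor(min_samples_leaf=leaf_size).fit(X, y)
    # Smaller leaf_size -> more leaves -> more complex tree -> higher overfitting risk
    print(leaf_size, tree.get_n_leaves())
```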
Question
What is the key difference between decision trees for classification vs regression?
Answer
Classification trees predict discrete classes (categories), while regression trees predict continuous values. The leaf node contains the most common class (mode) for classification or the average value (mean) for regression.
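A tiny NumPy illustration of the leaf rule (the leaf values are hypothetical, not a full tree implementation):

```python
import numpy as np

leaf_values = np.array([1, 1, -1, 1, 0])  # training targets that landed in one leaf

regression_pred = leaf_values.mean()  # regression tree: predict the mean of the leaf
classes, counts = np.unique(leaf_values, return_counts=True)
classification_pred = classes[np.argmax(counts)]  # classification tree: predict the mode
print(regression_pred, classification_pred)
```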
Question
What is bagging (bootstrap aggregating)?
Answer
Bagging creates multiple bootstrap samples (random sampling with replacement) from training data, trains a separate learner on each sample, then aggregates predictions (averaging for regression, voting for classification) to reduce variance.
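A compact bagging sketch, assuming NumPy arrays and scikit-learn trees as base learners (the function and argument names are illustrative, not the ML4T BagLearner API):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_query, n_bags=20, seed=0):
    """Train n_bags trees on bootstrap samples and average their predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # sample rows WITH replacement
        learner = DecisionTreeRegressor(min_samples_leaf=5)
        learner.fit(X_train[idx], y_train[idx])
        preds.append(learner.predict(X_query))
    return np.mean(preds, axis=0)  # average for regression; use a vote for classification
```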
Question
How do Random Forests improve upon simple bagging?
Answer
Random Forests add feature randomization to bagging. At each split, only a random subset of features is considered. This decorrelates the trees, reducing variance further and improving ensemble performance.
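As a configuration sketch, scikit-learn's RandomForestRegressor exposes both ideas directly (the parameter names below are sklearn's, not the course's):

```python
from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor(
    n_estimators=100,      # number of bagged trees
    max_features="sqrt",   # random subset of features considered at each split
    min_samples_leaf=5,    # analogous to leaf_size
    bootstrap=True,        # bootstrap-sample the rows (bagging)
)
# forest.fit(X_train, y_train) then forest.predict(X_test) as usual
```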
Question
What is the main advantage of ensemble learning methods?
Answer
Ensemble methods reduce variance by combining predictions from multiple learners. They are less likely to overfit than single models and generally provide more stable and accurate predictions.
Question
In KNN, what happens to bias and variance as k increases?
Answer
As k increases: bias increases (model becomes simpler, may underfit) and variance decreases (predictions more stable, less sensitive to individual points). Small k: low bias, high variance. Large k: high bias, low variance.
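A quick way to see this, assuming scikit-learn and a synthetic noisy target:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 10, size=(200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

for k in (1, 5, 50):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    # k=1 chases every noisy point (high variance); k=50 over-smooths (high bias)
    print(k, round(knn.score(X, y), 3))  # in-sample R^2 drops as k grows
```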
Question
What is the curse of dimensionality in KNN?
Answer
In high-dimensional spaces, all points become approximately equidistant from each other. This makes distance metrics less meaningful and requires exponentially more data to maintain the same density, degrading KNN performance.
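A small simulation of distance concentration, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(200, d))        # 200 random points in d dimensions
    dists = pdist(X)                      # all pairwise Euclidean distances
    print(d, round(dists.std() / dists.mean(), 3))  # relative spread shrinks as d grows
```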
Question
What distance metric is commonly used in KNN for trading features?
Answer
Euclidean distance is most common: sqrt(sum of squared differences). However, features should be normalized/standardized first since trading indicators have different scales (e.g., RSI 0-100 vs Momentum -1 to +1).
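In NumPy terms (the feature values are illustrative, not real market data):

```python
import numpy as np

a = np.array([55.0, 0.02])   # one day's [RSI, momentum]
b = np.array([30.0, 0.05])   # another day's [RSI, momentum]
dist = np.sqrt(np.sum((a - b) ** 2))  # Euclidean distance
# Without standardization, the RSI difference (25) swamps the momentum difference (0.03)
print(dist)
```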
Question
What is the computational complexity of KNN prediction?
Answer
O(n × d), where n is the number of training samples and d is the dimensionality. KNN is computationally expensive at prediction time because it must compute the distance to every training point; there is no training phase, so all the work happens at query time.
Question
What does linear regression minimize?
Answer
Linear regression minimizes the sum of squared errors (SSE) between predicted values and actual values using least squares estimation. It finds the line (or hyperplane) that best fits the training data.
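A minimal least-squares fit with NumPy (synthetic data, with the intercept added as a column of ones):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

A = np.column_stack([X, np.ones(len(X))])                   # features plus intercept column
coeffs, residuals, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes the SSE
print(coeffs)  # approximately [1.5, -0.5, 0.0]
```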
Question
What is overfitting in linear regression and how is it addressed?
Answer
Overfitting occurs when the model fits training noise instead of true patterns, performing poorly on new data. It's addressed through regularization (Ridge/Lasso), cross-validation, and limiting model complexity.
Question
What is Ridge (L2) regularization?
Answer
Ridge adds a penalty term to the loss function proportional to the square of coefficient magnitudes: Loss = SSE + λ × sum(weights²). This shrinks coefficients toward zero, preventing overfitting. Larger λ means stronger regularization.
Question
What is Lasso (L1) regularization?
Answer
Lasso adds a penalty proportional to the absolute value of coefficients: Loss = SSE + λ × sum(|weights|). Unlike Ridge, Lasso can shrink some coefficients exactly to zero, performing feature selection.
Question
Ridge vs Lasso: When to use which?
Answer
Use Ridge when you believe most features are relevant (shrinks all coefficients). Use Lasso when you suspect many features are irrelevant (performs feature selection by zeroing coefficients). Lasso creates sparse models.
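A small contrast of the two, assuming scikit-learn (alpha is sklearn's name for λ):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=200)  # only feature 0 is truly relevant

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: drives irrelevant coefficients exactly to zero
print(np.sum(ridge.coef_ == 0), np.sum(lasso.coef_ == 0))  # Lasso produces the sparse model
```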
Question
How does bagging reduce variance mathematically?
Answer
If predictors are independent with variance σ², averaging n predictors gives variance σ²/n. Bagging approximates this by creating diverse predictors through bootstrap sampling, though they're not fully independent.
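A tiny Monte Carlo check of the σ²/n claim under the idealized independence assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0
single = rng.normal(0, sigma, size=10_000)                       # errors of one predictor
averaged = rng.normal(0, sigma, size=(10_000, 20)).mean(axis=1)  # average of 20 independent predictors
print(round(single.var(), 3), round(averaged.var(), 3))          # ~1.0 vs ~0.05 (= sigma^2 / 20)
```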
Question
What is a Random Tree learner in the context of Random Forests?
Answer
A Random Tree is a decision tree that uses random feature selection at each split. In an ensemble, multiple Random Trees form a Random Forest. For trading, it can be converted to a classifier by returning the mode (most common prediction) of a leaf instead of the mean.
Question
Why use minimum leaf_size of 5 for Random Forest in trading strategies?
Answer
A minimum leaf_size of 5 prevents overfitting by ensuring each leaf has at least 5 samples. This creates more generalizable trees that perform better on out-of-sample trading data, avoiding fitting to market noise.
Question
How is a regression learner converted to classification?
Answer
Instead of returning the mean of values in the leaf (regression), return the mode (most common value) in the leaf (classification). For trading: classify into discrete actions like Buy/Sell/Hold rather than continuous position sizes.
Question
What is boosting and how does it differ from bagging?
Answer
Boosting is sequential: each new learner focuses on examples the previous learners got wrong by adaptively reweighting training data. Bagging trains learners independently in parallel. Boosting reduces both bias and variance but can overfit.
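A side-by-side sketch, assuming scikit-learn ≥ 1.2 (where the base learner argument is named estimator):

```python
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

base = DecisionTreeRegressor(max_depth=3)
bagged = BaggingRegressor(estimator=base, n_estimators=50)    # parallel, independent learners
boosted = AdaBoostRegressor(estimator=base, n_estimators=50)  # sequential, reweights hard examples
```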
Question
In trading context, why normalize/standardize indicator values?
Answer
Different indicators have different scales (Bollinger %B: 0-1, RSI: 0-100, Momentum: -inf to +inf). Standardization (Standard Score: (x - mean)/std) makes them comparable and prevents indicators with larger ranges from dominating distance calculations in KNN or having outsized influence in regression.
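A standard-score sketch in NumPy (illustrative indicator values):

```python
import numpy as np

rsi = np.array([35.0, 60.0, 80.0, 25.0])
momentum = np.array([-0.02, 0.01, 0.04, -0.05])

def standardize(x):
    return (x - x.mean()) / x.std()  # standard score: (x - mean) / std

features = np.column_stack([standardize(rsi), standardize(momentum)])
print(features.round(2))  # both columns are now on comparable scales
```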
Question
What is the bias-variance tradeoff?
Answer
Total Error = Bias² + Variance + Irreducible Error. Bias: error from wrong assumptions (underfitting). Variance: error from sensitivity to training data (overfitting). Goal: minimize total error by balancing bias and variance.
Question
Why might KNN perform poorly in high-frequency trading scenarios?
Answer
KNN has high computational cost at prediction time (must check all training points) and suffers in high dimensions. HFT requires extremely fast decisions with many features, making KNN's O(n×d) complexity and curse of dimensionality problematic.
Question
What is bootstrap sampling?
Answer
Random sampling WITH replacement from the original dataset to create a new dataset of the same size. About 63% of original samples appear (some multiple times), 37% don't appear. Used in bagging to create diverse training sets.
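A quick NumPy check of the ~63% figure:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
sample = rng.integers(0, n, size=n)    # draw n row indices WITH replacement
print(len(np.unique(sample)) / n)      # ~0.632 of the original rows appear at least once
```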
Question
How does feature randomization in Random Forests decorrelate trees?
Answer
Without feature randomization, all trees would likely split on the same strong features, making them highly correlated. Random feature selection forces trees to use different features, creating diverse predictors whose errors don't correlate, improving ensemble performance.
Question
For a trading strategy, should you use high k or low k in KNN?
Answer
It depends on market regime. Low k captures local patterns (good for changing markets) but may overfit to noise. High k smooths predictions (good for stable trends) but may miss regime changes. Use cross-validation to find optimal k for your specific market and timeframe.
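One way to search for k, assuming scikit-learn; TimeSeriesSplit keeps training folds strictly before validation folds, which matters for trading data (the data below is a random stand-in):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 3))   # stand-in for indicator features
y = rng.normal(size=500)        # stand-in for future returns

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [1, 3, 5, 10, 20, 50]},
    cv=TimeSeriesSplit(n_splits=5),
)
search.fit(X, y)
print(search.best_params_)
```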