AWS Certified Machine Learning – Specialty Intermediate — Quiz 2 — Study Guide

AWS ML Specialty Intermediate: Modeling Techniques, Evaluation & Core Algorithms

Machine learning success depends not just on picking an algorithm, but on understanding *why* you pick it, *how* to tune it, and *what* to measure. This lesson covers the core modeling concepts tested in the AWS ML Specialty exam — from regression and regularization to neural networks and recommendation systems.


Regression vs. Classification

Regression predicts a continuous value (e.g., house prices, stock returns). Classification predicts a discrete label (e.g., spam/not spam).

Task | Output Type | Example Algorithm
Regression | Continuous number | Linear Regression
Classification | Category/class | Logistic Regression, SVM
💡 Exam tip: "Predict house prices" → regression. "Predict if email is spam" → classification.
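
A minimal scikit-learn sketch of the difference, using made-up house data (the sizes, prices, and the "expensive" label are hypothetical). The regressor returns a real number; the classifier returns one of a fixed set of labels.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: house size (in hundreds of square feet)
X = np.array([[5], [8], [10], [12], [15], [20]])

# Regression: the target is a continuous number (made-up prices in $1000s)
prices = np.array([150.0, 200.0, 240.0, 270.0, 330.0, 420.0])
reg = LinearRegression().fit(X, prices)
price_pred = reg.predict([[11]])
print(price_pred)  # a real-valued price estimate

# Classification: the target is a discrete label (1 = "expensive", 0 = not)
labels = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, labels)
label_pred = clf.predict([[11]])
print(label_pred)                 # one of the labels: 0 or 1
print(clf.predict_proba([[11]]))  # probability of each class
```

Note that logistic regression, despite its name, is a classification algorithm.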


Overfitting, Underfitting, and Bias

  • Overfitting: Model learns training data *too well*, including noise. High variance, poor generalization.
  • Underfitting: Model is too simple to capture patterns. High bias.
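  • Bias refers to systematic error from overly simple or wrong assumptions; an underfit model has high bias. Variance refers to sensitivity to the particular training sample; an overfit model has high variance.
  • Think of it like studying for an exam: memorizing every practice question (overfitting) vs. barely reading the material (underfitting). You want the balance in between, known as the bias-variance tradeoff.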
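
To see both failure modes, here is a sketch (assuming scikit-learn and synthetic data from `make_classification`, with label noise added via `flip_y=0.2`) that compares training and test accuracy for decision trees of different depths. A depth-1 tree underfits; a tree with no depth limit memorizes the noise, so its training accuracy reaches 1.0 while its test accuracy lags behind.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of labels randomized, so there is noise to memorize
X, y = make_classification(n_samples=1000, n_features=10, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for name, depth in [("depth=1", 1), ("depth=4", 4), ("no limit", None)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    print(f"{name}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A large gap between training and test accuracy signals overfitting; low accuracy on both signals underfitting.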

Regularization: Fighting Overfitting

Regularization adds a penalty on the size of the model's coefficients to the loss function, discouraging overly complex models.

  • L1 (Lasso): Adds the sum of the *absolute values* of the coefficients. Can shrink some coefficients to exactly zero → automatic feature selection.
  • L2 (Ridge): Adds the sum of the *squared* coefficients. Shrinks all coefficients toward zero but rarely to exactly zero → better when all features matter.
    from sklearn.linear_model import Lasso, Ridge

    lasso = Lasso(alpha=0.1)  # L1: alpha sets penalty strength; can eliminate features
    ridge = Ridge(alpha=1.0)  # L2: shrinks all coefficients
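
The following sketch uses synthetic data from scikit-learn's `make_regression`, where only 5 of 20 features are informative; the `alpha` values are illustrative, not tuned. It counts how many coefficients each penalty drives to exactly zero. Lasso typically zeroes most of the uninformative features, while Ridge leaves all 20 nonzero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, only 5 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=2.0).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=2.0).fit(X, y)  # L2 penalty

lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))
print(f"Lasso set {lasso_zeros} of {X.shape[1]} coefficients to zero")
print(f"Ridge set {ridge_zeros} of {X.shape[1]} coefficients to zero")
```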


Handling Imbalanced Datasets

In fraud detection, 99% of transactions may be legitimate. A model that always predicts "not fraud" is 99% accurate but useless.

Techniques:

  • Oversampling: Duplicate or synthetically generate minority class samples (e.g., SMOTE).
  • Undersampling: Remove majority class samples.
  • Class weights: Penalize misclassifying the minority class more heavily.
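
A sketch of two of these techniques with scikit-learn, on synthetic data with about 2% positives standing in for fraud. Random oversampling with `sklearn.utils.resample` duplicates minority rows; SMOTE itself lives in the separate imbalanced-learn package and is not shown here. Class weighting uses the `class_weight="balanced"` option of `LogisticRegression`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Synthetic stand-in for fraud data: roughly 2% positives (class 1)
X, y = make_classification(n_samples=5000, weights=[0.98, 0.02], random_state=0)

# Option 1: random oversampling, duplicating minority rows until the classes match
X_min, y_min = X[y == 1], y[y == 1]
n_majority = int((y == 0).sum())
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=n_majority, random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
counts = np.bincount(y_bal)
print(counts)  # both classes now have the same count

# Option 2: keep the data as-is, but weight errors on the rare class more heavily
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```

In practice, resample only the training split: oversampling before splitting leaks copies of test rows into training.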
Metrics for Imbalanced Data (e.g., Fraud Detection)

Metric | When to Use
Recall (Sensitivity) | Critical when missing positives is costly (fraud!)
Precision | When false positives are costly
F1 Score | Balance of precision and recall (their harmonic mean)
AUC-ROC | Overall model discrimination ability across all classification thresholds
💡 For fraud detection, recall is usually the top priority: you'd rather flag a legitimate transaction than miss actual fraud.
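
The accuracy trap described above can be checked directly. This sketch uses hypothetical labels (990 legitimate, 10 fraudulent transactions) and a "model" that always predicts "not fraud", scored with scikit-learn's metric functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 990 legitimate (0) and 10 fraudulent (1) transactions
y_true = np.array([0] * 990 + [1] * 10)

# A useless "model" that always predicts "not fraud"
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
# No positive predictions at all, so precision is 0/0; zero_division=0 defines it as 0
prec = precision_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(acc)   # 0.99: looks excellent
print(rec)   # 0.0: every fraud case is missed
print(prec)  # 0.0
print(f1)    # 0.0
```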


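Cross-Validation

Cross-validation estimates how well your model generalizes to unseen data. The most common form is k-fold cross-validation: split the data into k subsets (folds), train on k-1 of them, test on the remaining one, and rotate k times so each fold serves as the test set once. The k scores are then averaged.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    X, y = load_iris(return_X_y=True)  # example dataset
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)  # 5-fold CV: one score per fold
    print(scores.mean(), scores.std())  # average and spread across folds

This is more reliable than a single train/test split, especially with small datasets, because every sample is used for testing exactly once and the estimate does not hinge on one lucky or unlucky split.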
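
To make the rotation explicit, here is a sketch that does by hand roughly what `cross_val_score` automates, using scikit-learn's `KFold` on the Iris dataset (the dataset and model are illustrative choices). For classifiers, `cross_val_score(..., cv=5)` uses stratified folds by default, so its scores can differ slightly from this plain shuffled split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
# shuffle=True matters here: the Iris rows are sorted by class
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])          # train on k-1 folds
    score = model.score(X[test_idx], y[test_idx])  # evaluate on the held-out fold
    fold_scores.append(score)
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}, acc={score:.3f}")

print("mean accuracy:", np.mean(fold_scores))
```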


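Feature Engineering & Preprocessing

Feature Scaling

Algorithms like KNN and SVM are sensitive to feature magnitude: a feature measured in large units (e.g., income in dollars) dominates distance calculations over one measured in small units (e.g., age in years). Always scale features when using distance-based methods, and fit the scaler on the training data only.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    X = np.array([[50_000, 25], [80_000, 40], [120_000, 60]], dtype=float)  # income, age
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and standard deviation 1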
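
Why scaling matters for distance-based methods, shown in a small sketch with hypothetical customers described by income (dollars) and age (years). Without scaling, the nearest neighbor to the query is the customer with almost the same income but a 30-year age gap; after standardization both features count, and the nearest neighbor changes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [annual income in $, age in years]
X = np.array([[50_500, 60], [52_000, 31], [90_000, 45], [30_000, 25]], dtype=float)
query = np.array([[50_000, 30]], dtype=float)

def nearest(points, q):
    # Index of the row in `points` closest to `q` by Euclidean distance
    return int(np.argmin(np.linalg.norm(points - q, axis=1)))

# Unscaled: the $500 income gap outweighs the 30-year age gap
print("nearest without scaling:", nearest(X, query))  # 0

scaler = StandardScaler().fit(X)  # learn per-column mean and std from the reference data
print("nearest with scaling:", nearest(scaler.transform(X), scaler.transform(query)))  # 1
```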

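One-Hot Encoding

One-hot encoding converts a categorical variable into binary (0/1) columns, one per category, so algorithms that expect numbers can process it without implying an order between categories.

    import pandas as pd
    df = pd.DataFrame({"color": ["red", "blue", "red"]})
    pd.get_dummies(df["color"])  # one binary column per category: 'blue', 'red'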
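
`pd.get_dummies` is convenient for exploration, but it does not remember which categories it saw. A sketch using scikit-learn's `OneHotEncoder` instead: it learns the categories from training data and, with `handle_unknown="ignore"`, encodes an unseen category (here the hypothetical value "purple") as all zeros.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training values for a single categorical column
train_colors = np.array([["red"], ["blue"], ["green"], ["red"]])
enc = OneHotEncoder(handle_unknown="ignore").fit(train_colors)

print(enc.categories_)  # categories learned in sorted order: blue, green, red

# transform() returns a sparse matrix by default; .toarray() makes it dense for display
encoded = enc.transform(np.array([["red"], ["purple"]])).toarray()
print(encoded)  # 'red' -> [0, 0, 1]; unseen 'purple' -> all zeros
```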

Dimensionality Reduction & PCA

PCA (Principal Component Analysis) reduces the number of features by projecting the data onto a few new axes (principal components) that point in the directions of maximum variance. It is useful when you have many correlated features.
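
A sketch of PCA on synthetic data (assumed for illustration): `x2` is nearly a copy of `x1`, and `x3` is independent. Because two of the three features are redundant, two principal components retain almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1 (highly correlated)
x3 = rng.normal(size=n)                   # independent feature
X = np.column_stack([x1, x2, x3])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)          # 3 features -> 2 components
print(X_reduced.shape)                    # (500, 2)
print(pca.explained_variance_ratio_)      # first component captures the shared x1/x2 direction
```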


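Core Algorithms

Decision Trees

Decision trees split the data recursively on feature thresholds (e.g., "income > 50,000?"). They are interpretable but prone to overfitting unless their depth or leaf size is limited.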
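
Interpretability in practice: a sketch that fits a depth-limited tree on the Iris dataset and prints its learned rules with scikit-learn's `export_text`. The depth limit of 2 is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# max_depth caps tree growth, one common way to limit overfitting
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The learned model is a readable set of threshold rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```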

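KNN (K-Nearest Neighbors)

KNN classifies a point by the majority class of its k nearest neighbors in the training data. It is simple but slow at prediction time on large datasets, because each prediction compares the new point against the stored training points. It requires feature scaling.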
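
A sketch of KNN with scaling built in, using scikit-learn's wine dataset (chosen because its 13 features sit on very different scales). Wrapping `StandardScaler` and `KNeighborsClassifier` in a pipeline ensures the scaler is fit on the training split only and applied consistently at prediction time; `n_neighbors=5` is an illustrative value.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # 13 features on very different scales
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Scale first, then classify by the 5 nearest (scaled) training points
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)  # "training" KNN mostly just stores the scaled points
acc = knn.score(X_test, y_test)
print(acc)
```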

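SVM & The Kernel Trick

Support Vector Machines find the hyperplane that separates the classes with the largest margin. The kernel trick lets an SVM find non-linear boundaries by implicitly working in a higher-dimensional space: the kernel computes similarities as if the data had been transformed, without explicitly computing the transformation.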
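
A sketch of the kernel trick's effect, using scikit-learn's `make_circles` (one class forms a ring around the other, so no straight line separates them). A linear SVM does little better than guessing, while the RBF kernel finds the circular boundary.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line in the original 2-D space
X, y = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
# RBF kernel: implicitly maps points into a higher-dimensional space
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel: {linear_acc:.2f}")
print(f"RBF kernel:    {rbf_acc:.2f}")
```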

Ensemble Methods

Ensemble methods combine multiple models to improve performance:
  • Bagging (e.g., Random Forest): Train models independently on bootstrap samples (random samples drawn with replacement), then average their predictions or take a vote → reduces variance. Random Forest also considers a random subset of features at each split.
  • Gradient Boosting (e.g., XGBoost): Train models sequentially, each one fitting the errors the ensemble has made so far → reduces bias.
  • Learning rate in gradient boosting scales how much each new tree's correction contributes. A lower rate requires more trees, but often generalizes better.
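
A side-by-side sketch using scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier` (used here in place of XGBoost, which is a separate library) on synthetic data. The hyperparameter values are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: many deep trees on bootstrap samples; predicted probabilities are averaged
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Boosting: shallow trees added one at a time; learning_rate scales each tree's
# contribution, so a smaller rate usually needs more trees (n_estimators)
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                max_depth=3, random_state=0)

scores = {}
for name, model in [("random forest", rf), ("gradient boosting", gb)]:
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```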


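Neural Networks

Activation Functions

Activation functions introduce non-linearity into the network; without them, stacked layers would collapse into a single linear transformation. Common choices:
  • ReLU: The most popular choice for hidden layers. It is cheap to compute, and because its gradient is 1 for positive inputs, it is far less prone to vanishing gradients than sigmoid or tanh.
  • Sigmoid/Tanh: Sigmoid squashes values into (0, 1), so it is used in the output layer for binary classification. Tanh squashes values into (-1, 1) and appears mainly in hidden layers (e.g., in RNNs). Both saturate and can cause vanishing gradients.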
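
The activations above, written out in NumPy as a minimal sketch:

```python
import numpy as np

def relu(z):
    # Passes positive values through unchanged, clips negatives to 0
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))     # [0. 0. 2.]
print(sigmoid(z))  # values in (0, 1); sigmoid(0) = 0.5
print(np.tanh(z))  # values in (-1, 1); tanh(0) = 0
```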
Vanishing Gradient Problem

In deep networks, gradients can shrink exponentially as they propagate backward through many layers, because backpropagation multiplies many small derivatives together. Early layers then learn very slowly. ReLU and batch normalization help mitigate this.
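
A simplified back-of-the-envelope sketch of why depth hurts: it chains one activation derivative per layer over 10 hypothetical layers and ignores the weights entirely. The sigmoid's derivative never exceeds 0.25, so even in its best case the product collapses, while ReLU's derivative for positive inputs is exactly 1.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # peaks at 0.25 when z = 0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 for any positive input

n_layers = 10
# Best case for sigmoid: every layer sits at z = 0, where its derivative is largest
sigmoid_chain = sigmoid_grad(0.0) ** n_layers
relu_chain = relu_grad(1.0) ** n_layers

print(sigmoid_chain)  # 0.25**10, about 9.5e-07
print(relu_chain)     # 1.0
```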


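Generative vs. Discriminative Models

Type | What It Learns | Example
Discriminative | The decision boundary between classes (how to predict the label from the features) | Logistic Regression, SVM
Generative | How the data itself is distributed (for classifiers, the distribution of each class) | Naive Bayes, GANs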
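
A sketch contrasting the two on the Iris dataset with scikit-learn: `GaussianNB` (generative) stores a per-class model of the features, such as the mean of each feature within each class, while `LogisticRegression` (discriminative) stores only the weights of its decision boundary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Generative: models how features are distributed within each class (Gaussian per feature)
nb = GaussianNB().fit(X, y)
print(nb.theta_)  # one row of feature means per class

# Discriminative: models the class directly from the features via a decision boundary
lr = LogisticRegression(max_iter=1000).fit(X, y)
print(lr.coef_)   # boundary weights; no model of how the features themselves are distributed

print(nb.predict(X[:3]), lr.predict(X[:3]))
```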

Recommendation Systems

Collaborative filtering recommends items based on the behavior of *similar users* (user-based) or *similar items* (item-based). It doesn't need item content, just interaction data (ratings, clicks).

Example: "Users who bought X also bought Y" is collaborative filtering.
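
A minimal item-based collaborative filtering sketch in NumPy, on a hypothetical 4-user by 4-item rating matrix. It computes cosine similarity between item rating columns and returns the most similar item. Treating missing ratings as 0 is a simplification; real systems handle missing entries more carefully.

```python
import numpy as np

items = ["A", "B", "C", "D"]
# Hypothetical user-item ratings (rows = users, columns = items, 0 = not rated)
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Item-item cosine similarity from rating columns only (no item content needed)
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(sim, -np.inf)  # ignore each item's similarity with itself

def most_similar(item):
    return items[int(np.argmax(sim[items.index(item)]))]

print(most_similar("A"))  # B: users who rated A highly also rated B highly
```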

Loss Functions

The loss function measures how wrong your model is. Common examples:
  • MSE (Mean Squared Error): For regression
  • Cross-Entropy: For classification
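
Both losses computed by hand in NumPy on made-up predictions, then checked against scikit-learn's `mean_squared_error` and `log_loss`. For the binary cross-entropy part, `p` holds the predicted probability of class 1.

```python
import numpy as np
from sklearn.metrics import log_loss, mean_squared_error

# MSE: average squared difference between targets and predictions (made-up values)
y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.5, 5.0, 3.0])
mse = np.mean((y_true - y_hat) ** 2)  # (0.25 + 0 + 1) / 3

# Binary cross-entropy: penalizes confident wrong probabilities heavily
labels = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.6])
ce = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

print(round(mse, 4))  # 0.4167
print(round(ce, 4))   # 0.2798
```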

Key Takeaways

  • L1 regularization performs feature selection by zeroing coefficients; L2 shrinks all coefficients — both fight overfitting.
  • For imbalanced datasets (like fraud detection), optimize for recall or F1 score, not raw accuracy.
  • Cross-validation gives a more reliable estimate of model performance than a single train/test split.
  • Bagging reduces variance (Random Forest); Gradient Boosting reduces bias — choose based on your model's weakness.
  • Always apply feature scaling before using distance-based algorithms like KNN or SVM, and use one-hot encoding for categorical variables.