AWS Certified Machine Learning – Specialty Intermediate — Quiz 2 — Study Guide

AWS ML Specialty Intermediate: Modeling Techniques, Evaluation & Core Algorithms

Machine learning success depends not just on picking an algorithm, but on understanding *why* you pick it, *how* to tune it, and *what* to measure. This lesson covers the core modeling concepts tested in the AWS ML Specialty exam — from regression and regularization to neural networks and recommendation systems.

Regression vs. Classification

Regression predicts a continuous value (e.g., house prices, stock returns). Classification predicts a discrete label (e.g., spam/not spam).

Task	Output Type	Example Algorithm
Regression	Continuous number	Linear Regression
Classification	Category/class	Logistic Regression, SVM

💡 Exam tip: "Predict house prices" → regression. "Predict if email is spam" → classification.

Overfitting, Underfitting, and Bias

Overfitting: Model learns training data *too well*, including noise. High variance, poor generalization.

Underfitting: Model is too simple to capture patterns. High bias.

Bias refers to systematic errors from wrong assumptions — an underfit model has high bias.

Think of it like studying for an exam: memorizing every practice question (overfitting) vs. barely reading the material (underfitting). You want the sweet spot.
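
The two failure modes can be seen by comparing training and test accuracy as model complexity grows. The following is a minimal sketch, assuming scikit-learn is available; the synthetic dataset, the tree depths, and the variable names are illustrative choices rather than part of the exam material.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy binary classification data (illustrative only).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = {}
for depth in (1, 4, None):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this depth.
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(depth, results[depth])
```

A depth-1 tree scores modestly on both splits (underfitting), while the unrestricted tree memorizes the training set and shows a clear gap between training and test accuracy (overfitting).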

Regularization: Fighting Overfitting

Regularization adds a penalty to the loss function to discourage overly complex models.

L1 (Lasso): Adds the sum of the *absolute values* of the coefficients to the loss. Can drive some coefficients to exactly zero → automatic feature selection.

L2 (Ridge): Adds the sum of the *squared* coefficients to the loss. Shrinks all coefficients toward zero but rarely to exactly zero → better when all features matter.

from sklearn.linear_model import Lasso, Ridge

lasso = Lasso(alpha=0.1)   # L1: can eliminate features
ridge = Ridge(alpha=1.0)   # L2: shrinks all coefficients
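
To see the difference between the two penalties, the sketch below fits both models to synthetic data in which only two of ten features matter. The data, the noise level, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features influence the target; the other eight are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Count coefficients that were driven to exactly zero by each penalty.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
print("Lasso zeroed coefficients:", n_zero_lasso)
print("Ridge zeroed coefficients:", n_zero_ridge)
```

Lasso typically zeroes the noise features here, while Ridge leaves every coefficient small but non-zero.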

Handling Imbalanced Datasets

In fraud detection, 99% of transactions may be legitimate. A model that always predicts "not fraud" is 99% accurate but useless.

Techniques:

Oversampling: Duplicate or synthetically generate minority class samples (e.g., SMOTE).

Undersampling: Remove majority class samples.

Class weights: Penalize misclassifying the minority class more heavily.
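
Two of the techniques above can be sketched with scikit-learn utilities. This example uses plain random oversampling (not SMOTE, which lives in the separate imbalanced-learn package) and the "balanced" class-weight heuristic; the toy labels and the placeholder feature are assumptions for illustration.

```python
import numpy as np
from sklearn.utils import resample
from sklearn.utils.class_weight import compute_class_weight

# Toy labels: 95 legitimate (0) and 5 fraudulent (1) transactions.
y = np.array([0] * 95 + [1] * 5)
X = np.arange(100, dtype=float).reshape(-1, 1)   # placeholder single feature

# Oversampling: draw minority rows with replacement until the classes match.
X_minority, X_majority = X[y == 1], X[y == 0]
X_minority_up = resample(X_minority, replace=True,
                         n_samples=len(X_majority), random_state=0)
X_balanced = np.vstack([X_majority, X_minority_up])
y_balanced = np.concatenate([np.zeros(len(X_majority), dtype=int),
                             np.ones(len(X_minority_up), dtype=int)])

# Class weights: "balanced" uses n_samples / (n_classes * class_count).
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(np.bincount(y_balanced), weights)
```

In a real workflow, resampling would be applied to the training split only, so that duplicated minority rows never appear in the evaluation data.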

Metrics for Imbalanced Data (e.g., Fraud Detection)

Metric	When to Use
Recall (Sensitivity)	Critical when missing positives is costly (fraud!)
Precision	When false positives are costly
F1 Score	Balance of precision and recall
AUC-ROC	Overall model discrimination ability

💡 For fraud detection, recall usually matters most: you'd rather flag a legitimate transaction than miss actual fraud.
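
A small worked example makes the accuracy trap concrete. The counts below (1,000 transactions, 10 frauds, and a hypothetical detector that catches 8 of them while raising 20 false alarms) are invented for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 1,000 transactions, 10 of them fraudulent (label 1): a 99% / 1% split.
y_true = np.array([1] * 10 + [0] * 990)

# Baseline that always predicts "not fraud".
y_always_legit = np.zeros_like(y_true)

# Hypothetical detector: catches 8 of 10 frauds and raises 20 false alarms.
y_detector = y_true.copy()
y_detector[:2] = 0       # two frauds missed (false negatives)
y_detector[10:30] = 1    # twenty legitimate transactions flagged (false positives)

for name, y_pred in [("always legit", y_always_legit), ("detector", y_detector)]:
    print(name,
          "accuracy:", accuracy_score(y_true, y_pred),
          "precision:", precision_score(y_true, y_pred, zero_division=0),
          "recall:", recall_score(y_true, y_pred),
          "f1:", f1_score(y_true, y_pred, zero_division=0))
```

The baseline reaches 0.99 accuracy with a recall of 0, while the detector has lower accuracy (0.978) but a recall of 0.8, which is why accuracy alone is a poor guide on imbalanced data.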

Cross-Validation

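Cross-validation estimates how well your model generalizes to unseen data. The most common form is k-fold cross-validation: split the data into k folds, train on k-1 of them, evaluate on the held-out fold, and rotate until every fold has served as the test set once. The k scores are then averaged.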

from sklearn.model_selection import cross_val_score
# model is any scikit-learn estimator; X and y are the features and labels
scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV
print(scores.mean())

This is more reliable than a single train/test split, especially with small datasets.
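
The rotation that cross_val_score performs can be written out by hand. This is a sketch only, using the Iris dataset and logistic regression as stand-ins; note that for classifiers cross_val_score uses stratified folds by default, whereas this loop uses a plain shuffled KFold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kfold.split(X):
    # Train a fresh model on k-1 folds, then score it on the held-out fold.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

print(fold_scores)
print(np.mean(fold_scores))
```

The spread of the per-fold scores is useful in its own right: a large spread suggests the estimate depends heavily on which rows land in the test fold.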

Feature Engineering & Preprocessing

Feature Scaling

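Distance- and margin-based algorithms such as KNN and SVM are sensitive to feature magnitude: a feature measured in thousands will dominate one measured in fractions. Always scale features for these methods, and fit the scaler on the training data only so that test-set statistics do not leak into training.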

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
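
One way to keep scaling leak-free is to put the scaler and the model in a single pipeline, so the scaler is fitted on the training split only. The sketch below assumes the Wine dataset bundled with scikit-learn as a convenient example of features on very different scales.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# KNN on raw features: large-valued columns dominate the distance.
unscaled = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Same model behind a scaler; the pipeline fits the scaler on X_train only.
scaled = make_pipeline(StandardScaler(),
                       KNeighborsClassifier(n_neighbors=5)).fit(X_train, y_train)

acc_unscaled = unscaled.score(X_test, y_test)
acc_scaled = scaled.score(X_test, y_test)
print(acc_unscaled, acc_scaled)
```

On this dataset the scaled pipeline generally scores noticeably higher, because no single feature dominates the distance calculation.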

One-Hot Encoding

Converts categorical variables into binary columns so algorithms can process them.

import pandas as pd
df = pd.DataFrame({'color': ['red', 'blue', 'red']})  # toy data for illustration
pd.get_dummies(df['color'])  # 'red','blue' → one binary column per category

Dimensionality Reduction & PCA

PCA (Principal Component Analysis) reduces the number of features by projecting the data onto a small set of orthogonal directions (principal components) that capture the most variance. Useful when you have many correlated features.
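
As a rough illustration, the sketch below builds six correlated features from two hidden factors and lets PCA recover a two-dimensional representation. The synthetic data and all names are assumptions; with real data on mixed scales, features are usually standardized before PCA.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))      # two hidden factors
mixing = rng.normal(size=(2, 6))        # how the factors combine into features
# Six observed features: linear mixes of the factors plus a little noise.
X = latent @ mixing + 0.05 * rng.normal(size=(300, 6))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance kept by the two components.
explained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape, explained)
```

Because the six features are driven by only two factors, two components retain nearly all of the variance.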

Core Algorithms

Decision Trees

Split data based on feature thresholds. Interpretable but prone to overfitting.

KNN (K-Nearest Neighbors)

Classifies a point based on the majority class of its k nearest neighbors. Simple, but slow at prediction time on large datasets because each query is compared against the stored training points. Requires feature scaling.
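
The idea fits in a few lines of NumPy. This is a brute-force sketch with Euclidean distance and toy data; the function name knn_predict is made up for illustration and is not a library API.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Return the majority label among the k training points closest to x_query."""
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # near the "a" cluster
print(knn_predict(X_train, y_train, np.array([5.0, 5.2])))  # near the "b" cluster
```

Every prediction scans the full training set, which is where the cost on large datasets comes from.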

SVM & The Kernel Trick

Support Vector Machines find the hyperplane that separates classes with the largest margin. The kernel trick lets the model find non-linear boundaries by computing inner products as if the data had been mapped into a higher-dimensional space, without explicitly computing that transformation.
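
The kernel trick can be checked numerically for a simple case. The sketch below uses the degree-2 homogeneous polynomial kernel on 2-D vectors, with an explicit feature map phi written out for comparison; the vectors are arbitrary illustrative values.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (3-D output)."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))   # map to 3-D first, then take the inner product
implicit = poly_kernel(x, z)        # same quantity without ever leaving 2-D
print(explicit, implicit)
```

Both values come out to 1 (up to floating-point rounding). For kernels such as RBF the implied feature space is infinite-dimensional, so the implicit route is the only practical one.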

Ensemble Methods

Combine multiple models to improve performance:

Bagging (e.g., Random Forest): Train models independently on bootstrap samples (random subsets drawn with replacement), then average or vote on the results → reduces variance.

Gradient Boosting (e.g., XGBoost): Train models sequentially, each correcting the previous → reduces bias.

Learning rate in gradient boosting scales how much each new tree contributes when correcting the previous error. Lower values need more trees, but often generalize better.
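
The role of the learning rate is easiest to see in a hand-rolled version of boosting for squared error. This is a simplified sketch, not how XGBoost is implemented internally; the boost function, the sine-wave data, and the chosen rates are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

def boost(X, y, n_trees, learning_rate):
    """Fit depth-1 trees to residuals in sequence; return training MSE after each tree."""
    prediction = np.full_like(y, y.mean())    # start from the mean prediction
    history = []
    for _ in range(n_trees):
        residual = y - prediction              # what the ensemble still gets wrong
        stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, residual)
        prediction += learning_rate * stump.predict(X)   # shrink each correction
        history.append(float(np.mean((y - prediction) ** 2)))
    return history

mse_slow = boost(X, y, n_trees=50, learning_rate=0.1)
mse_fast = boost(X, y, n_trees=50, learning_rate=1.0)
print(mse_slow[0], mse_slow[-1])
print(mse_fast[0], mse_fast[-1])
```

With the smaller rate, each tree removes less training error, so more trees are needed to reach the same fit. Whether that improves generalization has to be checked on held-out data, which this sketch does not do.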

Neural Networks

Activation Functions

Introduce non-linearity into the network. Common choices:

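ReLU: Most popular for hidden layers. Cheap to compute and helps mitigate the vanishing gradient problem.

Sigmoid: Squashes values into (0, 1). Used in the output layer for binary classification.

Tanh: Squashes values into (-1, 1). Zero-centered, and mainly used in hidden layers.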
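
For reference, the common activation functions are short enough to write in NumPy. This is a plain illustration of the formulas, not a framework implementation.

```python
import numpy as np

def relu(x):
    """max(0, x): passes positive values through, zeroes out negatives."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """1 / (1 + e^-x): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print("relu:   ", relu(x))
print("sigmoid:", sigmoid(x))
print("tanh:   ", np.tanh(x))   # squashes into (-1, 1), zero-centered
```

Sigmoid's (0, 1) range is what makes its output readable as a probability for the positive class.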

Vanishing Gradient Problem

In deep networks, gradients shrink as they propagate backward through many layers, making early layers learn very slowly. ReLU and batch normalization help mitigate this.
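
A back-of-the-envelope calculation shows why depth hurts with sigmoid activations. The sketch below multiplies only the activation derivatives across layers and ignores the weight matrices, which is a deliberate simplification.

```python
import numpy as np

def sigmoid_grad(x):
    """Derivative of the sigmoid; it peaks at 0.25 when x = 0."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

depths = [1, 5, 10, 20]
# Best case for sigmoid: every pre-activation is 0, where the slope is largest.
sigmoid_chain = {d: sigmoid_grad(0.0) ** d for d in depths}
# For an active ReLU unit (positive pre-activation) the slope is 1 at every layer.
relu_chain = {d: relu_grad(1.0) ** d for d in depths}

for d in depths:
    print(d, sigmoid_chain[d], relu_chain[d])
```

Even in the best case, ten sigmoid layers scale the gradient by less than one millionth, while an active ReLU path passes it through unchanged.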

Generative vs. Discriminative Models

Type	What It Learns	Example
Discriminative	Decision boundary between classes	Logistic Regression, SVM
Generative	Distribution of each class	Naive Bayes, GANs

Recommendation Systems

Collaborative filtering recommends items based on the behavior of *similar users* (user-based) or *similar items* (item-based). It doesn't need item content — just interaction data (ratings, clicks).

Example: "Users who bought X also bought Y" — that's collaborative filtering.
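
A minimal item-based version can be sketched with cosine similarity over a purchase matrix. The four users, three items, and helper names below are toy assumptions; production recommenders typically use matrix factorization or dedicated services rather than this brute-force comparison.

```python
import numpy as np

# Rows = users, columns = items; 1 means the user bought the item (toy data).
items = ["X", "Y", "Z"]
purchases = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])

def cosine_similarity(a, b):
    """Cosine of the angle between two purchase vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar_item(target):
    """Return the other item whose purchase pattern is closest to the target's."""
    t = items.index(target)
    scores = {
        item: cosine_similarity(purchases[:, t], purchases[:, j])
        for j, item in enumerate(items) if j != t
    }
    return max(scores, key=scores.get), scores

best, scores = most_similar_item("X")
print(best, scores)
```

Item Y is bought by two of the three users who bought X, so it scores highest. No item descriptions were needed, only the interaction data.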

Loss Functions

The loss function measures how wrong your model is. Common examples:

MSE (Mean Squared Error): For regression

Cross-Entropy: For classification
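
Both losses are short enough to compute by hand. The sketch below uses the binary form of cross-entropy; the function names and sample values are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true labels under predicted probabilities."""
    p = np.clip(p_pred, eps, 1 - eps)   # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Regression: errors of 1 and 2 give (1 + 4) / 2 = 2.5.
mse_val = mse(np.array([3.0, 5.0]), np.array([2.0, 7.0]))

# Classification: same labels, confident-and-right vs. confident-and-wrong.
labels = np.array([1.0, 0.0])
confident_right = binary_cross_entropy(labels, np.array([0.9, 0.1]))
confident_wrong = binary_cross_entropy(labels, np.array([0.1, 0.9]))
print(mse_val, confident_right, confident_wrong)
```

Cross-entropy punishes confident mistakes far more heavily than hesitant ones, which is why it pairs well with probability outputs.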

Key Takeaways

L1 regularization performs feature selection by zeroing coefficients; L2 shrinks all coefficients — both fight overfitting.

For imbalanced datasets (like fraud detection), optimize for recall or F1 score, not raw accuracy.

Cross-validation gives a more reliable estimate of model performance than a single train/test split.

Bagging reduces variance (Random Forest); Gradient Boosting reduces bias — choose based on your model's weakness.

Always apply feature scaling before using distance-based algorithms like KNN or SVM, and use one-hot encoding for categorical variables.