AWS Certified Machine Learning – Specialty Intermediate — Quiz 2 — Study Guide
AWS ML Specialty Intermediate: Modeling Techniques, Evaluation & Core Algorithms
Machine learning success depends not just on picking an algorithm, but on understanding *why* you pick it, *how* to tune it, and *what* to measure. This lesson covers the core modeling concepts tested in the AWS ML Specialty exam — from regression and regularization to neural networks and recommendation systems.
Regression vs. Classification
Regression predicts a continuous value (e.g., house prices, stock returns). Classification predicts a discrete label (e.g., spam/not spam).
| Task | Output Type | Example Algorithm |
|---|---|---|
| Regression | Continuous number | Linear Regression |
| Classification | Category/class | Logistic Regression, SVM |
💡 Exam tip: "Predict house prices" → regression. "Predict if email is spam" → classification.
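The difference in output type can be seen in a small sketch. The house sizes, prices, and the "large house" label below are made-up illustration data, not values from the exam or from any AWS service.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (made up for illustration): house size in square meters.
size = np.array([[50], [70], [90], [110], [130], [150]])
price = np.array([150.0, 210.0, 270.0, 330.0, 390.0, 450.0])  # continuous target
is_large = np.array([0, 0, 0, 1, 1, 1])                        # discrete target

reg = LinearRegression().fit(size, price)       # regression: learns a numeric mapping
clf = LogisticRegression().fit(size, is_large)  # classification: learns class labels

price_pred = reg.predict([[100]])  # a real number on the price scale
label_pred = clf.predict([[100]])  # one of the class labels seen in training (0 or 1)

print("Predicted price:", price_pred[0])
print("Predicted class:", label_pred[0])
```

The regressor returns a value on a continuous scale, while the classifier can only return one of the labels it was trained on.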
Overfitting, Underfitting, and Bias
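Think of it like studying for an exam: memorizing every practice question (overfitting) vs. barely reading the material (underfitting). An overfit model has low bias but high variance, so it scores well on training data and poorly on new data. An underfit model has high bias, so it scores poorly on both. You want the sweet spot between the two.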
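One way to see the trade-off is to fit polynomials of increasing degree to noisy data and compare training error with error on held-out data. This is a minimal sketch: the sine-shaped target, the noise level, and the degrees 1, 3, and 9 are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sine curve (synthetic data for illustration)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The gap between training and test error is the signal to watch: a large gap suggests overfitting, while high error on both suggests underfitting.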
Regularization: Fighting Overfitting
Regularization adds a penalty to the loss function to discourage overly complex models.
from sklearn.linear_model import Lasso, Ridge
lasso = Lasso(alpha=0.1)  # L1: can eliminate features
ridge = Ridge(alpha=1.0)  # L2: shrinks all coefficients
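The effect of the two penalties shows up in the fitted coefficients. The sketch below assumes synthetic data in which only the first two of five features influence the target; the data and the noise level are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features drive the target; the other three are irrelevant.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Coefficients set exactly to zero by Lasso:", int(np.sum(lasso.coef_ == 0)))
```

On data like this, Lasso drives the irrelevant coefficients to exactly zero, which acts as feature selection, while Ridge keeps every coefficient small but nonzero.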
Handling Imbalanced Datasets
In fraud detection, 99% of transactions may be legitimate. A model that always predicts "not fraud" is 99% accurate but useless.
Techniques:
Metrics for Imbalanced Data (e.g., Fraud Detection)
| Metric | When to Use |
|---|---|
| Recall (Sensitivity) | Critical when missing positives is costly (fraud!) |
| Precision | When false positives are costly |
| F1 Score | Balance of precision and recall |
| AUC-ROC | Overall model discrimination ability |
💡 For fraud detection, recall is most important — you'd rather flag a legitimate transaction than miss actual fraud.
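The accuracy trap and the metrics in the table can be checked on a hand-built example. The labels below are invented: 1,000 transactions with 10 frauds, one model that never flags anything, and a hypothetical detector that flags 20 transactions.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up labels: 1,000 transactions, 10 of which are fraud (label 1).
y_true = [0] * 990 + [1] * 10

# Model A never flags anything.
always_legit = [0] * 1000

# Model B flags 20 transactions: 12 legitimate ones and 8 of the 10 frauds.
detector = [1] * 12 + [0] * 978 + [1] * 8 + [0] * 2

acc_a = accuracy_score(y_true, always_legit)  # 990 correct out of 1000
rec_a = recall_score(y_true, always_legit)    # 0 of 10 frauds caught

acc_b = accuracy_score(y_true, detector)      # (978 + 8) correct out of 1000
prec_b = precision_score(y_true, detector)    # 8 true frauds among 20 flags
rec_b = recall_score(y_true, detector)        # 8 of 10 frauds caught
f1_b = f1_score(y_true, detector)             # harmonic mean of precision and recall

print(f"Model A: accuracy={acc_a:.3f}, recall={rec_a:.3f}")
print(f"Model B: accuracy={acc_b:.3f}, precision={prec_b:.3f}, recall={rec_b:.3f}, F1={f1_b:.3f}")
```

Model B has slightly lower accuracy than Model A but catches most of the fraud, which is why accuracy alone is a poor guide on imbalanced data.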
Cross-Validation
Cross-validation estimates how well your model generalizes to unseen data. The most common form is k-fold cross-validation: split the data into k subsets (folds), train on k-1 of them, test on the remaining one, and rotate until every fold has served as the test set once.
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5) # 5-fold CV
print(scores.mean())  # mean score across the 5 folds
Cross-validation is more reliable than a single train/test split, especially with small datasets.
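The rotation that k-fold performs can be written out by hand. This sketch assumes a synthetic dataset from `make_classification` and a logistic regression model, both chosen only for illustration, and compares the manual loop with `cross_val_score` using the same splitter.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data and model, assumed for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

manual_scores = []
for train_idx, test_idx in kf.split(X):
    fold_model = clone(model)                   # fresh, unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])  # train on k-1 folds
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))  # test on the held-out fold

library_scores = cross_val_score(model, X, y, cv=kf)

print("Manual loop:     ", np.round(manual_scores, 3))
print("cross_val_score: ", np.round(library_scores, 3))
```

Passing the same `KFold` object to `cross_val_score` should reproduce the manual scores, since both train and evaluate on identical splits.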
Feature Engineering & Preprocessing
Feature Scaling
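Algorithms like KNN and SVM are sensitive to feature magnitude: a feature with a large numeric range can dominate the distance or margin calculation. Always scale when using distance-based methods.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()  # standardizes each feature to zero mean and unit variance
X_scaled = scaler.fit_transform(X)
One-Hot Encoding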
One-hot encoding converts a categorical variable into binary columns, one per category, so algorithms that expect numeric input can process it.
import pandas as pd
pd.get_dummies(df['color'])  # 'red', 'blue' → binary columns
Dimensionality Reduction & PCA
PCA (Principal Component Analysis) reduces the number of features by finding directions of maximum variance. Useful when you have many correlated features.
Core Algorithms
Decision Trees
Split data based on feature thresholds. Interpretable but prone to overfitting.
KNN (K-Nearest Neighbors)
Classifies a point based on the majority class of its k nearest neighbors. Simple but slow at prediction time on large datasets, because each query is compared against the stored training points. Requires feature scaling.
SVM & The Kernel Trick
Support Vector Machines find the hyperplane that separates the classes with the largest margin. The kernel trick lets an SVM learn non-linear boundaries by computing similarities as if the data had been mapped into a higher-dimensional space, without explicitly computing the transformation.
Ensemble Methods
Combine multiple models to improve performance:
The learning rate in gradient boosting scales how much each new tree's correction of the previous error is applied. A lower learning rate needs more trees but often generalizes better.
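The learning-rate effect can be observed by tracking training error tree by tree. The sketch below is an illustration under assumptions: the sine-shaped synthetic data, the tree depth, and the two learning rates (1.0 and 0.1) are arbitrary choices, and only training error is measured, not generalization.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)  # synthetic target for illustration

def staged_train_mse(learning_rate):
    """Training MSE after each added tree for the given learning rate."""
    model = GradientBoostingRegressor(
        n_estimators=100, learning_rate=learning_rate, max_depth=2, random_state=0
    ).fit(X, y)
    return [float(np.mean((pred - y) ** 2)) for pred in model.staged_predict(X)]

fast = staged_train_mse(1.0)  # each tree's correction is applied in full
slow = staged_train_mse(0.1)  # each tree's correction is scaled down to 10%

# How many trees does the slow learner need to match the fast learner's first tree?
target = fast[0]
trees_slow = next((i + 1 for i, err in enumerate(slow) if err <= target), None)

print(f"Training MSE after 1 tree: lr=1.0 -> {fast[0]:.4f}, lr=0.1 -> {slow[0]:.4f}")
print("Trees needed at lr=0.1 to match the first lr=1.0 tree:", trees_slow)
```

With the smaller learning rate each tree removes less of the remaining error, so more trees are needed to reach the same training error. Whether that improves generalization depends on the data and should be checked on a validation set.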
Neural Networks
Activation Functions
Introduce non-linearity into the network. Common choices:
Vanishing Gradient Problem
In deep networks, gradients shrink as they propagate backward through many layers, making early layers learn very slowly. ReLU and batch normalization help mitigate this.
Generative vs. Discriminative Models
| Type | What It Learns | Example |
|---|---|---|
| Discriminative | Decision boundary between classes | Logistic Regression, SVM |
| Generative | Distribution of each class | Naive Bayes, GANs |
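The contrast in the table can be made concrete by inspecting what each kind of model stores after fitting. This sketch assumes two synthetic Gaussian clusters centered at (-2, -2) and (2, 2), and uses Gaussian Naive Bayes and logistic regression as the generative and discriminative examples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two synthetic classes, each drawn from its own Gaussian (illustration only).
X = np.vstack([
    rng.normal(loc=-2.0, scale=1.0, size=(200, 2)),
    rng.normal(loc=2.0, scale=1.0, size=(200, 2)),
])
y = np.array([0] * 200 + [1] * 200)

generative = GaussianNB().fit(X, y)
discriminative = LogisticRegression().fit(X, y)

# The generative model stores a description of each class's distribution.
print("Per-class feature means (Naive Bayes):\n", generative.theta_)
print("Class priors (Naive Bayes):", generative.class_prior_)

# The discriminative model stores only the boundary between the classes.
print("Boundary weights (logistic regression):", discriminative.coef_, discriminative.intercept_)
```

Naive Bayes ends up with one set of distribution parameters per class, while logistic regression keeps a single weight vector and intercept that define the decision boundary.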
Recommendation Systems
Collaborative filtering recommends items based on the behavior of *similar users* (user-based) or *similar items* (item-based). It doesn't need item content — just interaction data (ratings, clicks).
Example: "Users who bought X also bought Y" — that's collaborative filtering.
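A minimal item-based version can be sketched with cosine similarity between item columns. The ratings matrix is hypothetical, and treating 0 as "no rating" inside the similarity calculation is a simplification; production recommenders, including managed AWS services, work differently.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items A-D; 0 means "no rating".
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
    [5, 0, 0, 0],  # target user: has only rated item A
], dtype=float)

# Item-item cosine similarity, computed from interaction data alone.
norms = np.linalg.norm(ratings, axis=0)
item_sim = (ratings.T @ ratings) / np.outer(norms, norms)

def recommend(user_idx):
    """Return the index of the unrated item most similar to what the user rated."""
    user = ratings[user_idx]
    scores = item_sim @ user    # weight each item by similarity to the user's rated items
    scores[user > 0] = -np.inf  # never recommend something already rated
    return int(np.argmax(scores))

recommended = recommend(4)
print("Recommended item:", "ABCD"[recommended])
```

Items A and B are rated highly by the same users, so a user who liked A is pointed to B. No item descriptions are involved, only the interaction matrix.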