ISTEB Foundations in Site Reliability Engineering

ISTEB Foundations in SRE Intermediate — Quiz 2 Study Guide

Modern software systems are too large and complex to manage by hand. Site Reliability Engineers rely on automation, standardized tooling, and repeatable processes to keep infrastructure consistent, recoverable, and secure. This lesson covers the core concepts you'll need to master for Quiz 2 — from writing infrastructure code to deploying safely across multiple clouds.

Infrastructure as Code (IaC)

Infrastructure as Code means defining your servers, networks, databases, and other resources in text files — just like application code. Instead of clicking through a cloud console, you write a file that describes what you want, and a tool provisions it for you.

Primary Benefits

Repeatability — spin up identical environments every time

Auditability — track every change in version control

Speed — provision hundreds of resources in minutes

Reduced human error — no more missed checkbox in a UI

Declarative vs. Imperative

Style	What you write	Example tools
Declarative	What the end state should look like	Terraform, Puppet
Imperative	How to get to the end state, step by step	Ansible (procedural mode), Bash scripts

# Declarative Terraform example — describe desired state
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}
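The difference between the two styles can be sketched in a few lines of Python. This is only an illustration of the idea, not how Terraform or any other tool is implemented; the function names and the server names are hypothetical.

```python
# Hypothetical illustration: names and data shapes are assumptions, not a real tool's API.

def imperative_setup(servers: list[str]) -> list[str]:
    """Imperative: spell out each step; the result depends on the starting point."""
    servers.append("web-1")
    servers.append("web-2")
    return servers


def reconcile(desired: set[str], actual: set[str]) -> dict[str, set[str]]:
    """Declarative: state the end goal and let the tool work out the steps."""
    return {
        "create": desired - actual,  # wanted but missing
        "delete": actual - desired,  # present but not wanted
    }


plan = reconcile(desired={"web-1", "web-2"}, actual={"web-2", "old-db"})
print(plan)  # {'create': {'web-1'}, 'delete': {'old-db'}}
```

Note that the imperative function adds both servers even if one already exists, while the declarative version only proposes the changes needed to reach the desired state.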

Popular IaC & Configuration Management Tools

Terraform

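Terraform is one of the most widely used declarative IaC tools. It talks to cloud provider APIs and records what it has created in a state file (terraform.tfstate by default). The state file is Terraform's record of the real-world resources it manages, and it is how Terraform maps your configuration to what actually exists.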

Modules — reusable, shareable blocks of Terraform configuration (think functions for infrastructure)

Drift detection — Terraform can compare the state file against actual infrastructure to find unauthorized changes
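Conceptually, drift detection is a comparison between what was recorded and what is live. The Python sketch below shows that comparison under simplifying assumptions: the dictionaries stand in for a state file and for the provider's API response, and `detect_drift` is a hypothetical helper, not a Terraform function.

```python
# Hypothetical sketch: a real tool would query the cloud provider's API for `actual`.

def detect_drift(recorded: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Return, per resource, the attributes whose live value differs from the recorded one."""
    drift = {}
    for name, attrs in recorded.items():
        live = actual.get(name)
        if live is None:
            drift[name] = {"status": "missing"}
            continue
        changed = {
            key: {"recorded": value, "actual": live.get(key)}
            for key, value in attrs.items()
            if live.get(key) != value
        }
        if changed:
            drift[name] = changed
    return drift


recorded = {"web": {"instance_type": "t2.micro"}}
actual = {"web": {"instance_type": "t2.large"}}  # someone resized it in the console
print(detect_drift(recorded, actual))
```

An empty result means the live resources still match the record; anything else is a change made outside the IaC workflow.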

Ansible

Ansible is an agentless configuration management tool that uses YAML "playbooks." It connects over SSH and runs tasks in order — making it more imperative in style.

# Ansible playbook snippet: one task from a play's tasks list
- name: Install nginx
  apt:
    name: nginx
    state: present

Puppet

Puppet uses a declarative, agent-based model. Agents on each server regularly "pull" the desired configuration from a Puppet server and enforce it — great for large fleets.

Idempotence

Idempotence means running the same operation multiple times produces the same result as running it once. This is critical in automation.

Analogy: Pressing the elevator button ten times doesn't make it arrive ten times faster — and it doesn't cause ten elevators to arrive. One press = one result.

If your Ansible playbook says "nginx should be installed," running it 5 times won't install nginx 5 times. It checks, and if nginx is already there, it does nothing. Non-idempotent scripts (such as one that appends a line to a config file with echo >>, or runs useradd without first checking whether the user exists) can create duplicates or fail on re-runs.
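The "check first, change only if needed" pattern can be shown with a small Python sketch. Both helpers are hypothetical and work on an in-memory list of config lines rather than a real file.

```python
# Hypothetical helpers operating on a list of config lines held in memory.

def append_line(lines: list[str], line: str) -> list[str]:
    """Not idempotent: every call adds another copy."""
    return lines + [line]


def ensure_line(lines: list[str], line: str) -> list[str]:
    """Idempotent: check first, change only if needed."""
    return lines if line in lines else lines + [line]


config = ["worker_processes 4;"]
setting = "gzip on;"

once = ensure_line(config, setting)
twice = ensure_line(once, setting)
print(once == twice)  # True: the second run changed nothing

# Running the non-idempotent version twice leaves a duplicate line.
print(append_line(append_line(config, setting), setting))
```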

Version Control and Git

Every IaC file should live in a version control system. Git is the standard.

Why it matters for IaC: you get a full history of infrastructure changes, can roll back bad configs, and enforce peer review via pull requests

Branching strategies protect production — changes go through dev → staging → main

Git enables CI/CD pipelines to trigger automatically when code is merged

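git init
git checkout -b feature/add-web-server   # work on a feature branch, not main
git add main.tf
git commit -m "Add web server resource"
# Assumes a remote named "origin" has already been configured
git push -u origin feature/add-web-server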

CI/CD Pipelines and Automation

A CI/CD pipeline automates the journey from code commit to deployed infrastructure. In SRE, pipelines typically:

Lint the code (catch syntax errors and style issues)

Run policy as code checks (e.g., OPA, Sentinel) to enforce compliance rules

Run terraform plan or equivalent dry-run

Apply changes to staging, then production

Linting and code quality checks prevent bad configs from ever reaching production. Tools like tflint for Terraform or ansible-lint for Ansible flag problems early.
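The ordering of those stages matters: a failure early on should stop everything after it. A minimal Python sketch of that gate behavior follows. The stage names mirror the list above, but the stage functions are stand-ins; a real pipeline would invoke tools such as tflint or terraform plan.

```python
from typing import Callable

# Hypothetical stage functions: each returns True on success.


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run stages in order and stop at the first failure."""
    log = []
    for name, stage in stages:
        if not stage():
            log.append(f"{name}: FAILED")
            break
        log.append(f"{name}: ok")
    return log


stages = [
    ("lint", lambda: True),
    ("policy", lambda: False),  # simulate a policy violation
    ("plan", lambda: True),
    ("apply-staging", lambda: True),
]
print(run_pipeline(stages))  # ['lint: ok', 'policy: FAILED']
```

Because the policy stage fails, the plan and apply stages never run, which is the behavior that keeps a bad config out of production.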

Security Concepts

Secrets Management

Never store passwords, API keys, or certificates in your Git repository. Use dedicated secrets management tools:

HashiCorp Vault, AWS Secrets Manager, Azure Key Vault

Inject secrets at runtime, not at commit time
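One common way to inject a secret at runtime is through the process environment. The Python sketch below assumes a secrets manager or the CI system has set a variable before the program starts; the variable name DB_PASSWORD and the helper `load_secret` are hypothetical.

```python
import os

# DB_PASSWORD is a hypothetical variable name. A secrets manager or the CI
# system is assumed to place it in the process environment before startup.


def load_secret(name: str) -> str:
    """Read a secret from the environment, failing loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"secret {name!r} is not set")
    return value


# This assignment only stands in for runtime injection so the sketch can run.
os.environ["DB_PASSWORD"] = "example-only"

password = load_secret("DB_PASSWORD")
print("secret loaded:", bool(password))  # never print the secret itself
```

The point of the pattern is that the value never appears in the repository: the code only knows the name of the secret, not its contents.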

Policy as Code & Compliance

Policy as code encodes security and compliance rules as machine-readable files. Tools like Open Policy Agent (OPA) can block a Terraform plan that would open port 22 to the world — automatically, before anything is deployed.
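The port 22 rule can be expressed as a short check. The sketch below is written in Python purely for illustration: real OPA policies are written in Rego, and real Terraform plan output has a different structure. The rule format and the `violations` function are assumptions.

```python
# Simplified, hypothetical rule format: real Terraform plan JSON and OPA
# policies (written in Rego) are structured differently.

def violations(rules: list[dict]) -> list[str]:
    """Flag ingress rules that expose SSH (port 22) to any address."""
    found = []
    for rule in rules:
        covers_ssh = rule["from_port"] <= 22 <= rule["to_port"]
        open_to_world = "0.0.0.0/0" in rule["cidr_blocks"]
        if covers_ssh and open_to_world:
            found.append(f"{rule['name']}: port 22 open to 0.0.0.0/0")
    return found


planned = [
    {"name": "web", "from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
    {"name": "admin", "from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
]
problems = violations(planned)
print(problems or "plan allowed")  # ['admin: port 22 open to 0.0.0.0/0']
```

A pipeline would treat a non-empty result as a failed stage and refuse to apply the plan.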

Risks of Poor IaC Practices

Exposed credentials in code

Unreviewed infrastructure changes

Configuration drift leading to security gaps

Immutability and Deployment Strategies

Immutable Infrastructure

Instead of patching a running server, you replace it entirely with a new, pre-baked image. This eliminates "snowflake servers" — unique, hand-configured machines that are impossible to reproduce.

Blue/Green Deployments

Run two identical production environments — blue (current) and green (new). Traffic switches to green once it passes tests. If something breaks, flip back to blue instantly.

[Users] → [Load Balancer] → [Blue Environment] (active)
                          → [Green Environment] (staging new version)

This strategy supports incident recovery — you always have a known-good environment to fall back to.
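The switch and the fallback can be modeled as a tiny state machine. The Python sketch below is a toy model under stated assumptions: the `Router` class is hypothetical, and a real setup would update a load balancer or DNS record rather than a variable.

```python
# Hypothetical router: a real setup would update a load balancer or DNS record.

class Router:
    def __init__(self) -> None:
        self.active = "blue"

    def idle(self) -> str:
        """Name of the environment that is not receiving traffic."""
        return "green" if self.active == "blue" else "blue"

    def cut_over(self, healthy: bool) -> str:
        """Switch traffic to the idle environment only if it passed its checks."""
        if healthy:
            self.active = self.idle()
        return self.active

    def roll_back(self) -> str:
        """Point traffic back at the previous environment."""
        self.active = self.idle()
        return self.active


router = Router()
print(router.cut_over(healthy=True))  # green
print(router.roll_back())             # blue
```

Rollback is cheap here because the old environment is left running and untouched until the new one has proven itself.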

Multi-Cloud and Load Testing

Multi-Cloud

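Multi-cloud means running workloads across more than one provider, such as AWS, GCP, and Azure, at the same time. IaC tools like Terraform shine here because the same write, plan, and apply workflow works with every provider, even though the resource definitions themselves remain provider-specific. Key concern: avoid vendor lock-in in your modules.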

Load Testing and SLOs

A Service Level Objective (SLO) defines a target for reliability (e.g., "99.9% of requests complete in under 200ms"). Load testing validates that your infrastructure can actually meet those SLOs under realistic traffic before you go live. Tools like k6, Locust, or JMeter simulate thousands of users.
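Checking load test results against the example SLO is simple arithmetic: count the requests that finished under the threshold and compare that fraction to the target. The Python sketch below uses made-up latency samples, and `slo_met` is a hypothetical helper rather than part of k6, Locust, or JMeter.

```python
# Hypothetical latency samples in milliseconds, as a load test might report them.

def slo_met(latencies_ms: list[float], threshold_ms: float, target: float) -> bool:
    """True if the fraction of requests under the threshold reaches the target."""
    if not latencies_ms:
        raise ValueError("no samples")
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return fast / len(latencies_ms) >= target


samples = [120.0] * 998 + [450.0] * 2  # 998 of 1000 requests are fast (99.8%)
print(slo_met(samples, threshold_ms=200, target=0.999))  # False
```

Here 99.8% of requests are under 200ms, which falls short of the 99.9% target, so the infrastructure would need more capacity or tuning before going live.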

Key Takeaways

IaC treats infrastructure like software — it should be version-controlled, reviewed, tested, and deployed through automated pipelines, never manually.

Idempotence is non-negotiable in automation: your scripts and playbooks must be safe to run repeatedly, with every re-run leaving the system in the same state as the first run.

Terraform manages state, detects drift, and uses modules for reuse; Ansible and Puppet handle configuration management with different agent and style trade-offs.

Security must be built into the pipeline — use secrets management tools, policy as code, and linting to catch problems before they reach production.

Immutability and blue/green deployments reduce risk and speed up incident recovery by ensuring you always have a clean, reproducible environment to fall back on.