ISTEB Foundations in Site Reliability Engineering Intermediate — Quiz 2
Modern software systems are too large and complex to manage by hand. Site Reliability Engineers rely on automation, standardized tooling, and repeatable processes to keep infrastructure consistent, recoverable, and secure. This lesson covers the core concepts you'll need to master for Quiz 2 — from writing infrastructure code to deploying safely across multiple clouds.
Infrastructure as Code (IaC)
Infrastructure as Code means defining your servers, networks, databases, and other resources in text files — just like application code. Instead of clicking through a cloud console, you write a file that describes what you want, and a tool provisions it for you.
Primary Benefits
Declarative vs. Imperative
| Style | What you write | Example tools |
|---|---|---|
| Declarative | *What* the end state should look like | Terraform, Puppet |
| Imperative | *How* to get to the end state, step by step | Ansible (procedural mode), Bash scripts |
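To make the declarative row of the table concrete, here is a minimal Python sketch of what a declarative tool does in principle: it compares the desired state with what currently exists and works out the steps itself. The `plan` function and the resource dictionaries are hypothetical illustrations, not how Terraform or Puppet are actually implemented.

```python
def plan(desired, current):
    """Work out the actions that would move `current` to `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical desired and current states, keyed by resource name
desired = {"web": {"instance_type": "t2.micro"}}
current = {
    "web": {"instance_type": "t2.nano"},
    "old-db": {"instance_type": "t2.small"},
}

print(plan(desired, current))  # [('update', 'web'), ('delete', 'old-db')]
print(plan(desired, desired))  # [] : nothing to do when the states already match
```

In an imperative script, you would write the create, update, and delete steps yourself, in order. In the declarative style, you only maintain `desired`, and the tool derives the steps.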
# Declarative Terraform example: describe desired state
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}
Popular IaC & Configuration Management Tools
Terraform
Terraform is one of the most widely used declarative IaC tools. It talks to cloud provider APIs and tracks what it has created in a state file (.tfstate). The state file is Terraform's source of truth about the real-world resources it manages.
Ansible
Ansible is an agentless configuration management tool that uses YAML "playbooks." It connects over SSH and runs tasks in order, which makes it more imperative in style.
# Ansible playbook snippet
- name: Install nginx
  apt:
    name: nginx
    state: present
Puppet
Puppet uses a declarative, agent-based model. Agents on each server regularly "pull" the desired configuration from a Puppet server and enforce it, which works well for large fleets.
Idempotence
Idempotence means running the same operation multiple times produces the same result as running it once. This is critical in automation.
Analogy: Pressing the elevator button ten times doesn't make it arrive ten times faster — and it doesn't cause ten elevators to arrive. One press = one result.
If your Ansible playbook says "nginx should be installed," running it 5 times won't install nginx 5 times. It checks, and if nginx is already there, it does nothing. Non-idempotent scripts (for example, one that appends a line to a config file on every run, or calls mkdir without -p) can fail or create duplicates on re-runs.
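The difference can be sketched in a few lines of Python. This is a minimal illustration: the helper names (`append_line`, `ensure_line`, `count_lines`) and the config line are made up for this example.

```python
import os
import tempfile


def append_line(path, line):
    """Non-idempotent: adds the line again on every run."""
    with open(path, "a") as f:
        f.write(line + "\n")


def ensure_line(path, line):
    """Idempotent: adds the line only if it is not already present."""
    existing = []
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read().splitlines()
    if line not in existing:
        with open(path, "a") as f:
            f.write(line + "\n")


def count_lines(path):
    with open(path) as f:
        return len(f.read().splitlines())


with tempfile.TemporaryDirectory() as workdir:
    naive = os.path.join(workdir, "naive.conf")
    safe = os.path.join(workdir, "safe.conf")
    for _ in range(5):  # simulate five runs of the same automation
        append_line(naive, "worker_processes 4;")
        ensure_line(safe, "worker_processes 4;")
    print(count_lines(naive), count_lines(safe))  # 5 1
```

`ensure_line` follows the same check-then-act pattern as the nginx task: describe the end state, and change something only when reality differs from it.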
Version Control and Git
Every IaC file should live in a version control system. Git is the standard.
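dev → staging → main
git init
git checkout -b feature/add-web-server   # create the branch that is pushed below
git add main.tf
git commit -m "Add web server resource"
git push origin feature/add-web-server   # assumes a remote named origin is configured
CI/CD Pipelines and Automation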
A CI/CD pipeline automates the journey from code commit to deployed infrastructure. In SRE, pipelines typically:
terraform plan or equivalent dry-run
Linting and code quality checks prevent bad configs from ever reaching production. Tools like tflint for Terraform or ansible-lint for Ansible flag problems early.
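The gating behavior of a pipeline can be sketched as follows. This is a simplified illustration, not a real CI system: `run_pipeline` is a hypothetical helper, and the stage names (lint, plan, apply) are chosen for the example, with each stage reduced to a function that returns True on success.

```python
def run_pipeline(stages):
    """Run stages in order and stop at the first one that fails."""
    log = []
    for name, check in stages:
        ok = check()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later stages, such as apply, never run
    return log


# Hypothetical run in which the dry-run stage reports a problem
stages = [
    ("lint", lambda: True),
    ("plan", lambda: False),
    ("apply", lambda: True),
]
print(run_pipeline(stages))  # ['lint: ok', 'plan: FAILED']
```

Because the loop stops at the first failure, a config that fails linting or the dry-run never reaches the stage that changes real infrastructure.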
Security Concepts
Secrets Management
Never store passwords, API keys, or certificates in your Git repository. Use a dedicated secrets management tool instead.
Policy as Code & Compliance
Policy as code encodes security and compliance rules as machine-readable files. Tools like Open Policy Agent (OPA) can block a Terraform plan that would open port 22 to the world automatically, before anything is deployed.
Risks of Poor IaC Practices
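One risk that the policy-as-code check described above guards against is SSH left open to the internet. The sketch below shows the logic of such a rule. OPA policies are normally written in Rego, so Python is used here only as a stand-in. The input shape (a list of dictionaries with `name` and `ingress` keys) is a simplified assumption rather than the real Terraform plan JSON format, and `find_violations` is a hypothetical helper.

```python
def find_violations(resources):
    """Return names of security groups that expose port 22 to 0.0.0.0/0."""
    violations = []
    for res in resources:
        for rule in res.get("ingress", []):
            covers_ssh = rule["from_port"] <= 22 <= rule["to_port"]
            open_to_world = "0.0.0.0/0" in rule.get("cidr_blocks", [])
            if covers_ssh and open_to_world:
                violations.append(res["name"])
                break  # one bad rule is enough to flag the group
    return violations


# Hypothetical planned resources: HTTPS open to all is allowed, SSH is not
planned = [
    {"name": "web_sg",
     "ingress": [{"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]}]},
    {"name": "admin_sg",
     "ingress": [{"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}]},
]
print(find_violations(planned))  # ['admin_sg']
```

A pipeline would fail the run whenever this list is non-empty, so the risky rule is rejected before deployment.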
Immutability and Deployment Strategies
Immutable Infrastructure
Instead of patching a running server, you replace it entirely with a new, pre-baked image. This eliminates "snowflake servers": unique, hand-configured machines that are impossible to reproduce.
Blue/Green Deployments
Run two identical production environments: blue (current) and green (new). Traffic switches to green once it passes tests. If something breaks, flip back to blue instantly.
[Users] → [Load Balancer] → [Blue Environment] (active)
                          → [Green Environment] (staging new version)
This strategy supports incident recovery: you always have a known-good environment to fall back to.
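The cutover and fallback logic can be sketched as a small Python class. This is a toy model, assuming the load balancer can be reduced to a single "active environment" pointer. `BlueGreenRouter` and the health-check callback are hypothetical names for illustration.

```python
class BlueGreenRouter:
    """Toy stand-in for a load balancer that points at one of two environments."""

    def __init__(self):
        self.active = "blue"  # blue serves traffic first, as in the diagram

    @property
    def idle(self):
        return "green" if self.active == "blue" else "blue"

    def switch(self, health_check):
        """Send traffic to the idle environment only if it passes the check."""
        candidate = self.idle
        if not health_check(candidate):
            return False  # keep serving from the known-good environment
        self.active = candidate
        return True

    def rollback(self):
        """Flip back to the previous environment, which was left running."""
        self.active = self.idle


router = BlueGreenRouter()
router.switch(lambda env: True)  # green passed its tests
print(router.active)             # green
router.rollback()                # something broke after the cutover
print(router.active)             # blue
```

Rollback is a pointer flip rather than a redeploy, which is why the old environment is kept running until the new one has proven itself.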