Kubernetes Observability: Best Practices for Monitoring and Troubleshooting
Running Kubernetes without proper observability is like flying blind. When something breaks at 2 AM — and it will — you need to know *what* broke, *why* it broke, and *where* in your cluster it happened. That's exactly what observability gives you: the ability to understand your system's internal state from the outside.
Observability in Kubernetes isn't just "add some metrics and call it a day." It's three interconnected pillars: metrics, logs, and traces. Skip any one of them and you'll spend twice as long debugging the next incident.
Let's build this out properly.
The Three Pillars (and Why You Need All of Them)
Metrics tell you *that* something is wrong — CPU spiking, pod restarts climbing, request latency increasing.
Logs tell you *what* happened — the error message, the stack trace, the context around the failure.
Traces tell you *where* it went wrong across services — which microservice in a chain of ten is actually responsible for that 3-second response time.
You can debug with just metrics and logs for a while, but once you have more than five or six services talking to each other, distributed tracing becomes non-negotiable.
Setting Up Metrics with Prometheus
Prometheus is the de facto standard for Kubernetes metrics. It scrapes metrics endpoints, stores time-series data, and integrates with Kubernetes service discovery out of the box.
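To make "scrapes metrics endpoints" concrete, here is a rough sketch of what such an endpoint serves, using only Node's built-in http module. The bucket bounds and the function names (observe, renderMetrics, createMetricsServer) are illustrative assumptions, not part of any library. In a real service you would normally let a client library such as prom-client do this bookkeeping for you.

```javascript
const http = require('node:http');

// Upper bounds (in seconds) for the latency buckets; values are illustrative.
const BUCKETS = [0.1, 0.5, 1];

// In-memory histogram state: per-bucket counts plus running sum and count.
const histogram = { counts: BUCKETS.map(() => 0), sum: 0, count: 0 };

// Record one request duration in the first bucket whose bound covers it.
function observe(seconds) {
  const i = BUCKETS.findIndex((le) => seconds <= le);
  if (i !== -1) histogram.counts[i] += 1;
  histogram.sum += seconds;
  histogram.count += 1;
}

// Render the histogram in the Prometheus text exposition format.
// Bucket lines are cumulative, and the "+Inf" bucket equals the total count.
function renderMetrics() {
  const name = 'http_request_duration_seconds';
  const lines = [
    `# HELP ${name} Request latency in seconds.`,
    `# TYPE ${name} histogram`,
  ];
  let cumulative = 0;
  BUCKETS.forEach((le, i) => {
    cumulative += histogram.counts[i];
    lines.push(`${name}_bucket{le="${le}"} ${cumulative}`);
  });
  lines.push(`${name}_bucket{le="+Inf"} ${histogram.count}`);
  lines.push(`${name}_sum ${histogram.sum}`);
  lines.push(`${name}_count ${histogram.count}`);
  return lines.join('\n') + '\n';
}

// Build (but do not start) an HTTP server that answers GET /metrics.
function createMetricsServer() {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/metrics') {
      res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
      res.end(renderMetrics());
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

Calling createMetricsServer().listen(8080) would expose the endpoint on port 8080, and each request handler would call observe() with its measured duration. Prometheus then pulls the text on its own schedule; your application never pushes anything.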
The easiest way to get started is with the kube-prometheus-stack Helm chart, which bundles Prometheus, Alertmanager, and Grafana together:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace
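For your own applications, expose a /metrics endpoint and tell Prometheus where to find it. The prometheus.io/* annotations below are a widely used convention rather than a built-in feature: they only take effect if the scrape configuration contains relabeling rules that read them. The Prometheus Operator installed by kube-prometheus-stack discovers targets through ServiceMonitor and PodMonitor resources, so with that chart you will also need one of those or an additional scrape config. The annotated pod looks like this: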
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  containers:
    - name: my-app
      image: my-app:latest
      ports:
        - containerPort: 8080
A few metrics you should be tracking from day one:
kube_pod_container_status_restarts_total: pod restart counts (a climbing number here is always a red flag)
container_cpu_usage_seconds_total: CPU usage per container
container_memory_working_set_bytes: actual memory in use
http_request_duration_seconds: your application's request latency (instrument this yourself with a client library)
Alerting That Actually Works
Prometheus alerts without Alertmanager routing are useless noise. Here's a practical alert rule that fires when a pod has restarted more than 5 times in the last hour:
groups:
  - name: pod-health
    rules:
      - alert: PodCrashLooping
        expr: |
          increase(kube_pod_container_status_restarts_total[1h]) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is crash looping"
          description: "Container {{ $labels.container }} in pod {{ $labels.pod }} has restarted {{ $value }} times in the last hour."
Keep your alerts actionable. Every alert should have a clear owner and a runbook link. If your on-call engineer can't do anything about an alert at 3 AM, it shouldn't page them.
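One way to enforce the "owner and runbook" rule is a small check in CI. The sketch below assumes a convention of an owner label and a runbook_url annotation; those two names are my assumption, not something Prometheus or Alertmanager requires. It also assumes the rule file has already been parsed from YAML into a plain object.

```javascript
// Hypothetical team conventions: an "owner" label and a "runbook_url" annotation.
const REQUIRED_LABELS = ['severity', 'owner'];
const REQUIRED_ANNOTATIONS = ['summary', 'runbook_url'];

// Return a list of human-readable problems for one parsed alerting rule.
function lintAlertRule(rule) {
  const problems = [];
  const labels = rule.labels || {};
  const annotations = rule.annotations || {};
  for (const key of REQUIRED_LABELS) {
    if (!labels[key]) problems.push(`${rule.alert}: missing label "${key}"`);
  }
  for (const key of REQUIRED_ANNOTATIONS) {
    if (!annotations[key]) problems.push(`${rule.alert}: missing annotation "${key}"`);
  }
  return problems;
}

// Lint every alerting rule in a parsed rule file ({ groups: [{ rules: [...] }] }).
function lintRuleFile(ruleFile) {
  return (ruleFile.groups || [])
    .flatMap((group) => group.rules || [])
    .filter((rule) => rule.alert) // skip recording rules, which have no "alert" key
    .flatMap((rule) => lintAlertRule(rule));
}

// The PodCrashLooping rule from above, as it might look after YAML parsing
// (description shortened for brevity).
const example = {
  groups: [{
    name: 'pod-health',
    rules: [{
      alert: 'PodCrashLooping',
      expr: 'increase(kube_pod_container_status_restarts_total[1h]) > 5',
      for: '5m',
      labels: { severity: 'warning' },
      annotations: {
        summary: 'Pod {{ $labels.pod }} is crash looping',
        description: 'Container has restarted repeatedly in the last hour.',
      },
    }],
  }],
};

console.log(lintRuleFile(example));
```

Run against the rule above, this reports a missing owner label and a missing runbook_url annotation, which is exactly the gap you want to catch before the alert pages someone.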
Centralized Logging with the EFK Stack
Kubernetes logs are ephemeral — when a pod dies, its logs go with it unless you ship them somewhere first. The EFK stack (Elasticsearch, Fluentd, Kibana) is a solid choice for centralized logging.
Deploy Fluentd as a DaemonSet so it runs on every node and captures all container logs:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging.svc.cluster.local"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            # Only needed on nodes running the Docker runtime; containerd and
            # CRI-O write container logs under /var/log/pods, covered above.
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
Structured logging matters here. If your application logs plain text, you're making your own life harder. Log JSON instead:
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "error",
  "service": "payment-service",
  "trace_id": "abc123",
  "message": "Payment processing failed",
  "error": "connection timeout",
  "user_id": "u-789"
}
Structured logs are searchable, filterable, and can be correlated with traces using that trace_id field.
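If you don't already use a logging library, a logger that emits this shape fits in a few lines. This is a minimal sketch with hypothetical helper names (formatLogLine, log) and a hard-coded service name. A production setup would more likely use an established logger and take the service name from configuration.

```javascript
// Build one JSON log line. Field names mirror the example above.
// "now" is injectable so the output is reproducible in tests.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    service: 'payment-service',
    message,
    ...fields, // extra context such as trace_id, error, user_id
  });
}

// Write one line per event to stdout, where a node-level collector
// such as Fluentd can pick it up from the container log file.
function log(level, message, fields) {
  process.stdout.write(formatLogLine(level, message, fields) + '\n');
}

log('error', 'Payment processing failed', {
  trace_id: 'abc123',
  error: 'connection timeout',
  user_id: 'u-789',
});
```

Writing exactly one JSON object per line matters: it lets the collector parse each line independently, without multi-line stitching rules.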
Distributed Tracing with Jaeger
Once you have more than a handful of services, you need traces. Jaeger integrates well with Kubernetes and supports OpenTelemetry, which is the instrumentation standard you should be using.
Deploy Jaeger with the all-in-one image for development, or use the Jaeger Operator for production:
kubectl create namespace observability
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/latest/download/jaeger-operator.yaml \
  -n observability
Then create a Jaeger instance:
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch.logging.svc.cluster.local:9200
Instrument your application with OpenTelemetry (here's a Node.js example):
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

// Send spans over OTLP/HTTP to the Jaeger collector service
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://jaeger-collector.observability.svc.cluster.local:4318/v1/traces',
  }),
  serviceName: 'payment-service',
});

// Start the SDK before the rest of the application loads
sdk.start();
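The SDK propagates trace context between services for you, but it helps to know what actually travels on the wire. Assuming the default W3C Trace Context propagator, each request carries a traceparent header. The sketch below parses the version 00 layout of that header and builds the header for a downstream call. The function names are hypothetical, and this illustrates the format only; it is not a replacement for the SDK.

```javascript
const { randomBytes } = require('node:crypto');

// traceparent = version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex)
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse an incoming traceparent header; return null if it is missing or invalid.
function parseTraceparent(header) {
  const match = TRACEPARENT.exec(header || '');
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // The spec treats version "ff" and all-zero IDs as invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, flags };
}

// Build the header for an outgoing call: same trace ID, fresh span ID.
function childTraceparent(parent) {
  const spanId = randomBytes(8).toString('hex');
  return `00-${parent.traceId}-${spanId}-${parent.flags}`;
}

// Example header value taken from the W3C Trace Context specification.
const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const ctx = parseTraceparent(incoming);
console.log(ctx.traceId);           // the value to log as trace_id
console.log(childTraceparent(ctx)); // header to attach to downstream requests
```

The trace ID stays constant across every hop, which is why logging it in each service lets you pull all log lines for one request with a single query.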
Grafana: Tying It All Together
Grafana is your single pane of glass. Connect it to Prometheus for metrics, Elasticsearch for logs, and Jaeger for traces. The kube-prometheus-stack chart already includes Grafana with pre-built dashboards for cluster health.
A few dashboards worth importing immediately:
Practical Tips That Save You Time
Set resource requests and limits on everything. Without them, you can't trust your resource metrics. A pod without limits can starve its neighbors and your dashboards won't tell you why.
Use namespace-level separation for your observability stack. Keep monitoring, logging, and observability namespaces separate from your application namespaces. It makes RBAC and resource management much cleaner.
Correlate across pillars. The real power comes when you can jump from a Grafana alert → filter logs by time range → find the trace ID → follow the trace through your services. Build this workflow before you need it in an incident.
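Don't skip liveness and readiness probes. They are more than health checks: a failing liveness probe makes the kubelet restart the container, which shows up in your restart-count metrics and alerts, and a failing readiness probe takes the pod out of its Service endpoints.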
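The first and last tips above are easy to check mechanically. Here is a rough sketch of a lint function that flags containers missing requests, limits, or probes. It assumes the Pod spec has already been parsed from YAML into a plain object, and both the lintPodSpec name and the /healthz path are hypothetical.

```javascript
// Return a list of problems for each container in a parsed Pod spec.
function lintPodSpec(podSpec) {
  const problems = [];
  for (const c of podSpec.containers || []) {
    const res = c.resources || {};
    if (!res.requests) problems.push(`${c.name}: no resource requests`);
    if (!res.limits) problems.push(`${c.name}: no resource limits`);
    if (!c.livenessProbe) problems.push(`${c.name}: no livenessProbe`);
    if (!c.readinessProbe) problems.push(`${c.name}: no readinessProbe`);
  }
  return problems;
}

// A container with requests and a readiness probe, but no limits or liveness probe.
const spec = {
  containers: [{
    name: 'my-app',
    image: 'my-app:latest',
    ports: [{ containerPort: 8080 }],
    resources: { requests: { cpu: '100m', memory: '128Mi' } },
    readinessProbe: { httpGet: { path: '/healthz', port: 8080 } }, // hypothetical path
  }],
};

console.log(lintPodSpec(spec));
```

For the example spec this reports the missing limits and the missing liveness probe. A policy engine or admission controller can enforce the same thing cluster-wide, but a script like this is often enough to catch gaps in code review.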
Your Next Steps
Install kube-prometheus-stack if you haven't already; it gets you metrics and dashboards in under 10 minutes.
Observability isn't a one-time setup. It's a practice. The teams that do it well are the ones that treat it as a first-class engineering concern, not an afterthought bolted on after something breaks.