Observability

Observability refers to gathering as much information about a system as possible, so that system operators, DevOps practitioners, and Site Reliability Engineers can ask questions of that information.

What is observability - Grafana

The USE Method

The USE method applies to hardware resources such as CPU, memory, disks, and network devices. For every resource, check:

Utilization - Percentage of time the resource is busy (such as CPU usage of a node)
Saturation - Amount of work a resource has to do, often queue length or node load
Errors - Count of error events
The USE Method
USE Method - Linux Performance Checklist
Grafana Dashboard - Node Exporter / USE Method
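
As a rough illustration of the three USE signals, the sketch below derives them from two scrapes of a node's cumulative counters. The `NodeSample` shape, its field names, and the choice of CPU idle time, 1-minute load, and network errors as inputs are assumptions made for this example, not part of the USE method itself.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One scrape of a node's counters (hypothetical shape for this sketch)."""
    timestamp: float         # seconds since some epoch
    cpu_idle_seconds: float  # cumulative idle CPU seconds, summed over all cores
    load1: float             # 1-minute load average at scrape time
    net_errors: int          # cumulative network error count


def use_summary(prev: NodeSample, curr: NodeSample, cores: int) -> dict:
    """Derive Utilization, Saturation, and Errors from two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0 or cores <= 0:
        raise ValueError("samples must be in time order and cores must be positive")

    # Utilization: fraction of available CPU time that was not idle.
    idle_fraction = (curr.cpu_idle_seconds - prev.cpu_idle_seconds) / (elapsed * cores)

    return {
        "utilization": 1.0 - idle_fraction,
        # Saturation (assumption): load average per core; above 1.0 means work is queuing.
        "saturation": curr.load1 / cores,
        # Errors: number of error events seen between the two samples.
        "errors": curr.net_errors - prev.net_errors,
    }


prev = NodeSample(timestamp=0, cpu_idle_seconds=1000, load1=1.0, net_errors=5)
curr = NodeSample(timestamp=60, cpu_idle_seconds=1120, load1=6.0, net_errors=7)
summary = use_summary(prev, curr, cores=4)
print(summary)
```

Here the node was idle for 120 of 240 available CPU seconds, so utilization is 0.5, while a load of 6 on 4 cores gives a saturation of 1.5.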

The RED Method

The RED method applies to services. It can be represented nicely using a single Prometheus histogram; the queries below use one named request_duration_seconds.

Rate - Requests per second

sum(rate(request_duration_seconds_count{job="..."}[1m]))

Errors - Number of requests per second that are failing (in this query, any request whose status_code is not 2xx)

sum(rate(request_duration_seconds_count{job="...", status_code!~"2.."}[1m]))

Duration - Amount of time these requests take, as a distribution of latency measurements (in this query, the 99th percentile)

histogram_quantile(0.99, sum(rate(request_duration_seconds_bucket{job="..."}[1m])) by (le))
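
To make the Duration query less of a black box, here is a minimal sketch of how a quantile can be estimated from cumulative histogram buckets by linear interpolation, which is the general approach behind histogram_quantile. It is a simplification written for this note, not Prometheus's implementation: the function name and input shape are assumptions, and it assumes non-negative bucket bounds (latencies).

```python
import math


def estimate_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must include an upper bound of math.inf, mirroring le="+Inf".
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must include a +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate

    # Find the first bucket whose cumulative count reaches the target rank.
    rank = q * total
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]

    if upper == math.inf:
        # Cannot interpolate inside an unbounded bucket: fall back to the
        # highest finite bound, if there is one.
        return buckets[-2][0] if len(buckets) > 1 else math.nan

    # Lower edge of the bucket: previous bound, or 0 for the first bucket.
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    if count == below:
        return upper  # empty bucket, avoid dividing by zero

    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# 100 requests: 50 took <= 0.1s, 90 took <= 0.5s, all took <= 1.0s.
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p50 = estimate_quantile(0.5, latency_buckets)
p95 = estimate_quantile(0.95, latency_buckets)
print(p50, p95)
```

The estimate is only as precise as the bucket boundaries: the 95th percentile lands halfway through the 0.5s to 1.0s bucket because the sketch assumes requests are spread evenly inside it.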

Modelling this for every service will give a consistent overview of how the system is behaving.

RED is a good proxy for user happiness.

The RED Method: How to Instrument Your Services

The Four Golden Signals

Similar to RED, but also includes saturation.

If you can only measure four metrics of your user-facing system, focus on:

Latency - the time it takes to service a request
- Distinguish between the latency of successful requests vs failed requests
Traffic - a measure of how much demand is being placed on the system
- For web services, this is usually requests per second
  - Could also split by the nature of the request, like lists vs gets
- For storage systems, this might be reads and writes per second
Errors - the rate of requests that fail
Saturation - how “full” the service is
Google SRE Book - The Four Golden Signals
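
The four signals above can be sketched from a window of request records. Everything named here is hypothetical: the `Request` shape, treating 5xx responses as failures, using the median as the latency summary, and modelling saturation as in-flight requests over a fixed capacity are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    """One served request (hypothetical shape for this sketch)."""
    duration_seconds: float
    status_code: int


def golden_signals(requests, window_seconds, in_flight, capacity):
    """Summarize latency, traffic, errors, and saturation for one time window."""
    # Assumption: a 5xx status code marks a failed request.
    ok = [r.duration_seconds for r in requests if r.status_code < 500]
    failed = [r.duration_seconds for r in requests if r.status_code >= 500]

    return {
        # Latency is reported separately for successful and failed requests.
        "latency_ok_median": median(ok) if ok else None,
        "latency_failed_median": median(failed) if failed else None,
        # Traffic: requests per second over the window.
        "traffic_rps": len(requests) / window_seconds,
        # Errors: failed requests per second over the window.
        "errors_rps": len(failed) / window_seconds,
        # Saturation (assumption): how full the service is relative to capacity.
        "saturation": in_flight / capacity,
    }


window = [
    Request(0.1, 200),
    Request(0.3, 200),
    Request(0.2, 200),
    Request(2.0, 500),
]
signals = golden_signals(window, window_seconds=2, in_flight=8, capacity=10)
print(signals)
```

Keeping the latency of failed requests separate matters in this sample: the single slow 500 would otherwise drag the overall latency figure up and hide that successful requests are fast.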

Resources

Grafana
- What is observability?
- Common observability strategies
  - USE method
  - RED method
  - Four Golden Signals
The Three Pillars of Observability
The RED Method - Patterns for instrumentation and monitoring slides
- Has Prometheus sample queries for USE and RED

Last update: August 11, 2023
Created: June 3, 2023