Observability

Once your application is deployed, you need to know what it's doing. Observability gives you that visibility.

What is observability?

Observability is the ability to understand the state of a system by looking at the logs, metrics, and traces it produces — rather than stepping through the code.

The three pillars of observability are:

Logs - Logs record what has happened in your application. They are useful for debugging, but because they are often unstructured, they generally do not scale well.
Metrics - Metrics are numerical measurements of something in your application. They are useful for understanding its performance, and they generally scale better than logs for both storage and querying because they are structured data.
Traces - Traces record the path a request takes through your application. They are useful for understanding how a request is processed.

mermaid

graph
  A[Application] --> B(Logs)
  A --> C(Metrics)
  A --> D(Traces)

  click B "#logs"
  click C "#metrics"
  click D "#traces"
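To make the three signals concrete, here is a minimal, stdlib-only Python sketch of one request handler that emits all three. It is an illustration, not how Nais collects data: the `handle_request` function, the in-memory `request_counter` and `finished_spans` stores, and the log field names are hypothetical stand-ins for a metrics library, an OpenTelemetry exporter and a log pipeline.

```python
import json
import secrets
import time
from collections import Counter

# Hypothetical in-memory stores standing in for a metrics library (metrics)
# and an OpenTelemetry exporter (traces). Logs simply go to stdout.
request_counter: Counter = Counter()
finished_spans: list = []


def handle_request(path: str) -> int:
    """Handle one simulated request and emit a log line, a metric and a span."""
    trace_id = secrets.token_hex(16)  # 32 hex characters, the W3C trace-id size
    start = time.perf_counter()
    status = 200 if path.startswith("/api/") else 404

    # Metric: a cheap numeric counter, labelled by path and status code.
    request_counter[(path, status)] += 1

    # Trace: a span recording what happened and how long it took.
    finished_spans.append({
        "trace_id": trace_id,
        "name": f"GET {path}",
        "duration_ms": (time.perf_counter() - start) * 1000,
    })

    # Log: a record of this specific event, tagged with the trace id so the
    # log line can be linked back to its trace.
    print(json.dumps({"message": "request handled", "path": path,
                      "status": status, "trace_id": trace_id}))
    return status


handle_request("/api/users")
handle_request("/api/users")
handle_request("/unknown")
```

Note the difference in growth: the counter grows with the number of distinct label combinations, while the log output grows with every single request.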

Automatic observability

Nais can inject OpenTelemetry agents into your application at startup. With a few lines of YAML configuration, you get traces, metrics, and runtime data flowing to the Nais APM dashboards — no code changes required.

🎯 Get started with auto-instrumentation

Metrics

Metrics measure the state of your application. They are usually numerical values that can be aggregated and visualized, and they are commonly used to build alerts and dashboards.

We use the OpenMetrics format for metrics. It is a text-based format that is easy to parse and read. It grew out of the Prometheus exposition format, and Prometheus, one of the most widely used metrics systems, can scrape it directly.
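As an illustration of the text format, the sketch below renders one counter family in OpenMetrics exposition syntax using only the Python standard library. The metric name `http_requests`, its labels and the `render_counter` helper are hypothetical; in practice a Prometheus or OpenTelemetry client library produces this output for you.

```python
def _escape(text: str) -> str:
    # OpenMetrics requires backslash, newline and double quote to be escaped
    # in label values and HELP text.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render a single counter family.

    `samples` maps a tuple of (label, value) pairs to the counter's value.
    Counter samples carry the `_total` suffix; the family name does not.
    """
    lines = [f"# TYPE {name} counter", f"# HELP {name} {_escape(help_text)}"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{key}="{_escape(val)}"' for key, val in labels)
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}_total{selector} {value}")
    lines.append("# EOF")  # OpenMetrics requires an explicit end marker
    return "\n".join(lines) + "\n"


print(render_counter(
    "http_requests",
    "Total number of HTTP requests.",
    {
        (("code", "200"), ("method", "GET")): 3,
        (("code", "500"), ("method", "GET")): 1,
    },
), end="")
```

The sample lines in the output look like `http_requests_total{code="200",method="GET"} 3`: one line per label combination, which is what makes metrics cheap to store and query.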

💡 Learn more about metrics

Mimir

Mimir is a time-series database that stores metrics. It is compatible with Prometheus and supports PromQL queries.

Metrics are collected by scraping (pulling) the /metrics endpoint from your application.

mermaid

graph LR
  Grafana --> Mimir
  Mimir --GET /metrics--> Application
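The scrape model only requires that your application answers `GET /metrics` with its current values. A minimal sketch using Python's standard `http.server` module is shown below; the counter name `app_requests`, the `Handler` class, the `serve` function and the default port 8080 are assumptions for illustration, and a real service would normally let a Prometheus or OpenTelemetry client library expose this endpoint.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application counter, guarded by a lock because
# ThreadingHTTPServer handles each request in its own thread.
_lock = threading.Lock()
_requests_total = 0

OPENMETRICS_CONTENT_TYPE = "application/openmetrics-text; version=1.0.0; charset=utf-8"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        global _requests_total
        if self.path == "/metrics":
            # Scrape endpoint: report the current value without counting the scrape.
            with _lock:
                value = _requests_total
            body = f"# TYPE app_requests counter\napp_requests_total {value}\n# EOF\n".encode()
            content_type = OPENMETRICS_CONTENT_TYPE
        else:
            # Any other path is treated as application traffic and counted.
            with _lock:
                _requests_total += 1
            body = b"hello\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request access log on stderr


def serve(port: int = 8080) -> None:
    """Serve the app and its /metrics endpoint until interrupted."""
    ThreadingHTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

After calling `serve()`, `curl localhost:8080/metrics` returns the current counter value. The scraper polls that URL on a fixed interval, so the application never needs to know where its metrics end up.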

Query metrics in Grafana Explore

Grafana

Grafana is a tool for visualizing metrics. You use it to build dashboards that monitor your application. Grafana is used by many open source projects and is the de facto standard for metrics dashboards in the cloud native ecosystem.

Access Grafana here

Logs

Logs help you understand what is happening in your application. They are usually text-based and are often used for debugging. Because log formats are rarely standardized, logs are hard to query and aggregate, so we recommend using metrics for dashboards and alerting.

Logs written to the console (stdout or stderr) are collected automatically and can be configured for persistent storage and querying in several ways.

mermaid

graph LR
  Application --stdout/stderr--> Router
  Router --> A[Grafana Loki]
  Router --> B[Team Logs]
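Since anything written to stdout is collected, a common way to make those logs easier to query is to write one JSON object per line. The sketch below does this with Python's standard `logging` module. The `JsonFormatter` class, the `configure_logging` helper, the logger name `app` and the field names (`timestamp`, `level`, `logger`, `message`, `stack_trace`) are illustrative assumptions, not a schema that Nais requires.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep multi-line stack traces inside one JSON field.
            entry["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(stream=sys.stdout) -> logging.Logger:
    """Send the hypothetical "app" logger to `stream` (stdout by default) as JSON."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.handlers = [handler]  # replace any previously configured handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = configure_logging()
log.info("payment accepted for order %s", "A-123")
```

Keeping each event on one line also keeps stack traces from being split into several unrelated log entries.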

💡 Learn more about logs

Traces

With tracing, you get application performance monitoring (APM). Tracing gives deep insight into request execution: you can see parallel calls, time spent in each function, and dependencies between services.

Traces from Nais applications are collected using the OpenTelemetry standard and stored in Tempo. The Nais APM app provides service inventory, RED dashboards, dependency maps, and cross-signal navigation — no manual queries needed.

mermaid

graph LR
  Application --gRPC--> Tempo
  Tempo --> Grafana
  Tempo --> APM[Nais APM]

💡 Learn more about tracing

Alerts

Alerts are a way to notify you when something is wrong with your application, and are usually triggered when a metric or log entry matches a certain condition.

Alerts in Nais are based on application metrics and use Prometheus Alertmanager to send notifications to Slack.

mermaid

graph LR
  alerts.yaml --> Mimir
  Mimir --> Alertmanager
  Alertmanager --> Slack
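An alert rule usually pairs a condition with a duration it must hold for, so a short spike does not notify anyone. The sketch below mimics that pending and firing behaviour of Prometheus-style alerting rules in plain Python. The `ThresholdRule` class and the error-rate numbers are hypothetical; the real rule is written as a PromQL expression with a `for` duration in alerts.yaml and is evaluated by the rule evaluator (Mimir in the diagram above), not by your application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: fire once `value > threshold` has held for `for_seconds`."""
    threshold: float
    for_seconds: float
    active_since: Optional[float] = None

    def evaluate(self, now: float, value: float) -> str:
        if value <= self.threshold:
            # Condition no longer true: reset, like a resolved alert.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now  # condition just became true
        if now - self.active_since >= self.for_seconds:
            return "firing"  # held long enough; this is what gets sent to Alertmanager
        return "pending"


rule = ThresholdRule(threshold=0.05, for_seconds=300)  # 5% error rate for 5 minutes
for t, error_rate in [(0, 0.01), (60, 0.10), (180, 0.12), (360, 0.20), (420, 0.02)]:
    print(t, rule.evaluate(t, error_rate))
```

The rule stays pending at 60 and 180 seconds, fires at 360 seconds once the condition has held for 300 seconds, and goes back to inactive when the error rate drops. With `for_seconds=0` it fires on the first evaluation where the condition holds.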

💡 Learn more about alerts

Learning more

Observability is a broad topic, and there is much more to learn. These resources go deeper:

Monitoring, the Prometheus Way

SRE Book - Monitoring distributed systems

SRE Workbook - Monitoring

SRE Workbook - Alerting