Monitoring and Observability for SaaS

Logs, Metrics, Traces — how to properly monitor your SaaS application. A practical comparison of self-hosted vs managed solutions for the DACH region.

Christoph Dietrich · 2026-04-29 · 12 min read

Why monitoring is not optional

Every SaaS application running in production needs observability. It is not a question of "if" but "how". Without monitoring, you are flying blind: you learn about outages from customer emails instead of alerts. You debug problems with console.log instead of structured traces.

In the DACH region, there is an additional dimension: if you capture personal data in logs or traces, GDPR requirements apply. This significantly influences tool selection.

  • 5.6x faster incident resolution with structured observability than with log searching
  • 99.9% uptime expectation: at most about 8.76 hours of downtime per year
  • €340k cost per hour of downtime, on average, for mid-size SaaS companies
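
The downtime budget follows directly from the availability target. A minimal sketch of the arithmetic, assuming a 365-day year; the function name is illustrative:

```javascript
// Allowed downtime per year for a given availability target.
// Assumes a 365-day year (8,760 hours); leap years are ignored.
function downtimeHoursPerYear(availabilityPercent) {
  const hoursPerYear = 365 * 24;
  return hoursPerYear * (1 - availabilityPercent / 100);
}

for (const target of [99, 99.9, 99.99]) {
  console.log(`${target}% -> ${downtimeHoursPerYear(target).toFixed(2)} h/year`);
}
```

At 99.9% this works out to roughly 8.76 hours per year. Each additional nine cuts the budget by a factor of ten.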

The three pillars of observability

1. Logs — What happened?

Logs are the foundation. Every application produces them. The question is how you collect, structure, and make them searchable.

Structured logs are mandatory: emit JSON instead of plaintext, include a request ID in every log entry, and use severity levels consistently.

// Bad
console.log("User logged in")

// Good
logger.info({ userId: "usr_123", action: "login", ip: "redacted", rid: "req_abc" })
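
The `logger` in the snippet above is not defined there. In practice it would usually come from a logging library such as pino or winston. The following is only a minimal sketch of what such a logger does; `createLogger`, `child`, and the field names `ts` and `level` are assumptions made for illustration:

```javascript
// Minimal structured logger sketch: one JSON object per line.
// `createLogger` and its options are illustrative, not a specific library's API.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

function createLogger({ minLevel = "info", base = {}, write = (line) => console.log(line) } = {}) {
  const log = (level, fields) => {
    // Drop entries below the configured severity.
    if (LEVELS[level] < LEVELS[minLevel]) return;
    write(JSON.stringify({ ts: new Date().toISOString(), level, ...base, ...fields }));
  };
  return {
    debug: (fields) => log("debug", fields),
    info: (fields) => log("info", fields),
    warn: (fields) => log("warn", fields),
    error: (fields) => log("error", fields),
    // Child logger that stamps every entry with the same fields, e.g. a request ID.
    child: (extra) => createLogger({ minLevel, base: { ...base, ...extra }, write }),
  };
}

const logger = createLogger();
const requestLogger = logger.child({ rid: "req_abc" });
requestLogger.info({ userId: "usr_123", action: "login", ip: "redacted" });
```

Creating one child logger per request means the request ID ends up in every entry without being passed by hand.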

Tools: ELK Stack (Elasticsearch, Logstash, Kibana), Grafana Loki, Datadog Logs

2. Metrics — How is the system performing?

Metrics are numeric time series: CPU usage, response times, error rates, queue lengths. They show trends and enable alerting.

The four golden signals (per Google SRE):

  • Latency — How long do requests take?
  • Traffic — How many requests are coming in?
  • Errors — How many requests fail?
  • Saturation — How loaded is the system?
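
To make the four signals concrete, here is a small sketch that derives them from one window of request records. The input shape (`durationMs`, `status`) and the parameters `windowSeconds`, `inFlight`, and `maxConcurrency` are assumptions for this example. A real setup would let Prometheus or a managed tool do the aggregation.

```javascript
// Nearest-rank percentile on an ascending-sorted array.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

// Computes the four golden signals for one observation window.
function goldenSignals(requests, { windowSeconds, inFlight, maxConcurrency }) {
  const durations = requests.map((r) => r.durationMs).sort((a, b) => a - b);
  const failed = requests.filter((r) => r.status >= 500).length;
  return {
    latencyP95Ms: percentile(durations, 95),                         // Latency
    trafficRps: requests.length / windowSeconds,                     // Traffic
    errorRate: requests.length === 0 ? 0 : failed / requests.length, // Errors
    saturation: inFlight / maxConcurrency,                           // Saturation
  };
}

const sample = [
  { durationMs: 40, status: 200 },
  { durationMs: 55, status: 200 },
  { durationMs: 900, status: 503 },
  { durationMs: 62, status: 200 },
];
console.log(goldenSignals(sample, { windowSeconds: 60, inFlight: 12, maxConcurrency: 50 }));
```

The sketch reports a percentile rather than the mean for latency, because a single slow request (900 ms here) barely moves an average but is exactly what users notice.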

Tools: Prometheus + Grafana, Datadog Metrics, CloudWatch

3. Traces — Where is the bottleneck?

Distributed tracing follows a request through all services. Essential for microservices, but also valuable for monoliths with external APIs.
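
Following a request across services works by passing a trace ID along with every outgoing call. The sketch below shows the idea using the W3C `traceparent` header format (version, trace ID, parent span ID, flags). `startSpan` and `randomHex` are hypothetical helpers; in a real application an OpenTelemetry SDK would handle this propagation.

```javascript
// Random lowercase hex string. Math.random is enough for a sketch,
// but it is not a cryptographically strong source.
function randomHex(bytes) {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// W3C Trace Context header: version-traceId-parentSpanId-flags, all lowercase hex.
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT.exec(header ?? "");
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, flags };
}

// Continue the incoming trace if there is one, otherwise start a new trace.
function startSpan(incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  const traceId = parent ? parent.traceId : randomHex(16);
  const spanId = randomHex(8);
  const flags = parent ? parent.flags : "01"; // 01 = sampled
  return {
    traceId,
    spanId,
    parentSpanId: parent ? parent.spanId : null,
    // Header to attach to outgoing calls so the next service joins the same trace.
    traceparent: `00-${traceId}-${spanId}-${flags}`,
  };
}

const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const span = startSpan(incoming);
console.log(span.traceparent);
```

Every service keeps the same trace ID and generates a new span ID. The tracing backend can then reassemble the spans into one timeline and show where the time was spent.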

Tools: Jaeger, Grafana Tempo, Datadog APM, OpenTelemetry (standard)

Self-hosted vs managed: The comparison

Self-Hosted (Grafana + Prometheus + Loki)

Advantages

  • Full control over data — GDPR-compliant on your own servers
  • No ongoing license costs, only infrastructure
  • No vendor dependency, open-source stack
  • Customizable to specific requirements

Disadvantages

  • Setup and maintenance require DevOps expertise
  • Scaling the monitoring infrastructure is your responsibility
  • No SLA — if Grafana goes down, you fix it yourself
  • Initial time investment: 2-4 weeks for production-ready setup

Managed (Datadog / New Relic / Grafana Cloud)

Advantages

  • Ready to use immediately, no infrastructure management
  • Integrated dashboards, alerting, APM from one provider
  • Automatic scaling of the monitoring platform
  • Enterprise support and SLAs available

Disadvantages

  • High costs with growing data volume
  • Data stored with the provider — GDPR review required
  • Vendor lock-in with proprietary agents and queries
  • The Datadog bill often becomes the second-largest cloud expense

Cost comparison

Costs vary significantly depending on data volume. Here is a realistic scenario for a SaaS with 5,000-10,000 active users:

Monthly monitoring costs (5k-10k users)

  • Self-Hosted (Hetzner CX41 + Storage): ~€65/mo
  • Grafana Cloud (Pro): ~€180/mo
  • Datadog (Pro, 10 Hosts): ~€450/mo
  • New Relic (Pro): ~€350/mo

Important: With Datadog and New Relic, costs explode with the number of hosts and log volume. A team starting with 10 hosts can easily pay five times as much at 50 hosts.
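
A back-of-the-envelope projection shows where the factor of five comes from. The sketch assumes, purely for illustration, that the bill scales linearly with host count, and uses the ~€450/mo for 10 hosts from the comparison above as the baseline. Real pricing also depends on log volume, custom metrics, and negotiated discounts, none of which are modeled here.

```javascript
// Rough per-host cost projection under a linear-scaling assumption.
// Log volume, custom metrics, and volume discounts are deliberately ignored.
function projectMonthlyCost(baselineCost, baselineHosts, targetHosts) {
  const costPerHost = baselineCost / baselineHosts;
  return costPerHost * targetHosts;
}

for (const hosts of [10, 25, 50]) {
  console.log(`${hosts} hosts: ~€${projectMonthlyCost(450, 10, hosts)}/mo`);
}
```

Under this assumption, 50 hosts come to about €2,250 per month, before any growth in log ingestion.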

Error tracking: Sentry

Sentry deserves its own mention. It is not a complete observability tool, but it is arguably the best error tracker on the market.

What Sentry does:

  • Automatic grouping of similar errors
  • Source maps for frontend errors
  • Performance monitoring (transactions)
  • Release tracking (which deployment introduced the bug?)
  • Cron monitoring (job supervision)

Cost: Free up to 5k events/month; the Team plan starts at $26/month. The free tier is sufficient for most startups.

GDPR: Sentry offers EU data residency (Frankfurt) on the Business plan. Alternatively: self-hosted Sentry (requires ~4 GB RAM).

Uptime monitoring

External uptime checks are the simplest insurance. They verify from outside whether your application is reachable.

Recommendations:

  • Better Stack (formerly Better Uptime) — Status pages + alerting, EU checks available. From $24/mo.
  • Checkly — Synthetic monitoring + API checks with Playwright. Built by a Berlin-based team. From $30/mo.
  • UptimeRobot — Simple and affordable. Free tier with 50 monitors.
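
The services above all do something like the following on a schedule and from several locations. This is a minimal sketch of a single check, assuming Node.js 18+ (global `fetch` and `AbortSignal.timeout`). The function names, the thresholds, and the health endpoint URL are hypothetical.

```javascript
// Decide whether a single probe counts as "up". Thresholds are illustrative.
function evaluateProbe({ status, durationMs }, { maxDurationMs = 2000 } = {}) {
  if (status === null) return { up: false, reason: "no response" };
  if (status < 200 || status >= 400) return { up: false, reason: `HTTP ${status}` };
  if (durationMs > maxDurationMs) return { up: false, reason: "too slow" };
  return { up: true, reason: "ok" };
}

// Probe a URL once and record status code and duration.
async function probe(url, { timeoutMs = 5000 } = {}) {
  const started = Date.now();
  try {
    const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { status: response.status, durationMs: Date.now() - started };
  } catch {
    // Network error, DNS failure, or timeout.
    return { status: null, durationMs: Date.now() - started };
  }
}

// Example with a hypothetical health endpoint:
// probe("https://example.com/health").then((r) => console.log(evaluateProbe(r)));
```

A check like this only counts as external monitoring if it runs outside your own infrastructure, which is the main reason to use a hosted service for it.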

Alerting: Less is more

The biggest problem with monitoring is not too little alerting, but too much. Alert fatigue leads to critical notifications being ignored.

Alerting strategy:

  1. P1 (Immediate, phone/PagerDuty): Service down, database unreachable, error rate > 10%
  2. P2 (Slack, within 1h): High latency, disk > 80%, certificate expiry < 7 days
  3. P3 (Email, next business day): Elevated error rate, slow queries, dependencies degraded

Rule: If an alert has not led to an action in 30 days, delete or silence it.
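
The tiers and the 30-day rule can be written down as code, which makes them easier to review than scattered settings in a UI. The sketch below uses the thresholds from the list above. The `signal` and `rule` shapes and the channel names are assumptions for illustration.

```javascript
const CHANNELS = { P1: "pagerduty", P2: "slack", P3: "email" };

// Map an observed signal to a priority; the first matching tier wins.
function classify(signal) {
  if (signal.serviceDown || signal.databaseUnreachable || signal.errorRate > 0.10) return "P1";
  if (signal.highLatency || signal.diskUsage > 0.80 || signal.certDaysLeft < 7) return "P2";
  if (signal.errorRateElevated || signal.slowQueries || signal.dependencyDegraded) return "P3";
  return null; // nothing worth alerting on
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Alert rules that have not led to an action within `maxAgeDays`.
// Rules that never led to an action are measured from their creation date.
function staleRules(rules, now = Date.now(), maxAgeDays = 30) {
  return rules.filter((rule) => {
    const reference = rule.lastActionAt ?? rule.createdAt;
    return now - reference > maxAgeDays * DAY_MS;
  });
}

const priority = classify({ errorRate: 0.12 });
console.log(priority, "->", CHANNELS[priority]);
```

Running `staleRules` as part of a monthly review produces the list of candidates to delete or silence.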

Implementation order

Not everything at once. Build observability incrementally:

Recommended implementation order

  1. Week 1-2: Structured logs + uptime monitoring. JSON logging, request IDs, set up UptimeRobot or Better Stack.
  2. Week 3-4: Error tracking with Sentry. Frontend + backend integration, source maps, release tracking.
  3. Month 2: Metrics + dashboards. Prometheus/Grafana or a managed solution, the four golden signals.
  4. Month 3-4: Distributed tracing. OpenTelemetry integration, Jaeger or Grafana Tempo.
  5. Month 5+: Alerting optimization + runbooks. Refine alert rules, document incident response.

GDPR and monitoring

Logs almost always contain personal data: IP addresses, user IDs, emails in error messages. This means:

  • Limit log retention (30-90 days is sufficient for most cases)
  • Reduce PII — anonymize IP addresses, do not write emails to logs
  • Sign a data processing agreement (DPA) with every managed provider that processes personal data of EU users
  • Prefer self-hosted when maximum data control is required
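
Reducing PII is easiest at the point where log entries are created. The following sketch scrubs an entry before it is written: it truncates IPv4 addresses and masks email addresses in string fields. The function names are hypothetical and the email pattern is deliberately simple. Whether a truncated IP counts as anonymized in your case is a question for your data protection officer.

```javascript
// Zero the last octet of an IPv4 address, e.g. 203.0.113.42 -> 203.0.113.0.
function anonymizeIpv4(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return "redacted"; // not IPv4: drop rather than guess
  return [...parts.slice(0, 3), "0"].join(".");
}

// Simple pattern; it will not catch every valid email address.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Returns a copy of the entry with IPs truncated and emails masked.
function scrubLogEntry(entry) {
  const clean = {};
  for (const [key, value] of Object.entries(entry)) {
    if (key === "ip" && typeof value === "string") {
      clean[key] = anonymizeIpv4(value);
    } else if (typeof value === "string") {
      clean[key] = value.replace(EMAIL, "[email]");
    } else {
      clean[key] = value;
    }
  }
  return clean;
}

console.log(scrubLogEntry({
  ip: "203.0.113.42",
  userId: "usr_123",
  msg: "Password reset failed for jane.doe@example.com",
}));
```

The user ID is left untouched here. It is still personal data in pseudonymized form, so the retention limit applies to it as well.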

Conclusion: Start pragmatically

For SaaS startups in the DACH region, I recommend this stack:

  1. Sentry (Free/Team) for error tracking
  2. Better Stack or Checkly for uptime monitoring
  3. Grafana + Prometheus on Hetzner for metrics (self-hosted)
  4. Grafana Loki for logs (self-hosted or Grafana Cloud)

If you do not want to handle the initial setup yourself, monitoring can be built as part of a subscription development model — step by step, without disrupting ongoing operations.

Total cost: under €100/month for a solid setup. That is a fraction of what one hour of unplanned downtime costs.

