Infrastructure

Monitoring and Observability for SaaS

Logs, Metrics, Traces — how to properly monitor your SaaS application. A practical comparison of self-hosted vs managed solutions for the DACH region.

Christoph Dietrich | 2026-04-29 | 12 min read

Why monitoring is not optional

Every SaaS application running in production needs observability. It is not a question of "if" but "how". Without monitoring, you are flying blind: you learn about outages from customer emails instead of alerts. You debug problems with console.log instead of structured traces.

In the DACH region, there is an additional dimension: if you capture personal data in logs or traces, GDPR requirements apply. This significantly influences tool selection.

5.6x faster incident resolution with structured observability vs. log searching

99.9% uptime expectation: at most about 8.76 hours of downtime per year

€340k cost per hour of downtime (average for mid-size SaaS companies)
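To make the 99.9% figure concrete: an availability target translates directly into a budget of allowed downtime. A minimal sketch, assuming a 365-day year and a 30-day window (both simplifications); the function name `downtimeBudget` is hypothetical:

```typescript
// Hours in a non-leap year (365 * 24 = 8760); leap years are ignored for simplicity.
const HOURS_PER_YEAR = 365 * 24;
const MINUTES_PER_30_DAYS = 30 * 24 * 60;

interface DowntimeBudget {
  hoursPerYear: number;
  minutesPer30Days: number;
}

// Converts an availability target (e.g. 0.999 for 99.9%) into allowed downtime.
function downtimeBudget(availability: number): DowntimeBudget {
  if (availability <= 0 || availability > 1) {
    throw new RangeError("availability must be in (0, 1]");
  }
  const unavailable = 1 - availability;
  return {
    hoursPerYear: unavailable * HOURS_PER_YEAR,
    minutesPer30Days: unavailable * MINUTES_PER_30_DAYS,
  };
}

for (const target of [0.99, 0.999, 0.9999]) {
  const b = downtimeBudget(target);
  console.log(
    `${(target * 100).toFixed(2)}%: ${b.hoursPerYear.toFixed(2)} h/year, ` +
      `${b.minutesPer30Days.toFixed(1)} min per 30 days`,
  );
}
```

At 99.9%, the budget works out to roughly 8.76 hours per year, or about 43 minutes in any 30-day window.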

The three pillars of observability

1. Logs — What happened?

Logs are the foundation. Every application produces them. The question is how you collect, structure, and make them searchable.

Structured logs are mandatory. JSON instead of plaintext. Request IDs in every log entry. Use severity levels consistently.

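```typescript
import pino from "pino"
const logger = pino() // any JSON logger works; pino is just one example

// Bad: plaintext without context, hard to search or correlate
console.log("User logged in")

// Good: structured JSON with user, action and request ID
logger.info({ userId: "usr_123", action: "login", ip: "redacted", requestId: "req_abc" }, "User logged in")
```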

Tools: ELK Stack (Elasticsearch, Logstash, Kibana), Grafana Loki, Datadog Logs
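The principles above (JSON output, a request ID on every entry, consistent levels) fit in a few lines. A dependency-free sketch assuming Node.js: it uses `AsyncLocalStorage` from `node:async_hooks` so that every log call made while handling a request automatically carries that request's ID. The names `logger`, `withRequestId` and `setSink` are illustrative, not from any particular library; in production an established logger such as pino or winston is the more likely choice.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

type Level = "debug" | "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Carries the current request ID through the async call chain of one request.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

// Where finished log lines go: stdout by default, swappable for tests or log shipping.
let sink: (line: string) => void = (line) => {
  process.stdout.write(line + "\n");
};

function setSink(fn: (line: string) => void): void {
  sink = fn;
}

// Emits exactly one JSON object per line.
function log(level: Level, fields: Fields, msg: string): void {
  const ctx = requestContext.getStore();
  sink(
    JSON.stringify({
      time: new Date().toISOString(),
      level,
      msg,
      requestId: ctx?.requestId, // dropped by JSON.stringify when no request is bound
      ...fields,
    }),
  );
}

const logger = {
  debug: (fields: Fields, msg: string) => log("debug", fields, msg),
  info: (fields: Fields, msg: string) => log("info", fields, msg),
  warn: (fields: Fields, msg: string) => log("warn", fields, msg),
  error: (fields: Fields, msg: string) => log("error", fields, msg),
};

// Runs fn with a request ID bound; pass an incoming ID (e.g. from a header) to reuse it.
function withRequestId<T>(fn: () => T, requestId: string = `req_${randomUUID()}`): T {
  return requestContext.run({ requestId }, fn);
}

// Usage: wrap the handling of each request.
withRequestId(() => {
  logger.info({ userId: "usr_123", action: "login" }, "User logged in");
}, "req_abc");
```

Per-call fields are spread last, so they can add context but could also overwrite the standard keys; a stricter version would reject reserved names.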

2. Metrics — How is the system performing?

Metrics are numeric time series: CPU usage, response times, error rates, queue lengths. They show trends and enable alerting.

The four golden signals (per Google SRE):

Latency — How long do requests take?
Traffic — How many requests are coming in?
Errors — How many requests fail?
Saturation — How loaded is the system?

Tools: Prometheus + Grafana, Datadog Metrics, CloudWatch
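As an illustration of what the four golden signals measure, here is a sketch that derives them from a window of request samples. It assumes that only HTTP 5xx responses count as errors and approximates saturation as in-flight requests divided by a configured concurrency limit; both are simplifying choices, and the type and function names are hypothetical. In a Prometheus setup you would normally export counters and histograms and let PromQL (for example `rate()` and `histogram_quantile()`) do this computation instead.

```typescript
interface RequestSample {
  durationMs: number;
  status: number; // HTTP status code
}

interface GoldenSignals {
  p95LatencyMs: number;      // Latency
  requestsPerSecond: number; // Traffic
  errorRate: number;         // Errors, as a fraction 0..1
  saturation: number;        // Saturation, as a fraction of capacity
}

// Nearest-rank percentile on an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

function goldenSignals(
  samples: RequestSample[],
  windowSeconds: number,
  inFlight: number,
  maxConcurrency: number,
): GoldenSignals {
  const durations = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const errors = samples.filter((s) => s.status >= 500).length;
  return {
    p95LatencyMs: percentile(durations, 95),
    requestsPerSecond: samples.length / windowSeconds,
    errorRate: samples.length === 0 ? 0 : errors / samples.length,
    saturation: inFlight / maxConcurrency,
  };
}
```

Reporting p95 rather than the average is deliberate: averages hide the slow tail that users actually notice.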

3. Traces — Where is the bottleneck?

Distributed tracing follows a request through all services. Essential for microservices, but also valuable for monoliths with external APIs.

Tools: Jaeger, Grafana Tempo, Datadog APM, OpenTelemetry (standard)
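Under the hood, following a request across services depends on passing a trace context along with every outgoing call. OpenTelemetry uses the W3C Trace Context `traceparent` header by default, formatted as `version-traceid-parentid-flags`. The sketch below handles only version `00` and is meant to show the mechanism; in a real service the OpenTelemetry SDK and its HTTP instrumentation do this for you. Function names are hypothetical.

```typescript
import { randomBytes } from "node:crypto";

interface TraceContext {
  traceId: string;  // 32 lowercase hex chars, shared by every span in the trace
  parentId: string; // 16 lowercase hex chars, the caller's span ID
  flags: string;    // 2 hex chars; "01" means sampled
}

// Version 00 layout: 00-<trace-id>-<parent-id>-<trace-flags>
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string | undefined): TraceContext | null {
  if (!header) return null;
  const match = TRACEPARENT_V00.exec(header);
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero trace or parent IDs are invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, flags };
}

// Header for an outgoing call: same trace ID, fresh span ID.
// Starts a new trace when there is no valid incoming context.
function nextTraceparent(incoming: string | undefined): string {
  const ctx = parseTraceparent(incoming);
  const traceId = ctx?.traceId ?? randomBytes(16).toString("hex");
  const spanId = randomBytes(8).toString("hex");
  const flags = ctx?.flags ?? "01";
  return `00-${traceId}-${spanId}-${flags}`;
}
```

Because every hop keeps the trace ID and only swaps the span ID, a backend like Jaeger or Tempo can reassemble the full request path.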

Self-hosted vs managed: The comparison

Self-Hosted (Grafana + Prometheus + Loki)

Advantages

Full control over data: GDPR compliance is easier when data stays on your own servers
No ongoing license costs, only infrastructure
No vendor dependency, open-source stack
Customizable to specific requirements

Disadvantages

Setup and maintenance require DevOps expertise
Scaling the monitoring infrastructure is your responsibility
No SLA — if Grafana goes down, you fix it yourself
Initial time investment: 2-4 weeks for production-ready setup

Managed (Datadog / New Relic / Grafana Cloud)

Advantages

Ready to use immediately, no infrastructure management
Integrated dashboards, alerting, APM from one provider
Automatic scaling of the monitoring platform
Enterprise support and SLAs available

Disadvantages

High costs with growing data volume
Data stored with the provider — GDPR review required
Vendor lock-in with proprietary agents and queries
Datadog bill often becomes the second-largest cloud expense

Cost comparison

Costs vary significantly depending on data volume. Here is a realistic scenario for a SaaS with 5,000-10,000 active users:

Monthly monitoring costs (5k-10k users):

Self-hosted (Hetzner CX41 + storage): ~€65/mo
Grafana Cloud (Pro): ~€180/mo
Datadog (Pro, 10 hosts): ~€450/mo
New Relic (Pro): ~€350/mo

Important: With Datadog and New Relic, costs grow quickly with infrastructure size and ingested log volume. Under per-host pricing, a team that starts with 10 hosts pays roughly five times as much at 50 hosts, before any growth in log volume.
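The scaling effect is simple arithmetic once per-host and per-volume charges are separated. A rough planning model in which both prices are placeholders, not vendor quotes: €45 per host simply treats the whole ~€450/mo Datadog figure above as host-based and divides it by 10 hosts, and the per-GB log price is hypothetical. Real invoices contain further line items (APM hosts, custom metrics, retention tiers).

```typescript
interface PricingModel {
  perHostPerMonth: number;  // EUR per monitored host (placeholder)
  perLogGbPerMonth: number; // EUR per GB of ingested logs (placeholder)
}

function monthlyCost(model: PricingModel, hosts: number, logGbPerMonth: number): number {
  return model.perHostPerMonth * hosts + model.perLogGbPerMonth * logGbPerMonth;
}

// 45 = 450 / 10 hosts from the table above; 0.10 per GB is purely hypothetical.
const model: PricingModel = { perHostPerMonth: 45, perLogGbPerMonth: 0.1 };

for (const [hosts, logGb] of [[10, 100], [50, 1000]]) {
  console.log(`${hosts} hosts, ${logGb} GB logs: ~€${monthlyCost(model, hosts, logGb).toFixed(0)}/mo`);
}
```

Plugging in your own contract prices and expected growth gives an early warning before the bill does.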

Error tracking: Sentry

Sentry deserves its own mention. It is not a complete observability tool, but the best error tracking on the market.

What Sentry does:

Automatic grouping of similar errors
Source maps for frontend errors
Performance monitoring (transactions)
Release tracking (which deployment introduced the bug?)
Cron monitoring (job supervision)
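To illustrate what automatic grouping means: errors that differ only in volatile details (IDs, numbers) but come from the same code path should collapse into one issue. The sketch below builds a simple fingerprint from the error type, a normalized message and the top stack frames. It is a simplified illustration of the idea, not Sentry's actual grouping algorithm, which is more elaborate and relies primarily on stack traces; all names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Replace volatile parts so "User 42 not found" and "User 97 not found" normalize identically.
function normalizeMessage(message: string): string {
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/\b\d+\b/g, "<n>")
    .replace(/"[^"]*"/g, '"<str>"');
}

// First few "at ..." frames of a V8 stack trace, with line:column stripped.
function topFrames(stack: string | undefined, count = 3): string[] {
  if (!stack) return [];
  return stack
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at "))
    .slice(0, count)
    .map((line) => line.replace(/:\d+:\d+\)?$/, ""));
}

// Short, stable group key for an error.
function fingerprint(error: Error): string {
  const parts = [error.name, normalizeMessage(error.message), ...topFrames(error.stack)];
  return createHash("sha256").update(parts.join("\n")).digest("hex").slice(0, 16);
}
```

Stripping line and column numbers keeps groups stable across small code edits, at the cost of occasionally merging two distinct errors thrown in the same function.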

Cost: Free up to 5k events/month; the Team plan starts at $26/month. For most early-stage startups, the free tier is sufficient.

GDPR: Sentry offers EU data residency (Frankfurt) on the Business plan. Alternatively, run self-hosted Sentry, but plan for a dedicated machine: the official self-hosted documentation lists 16 GB RAM as the minimum.

Uptime monitoring

External uptime checks are the simplest insurance. They verify from outside whether your application is reachable.

Recommendations:

Better Stack (formerly Better Uptime) — Status pages + alerting, EU checks available. From $24/mo.
Checkly — Synthetic monitoring + API checks with Playwright. Built by a Berlin-based team. From $30/mo.
UptimeRobot — Simple and affordable. Free tier with 50 monitors.
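The core of such a check is small. The sketch below, assuming Node.js 18+ (global `fetch` and `AbortSignal.timeout`), probes a URL once and raises an alert only after several consecutive failures, a common way to avoid paging on a single network blip. It does not replace the services above: a check running inside your own infrastructure cannot tell you whether customers can reach you, while hosted checkers probe from several external locations. The threshold of 3 and all names are illustrative assumptions.

```typescript
interface CheckResult {
  up: boolean;
  status?: number;
  latencyMs: number;
  error?: string;
}

// One probe: "up" means a 2xx response (after redirects) within the timeout.
async function checkOnce(url: string, timeoutMs = 10_000): Promise<CheckResult> {
  const started = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    await res.body?.cancel(); // only the status matters, discard the body
    return { up: res.ok, status: res.status, latencyMs: Date.now() - started };
  } catch (err) {
    return {
      up: false,
      latencyMs: Date.now() - started,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}

type Transition = "alert" | "recovered" | null;

// Fires once after `threshold` consecutive failures and clears on the first success.
class FailureTracker {
  private consecutiveFailures = 0;
  private alerting = false;
  private readonly threshold: number;

  constructor(threshold = 3) {
    this.threshold = threshold;
  }

  record(up: boolean): Transition {
    if (up) {
      this.consecutiveFailures = 0;
      if (this.alerting) {
        this.alerting = false;
        return "recovered";
      }
      return null;
    }
    this.consecutiveFailures += 1;
    if (!this.alerting && this.consecutiveFailures >= this.threshold) {
      this.alerting = true;
      return "alert";
    }
    return null;
  }
}
```

A scheduler would call `checkOnce` every minute or so, feed `result.up` into `tracker.record`, and notify only when it returns `"alert"` or `"recovered"`.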

Alerting: Less is more

The biggest problem with monitoring is not too little alerting, but too much. Alert fatigue leads to critical notifications being ignored.

Alerting strategy:

P1 (Immediate, phone/PagerDuty): Service down, database unreachable, error rate > 10%
P2 (Slack, within 1h): High latency, disk > 80%, certificate expiry < 7 days
P3 (Email, next business day): Elevated error rate, slow queries, dependencies degraded
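The tiering can be encoded directly so that every condition has exactly one destination. A sketch, assuming a periodically collected health snapshot: the thresholds for error rate, disk usage and certificate expiry come from the list above, while "high latency" (p95 above 1,000 ms) and "elevated error rate" (above 2%) are hypothetical placeholders, since the list leaves them open. With Prometheus, the same split is typically done with alerting rules plus Alertmanager routes keyed on a severity label.

```typescript
type Severity = "P1" | "P2" | "P3";

interface HealthSnapshot {
  serviceUp: boolean;
  databaseReachable: boolean;
  errorRate: number;     // fraction of failed requests, 0..1
  p95LatencyMs: number;
  diskUsedRatio: number; // 0..1
  certDaysLeft: number;
  slowQueries: number;
  degradedDependencies: string[];
}

interface Alert {
  severity: Severity;
  channel: string;
  reason: string;
}

// Channel per tier, as in the list above.
const CHANNEL: Record<Severity, string> = {
  P1: "phone/PagerDuty (immediate)",
  P2: "Slack (within 1h)",
  P3: "email (next business day)",
};

// Hypothetical thresholds; tune them to your own baseline.
const HIGH_LATENCY_MS = 1000;
const ELEVATED_ERROR_RATE = 0.02;

function evaluate(s: HealthSnapshot): Alert[] {
  const alerts: Alert[] = [];
  const raise = (severity: Severity, reason: string): void => {
    alerts.push({ severity, channel: CHANNEL[severity], reason });
  };

  if (!s.serviceUp) raise("P1", "service down");
  if (!s.databaseReachable) raise("P1", "database unreachable");
  if (s.errorRate > 0.1) raise("P1", "error rate > 10%");
  else if (s.errorRate > ELEVATED_ERROR_RATE) raise("P3", "elevated error rate");

  if (s.p95LatencyMs > HIGH_LATENCY_MS) raise("P2", "high latency");
  if (s.diskUsedRatio > 0.8) raise("P2", "disk > 80%");
  if (s.certDaysLeft < 7) raise("P2", "certificate expires in < 7 days");

  if (s.slowQueries > 0) raise("P3", "slow queries");
  for (const dep of s.degradedDependencies) raise("P3", `dependency degraded: ${dep}`);

  return alerts;
}
```

The `else if` ensures a high error rate pages once as P1 instead of also producing a redundant P3 email.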

Rule: If an alert has not led to an action in 30 days, delete or silence it.
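The 30-day rule is easy to automate if you record, for each alert that fires, whether anyone acted on it. A sketch under that assumption, reading the rule as "fired within the window but never led to an action"; rules that did not fire at all cause no fatigue and are left alone. The `AlertEvent` shape and its `actionTaken` flag are hypothetical; the information might come from your incident tool or from acknowledgements in PagerDuty.

```typescript
interface AlertEvent {
  rule: string;
  firedAt: Date;
  actionTaken: boolean; // did someone fix, mitigate or escalate something?
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Rules that fired in the window without a single action: candidates to delete or silence.
function noisyRules(events: AlertEvent[], now: Date, windowDays = 30): string[] {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const fired = new Set<string>();
  const actioned = new Set<string>();
  for (const e of events) {
    if (e.firedAt.getTime() < cutoff) continue; // outside the window
    fired.add(e.rule);
    if (e.actionTaken) actioned.add(e.rule);
  }
  return Array.from(fired).filter((rule) => !actioned.has(rule)).sort();
}
```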

Implementation order

Not everything at once. Build observability incrementally:

Recommended implementation order:

Week 1-2: Structured logs + uptime monitoring. JSON logging, request IDs, set up UptimeRobot or Better Stack.
Week 3-4: Error tracking with Sentry. Frontend + backend integration, source maps, release tracking.
Month 2: Metrics + dashboards. Prometheus/Grafana or a managed solution, covering the four golden signals.
Month 3-4: Distributed tracing. OpenTelemetry integration, Jaeger or Grafana Tempo.
Month 5+: Alerting optimization + runbooks. Refine alert rules, document incident response.

GDPR and monitoring

Logs almost always contain personal data: IP addresses, user IDs, emails in error messages. This means:

Limit log retention (30-90 days is sufficient for most cases)
Reduce PII — anonymize IP addresses, do not write emails to logs
Sign a DPA when using managed tools with EU data
Prefer self-hosted when maximum data control is required
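Reducing PII can be enforced in code before anything is written to a log. A sketch of two helpers: `anonymizeIp` zeroes the host part of an address (the last IPv4 octet, or the last 80 bits of an IPv6 address, a common truncation approach), and `redactEmails` masks email addresses in free-text messages. The regex is deliberately simple and will not catch every valid address, and whether truncation alone is sufficient is a question for your data protection assessment. Function names are illustrative.

```typescript
import { isIPv4, isIPv6 } from "node:net";

// Zeroes the host part: last octet for IPv4, last 80 bits (keep first 3 groups) for IPv6.
function anonymizeIp(ip: string): string {
  if (isIPv4(ip)) {
    const octets = ip.split(".");
    octets[3] = "0";
    return octets.join(".");
  }
  if (isIPv6(ip)) {
    const addr = ip.split("%")[0]; // drop a zone index such as %eth0
    if (addr.includes(".")) return "redacted"; // embedded IPv4 forms not handled here
    // Expand "::" so that all 8 groups are present.
    const hasGap = addr.includes("::");
    const [head, tail] = hasGap ? addr.split("::") : [addr, ""];
    const headGroups = head ? head.split(":") : [];
    const tailGroups = tail ? tail.split(":") : [];
    const fill = new Array<string>(8 - headGroups.length - tailGroups.length).fill("0");
    const groups = [...headGroups, ...fill, ...tailGroups];
    const kept = groups.slice(0, 3).map((g) => parseInt(g, 16).toString(16));
    return `${kept.join(":")}::`;
  }
  return "redacted"; // not an IP address: never log it verbatim
}

// Simple masking pattern; intentionally not a full RFC 5322 validator.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, "[email]");
}
```

Applying both at the logging boundary (for example passing `anonymizeIp(clientIp)` instead of the raw address) means no individual call site has to remember the rules.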

Conclusion: Start pragmatically

For SaaS startups in the DACH region, I recommend this stack:

Sentry (Free/Team) for error tracking
Better Stack or Checkly for uptime monitoring
Grafana + Prometheus on Hetzner for metrics (self-hosted)
Grafana Loki for logs (self-hosted or Grafana Cloud)

If you do not want to handle the initial setup yourself, monitoring can be built as part of a subscription development model — step by step, without disrupting ongoing operations.

Total cost: under €100/month for a solid setup. That is a fraction of what one hour of unplanned downtime costs.