Monitoring and Observability for SaaS
Logs, Metrics, Traces — how to properly monitor your SaaS application. A practical comparison of self-hosted vs managed solutions for the DACH region.
Why monitoring is not optional
Every SaaS application running in production needs observability. It is not a question of "if" but "how". Without monitoring, you are flying blind: you learn about outages from customer emails instead of alerts. You debug problems with console.log instead of structured traces.
In the DACH region, there is an additional dimension: if you capture personal data in logs or traces, GDPR requirements apply. This significantly influences tool selection.
- 5.6x faster incident resolution with structured observability vs. log searching
- 99.9% uptime expectation: at most 8.76 h of downtime per year
- €340k cost per hour of downtime, on average for mid-size SaaS companies
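The downtime budget behind an uptime target is simple arithmetic. Here is a minimal sketch, assuming a 365-day year; the function name `downtimeBudgetHours` is made up for this example.

```javascript
// Hours in a non-leap year (assumption: leap years are ignored).
const HOURS_PER_YEAR = 365 * 24;

// Maximum downtime per year that still meets the availability target.
// availability is a fraction, e.g. 0.999 for 99.9%.
function downtimeBudgetHours(availability) {
  return (1 - availability) * HOURS_PER_YEAR;
}

console.log(downtimeBudgetHours(0.999).toFixed(2)); // "8.76"
```

Each additional nine cuts the budget by a factor of ten, so 99.99% leaves less than one hour per year.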
The three pillars of observability
1. Logs — What happened?
Logs are the foundation. Every application produces them. The question is how you collect, structure, and make them searchable.
Structured logs are mandatory. JSON instead of plaintext. Request IDs in every log entry. Use severity levels consistently.
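// Bad: free text, no context, hard to search or aggregate
console.log("User logged in")
// Good: structured fields plus a request ID (assumes `logger` is any JSON logger instance)
logger.info({ userId: "usr_123", action: "login", ip: "redacted", rid: "req_abc" })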
Tools: ELK Stack (Elasticsearch, Logstash, Kibana), Grafana Loki, Datadog Logs
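To make the rules above concrete, here is a minimal sketch of a structured logger with no dependencies. It is not the API of any particular logging library; `createLogger`, `child`, and the field names `ts`, `level`, and `service` are assumptions chosen for illustration. In production you would more likely reach for an established JSON logger.

```javascript
// Severity levels as numbers so they can be compared.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

// Creates a logger that writes one JSON object per line.
// `base` holds fields added to every entry; `write` is injectable for testing.
function createLogger({ minLevel = "info", base = {}, write = (line) => console.log(line) } = {}) {
  const emit = (level, fields) => {
    if (LEVELS[level] < LEVELS[minLevel]) return null; // below threshold: drop
    const entry = { ts: new Date().toISOString(), level, ...base, ...fields };
    const line = JSON.stringify(entry);
    write(line);
    return line;
  };
  const logger = {
    // child() binds extra fields (e.g. the request ID) to all later entries.
    child: (extra) => createLogger({ minLevel, base: { ...base, ...extra }, write }),
  };
  for (const level of Object.keys(LEVELS)) {
    logger[level] = (fields) => emit(level, fields);
  }
  return logger;
}

// Usage: bind the request ID once per request, then log as usual.
const requestLogger = createLogger({ base: { service: "api" } }).child({ rid: "req_abc" });
requestLogger.info({ userId: "usr_123", action: "login", ip: "redacted" });
```

Binding the request ID through `child` means nobody has to remember to pass it on every call, which is usually where request IDs get lost.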
2. Metrics — How is the system performing?
Metrics are numeric time series: CPU usage, response times, error rates, queue lengths. They show trends and enable alerting.
The four golden signals (per Google SRE):
- Latency — How long do requests take?
- Traffic — How many requests are coming in?
- Errors — How many requests fail?
- Saturation — How loaded is the system?
Tools: Prometheus + Grafana, Datadog Metrics, CloudWatch
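As a rough illustration of the four golden signals, the sketch below derives them from a window of request records. The record shape (`durationMs`, `status`), the choice of p95 for latency, and the `inFlight`/`capacity` inputs for saturation are assumptions for this example; a metrics system such as Prometheus would compute these from counters and histograms instead.

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

// requests: [{ durationMs, status }] observed during the window.
// windowSeconds: length of the window.
// inFlight / capacity: hypothetical saturation inputs, e.g. busy vs. total workers.
function goldenSignals(requests, windowSeconds, inFlight, capacity) {
  const durations = requests.map((r) => r.durationMs).sort((a, b) => a - b);
  const failed = requests.filter((r) => r.status >= 500).length;
  return {
    latencyP95Ms: percentile(durations, 95),                          // Latency
    trafficRps: requests.length / windowSeconds,                      // Traffic
    errorRate: requests.length === 0 ? 0 : failed / requests.length,  // Errors
    saturation: capacity === 0 ? 0 : inFlight / capacity,             // Saturation
  };
}

const sample = [
  { durationMs: 40, status: 200 },
  { durationMs: 55, status: 200 },
  { durationMs: 900, status: 503 },
  { durationMs: 60, status: 200 },
];
console.log(goldenSignals(sample, 60, 8, 10));
```

A percentile is used rather than the average because a single slow request (900 ms here) disappears in a mean but is exactly what users notice.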
3. Traces — Where is the bottleneck?
Distributed tracing follows a request through all services. Essential for microservices, but also valuable for monoliths with external APIs.
Tools: Jaeger, Grafana Tempo, Datadog APM, OpenTelemetry (standard)
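The core idea of following a request through all services is context propagation: every hop keeps the trace ID and records the caller's span ID as its parent. The sketch below hand-rolls this using the W3C `traceparent` header format (`00-<trace id>-<span id>-<flags>`). It is an illustration only; `startSpan`, `parseTraceparent`, and `randomHex` are made-up helpers, IDs come from `Math.random` for brevity, and a real setup would use the OpenTelemetry SDK instead.

```javascript
// Hex string of the given byte length. Math.random is an assumption made
// for brevity; use a cryptographic source or an SDK in production.
function randomHex(bytes) {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// Version 00 traceparent: 16-byte trace ID, 8-byte span ID, 1-byte flags.
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT.exec(header || "");
  if (!match) return null;
  return { traceId: match[1], parentSpanId: match[2], flags: match[3] };
}

// Starts a span. With a valid incoming header the span joins that trace,
// otherwise it becomes the root of a new trace.
function startSpan(name, incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  const span = {
    name,
    traceId: parent ? parent.traceId : randomHex(16),
    spanId: randomHex(8),
    parentSpanId: parent ? parent.parentSpanId : null,
    flags: parent ? parent.flags : "01", // 01 = sampled
  };
  // Header to send on outgoing calls so the next service can continue the trace.
  span.traceparent = `00-${span.traceId}-${span.spanId}-${span.flags}`;
  return span;
}

const gateway = startSpan("GET /checkout");
const billing = startSpan("charge card", gateway.traceparent);
console.log(billing.traceId === gateway.traceId); // both spans belong to one trace
```

The same mechanism works for a monolith calling external APIs: the outgoing call gets its own span, so slow third-party responses show up as a distinct segment of the trace.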
Self-hosted vs managed: The comparison
Self-Hosted (Grafana + Prometheus + Loki)
Advantages
- Full control over data — GDPR-compliant on your own servers
- No ongoing license costs, only infrastructure
- No vendor dependency, open-source stack
- Customizable to specific requirements
Disadvantages
- Setup and maintenance require DevOps expertise
- Scaling the monitoring infrastructure is your responsibility
- No SLA — if Grafana goes down, you fix it yourself
- Initial time investment: 2-4 weeks for production-ready setup
Managed (Datadog / New Relic / Grafana Cloud)
Advantages
- Ready to use immediately, no infrastructure management
- Integrated dashboards, alerting, APM from one provider
- Automatic scaling of the monitoring platform
- Enterprise support and SLAs available
Disadvantages
- High costs with growing data volume
- Data stored with the provider — GDPR review required
- Vendor lock-in with proprietary agents and queries
- Datadog bill often becomes the second-largest cloud expense
Cost comparison
Costs vary significantly depending on data volume. Here is a realistic scenario for a SaaS with 5,000-10,000 active users:
[Chart: Monthly monitoring costs (5k-10k users)]
Important: With Datadog and New Relic, costs explode with the number of hosts and log volume. A team starting with 10 hosts can easily pay five times as much at 50 hosts.
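The scaling effect is easy to reproduce with a simple per-host model. The sketch below is purely illustrative: the unit prices and the log volume per host are placeholder values, not list prices of any vendor, and `monthlyCost` is a made-up helper.

```javascript
// Linear pricing model: a fee per host plus a fee per ingested GB of logs.
function monthlyCost({ hosts, logGbPerHost, pricePerHost, pricePerLogGb }) {
  return hosts * pricePerHost + hosts * logGbPerHost * pricePerLogGb;
}

// Placeholder numbers (assumptions, not real vendor prices).
const assumedPricing = { pricePerHost: 20, pricePerLogGb: 0.5, logGbPerHost: 30 };

const costAt10 = monthlyCost({ hosts: 10, ...assumedPricing });
const costAt50 = monthlyCost({ hosts: 50, ...assumedPricing });
console.log(costAt10, costAt50, costAt50 / costAt10);
```

Because both terms grow with the host count, the bill grows at least linearly. If log volume per host also rises as the product grows, the increase is steeper than the factor of five.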
Error tracking: Sentry
Sentry deserves its own mention. It is not a complete observability tool, but the best error tracking on the market.
What Sentry does:
- Automatic grouping of similar errors
- Source maps for frontend errors
- Performance monitoring (transactions)
- Release tracking (which deployment introduced the bug?)
- Cron monitoring (job supervision)
Cost: Free up to 5k events/month; Team plan from $26/month. The free tier is sufficient for most startups.
GDPR: Sentry offers EU data residency (Frankfurt) on the Business plan. Alternatively: self-hosted Sentry (requires ~4 GB RAM).
Uptime monitoring
External uptime checks are the simplest insurance. They verify from outside whether your application is reachable.
Recommendations:
- Better Stack (formerly Better Uptime) — Status pages + alerting, EU checks available. From $24/mo.
- Checkly — Synthetic monitoring + API checks with Playwright. Built by a Berlin-based team. From $30/mo.
- UptimeRobot — Simple and affordable. Free tier with 50 monitors.
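What these services do can be sketched in a few lines: request a URL from outside, measure the response time, and classify the result. The sketch below assumes Node.js 18 or later (global `fetch`). The function names and the thresholds (5 s timeout, 2 s latency limit, 4xx counted as degraded) are assumptions for illustration. It does not replace a hosted service, which checks from several regions and handles alerting.

```javascript
// Classifies one probe result. Thresholds are illustrative assumptions.
function classifyProbe(result, maxLatencyMs = 2000) {
  if (result.error) return "down";             // timeout, DNS or connection failure
  if (result.status >= 500) return "down";     // server error
  if (result.status >= 400) return "degraded"; // reachable, but the check URL misbehaves
  if (result.latencyMs > maxLatencyMs) return "degraded";
  return "up";
}

// Performs one HTTP check. fetchImpl is injectable so the function can be
// tested without network access.
async function probe(url, { timeoutMs = 5000, fetchImpl = globalThis.fetch } = {}) {
  const started = Date.now();
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return { status: res.status, latencyMs: Date.now() - started };
  } catch (err) {
    return { error: String(err), latencyMs: Date.now() - started };
  } finally {
    clearTimeout(timer);
  }
}
```

A typical call would be `probe("https://example.com/health").then((r) => console.log(classifyProbe(r)))`, where the URL stands in for your own health endpoint.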
Alerting: Less is more
The biggest problem with monitoring is not too little alerting, but too much. Alert fatigue leads to critical notifications being ignored.
Alerting strategy:
- P1 (Immediate, phone/PagerDuty): Service down, database unreachable, error rate > 10%
- P2 (Slack, within 1h): High latency, disk > 80%, certificate expiry < 7 days
- P3 (Email, next business day): Elevated error rate, slow queries, dependencies degraded
Rule: If an alert has not led to an action in 30 days, delete or silence it.
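The priority levels and the 30-day rule can be expressed as two small functions. This is a sketch under assumptions: the signal field names, the 1,000 ms latency threshold, and the 1% threshold for an "elevated" error rate are not from the text above and should be tuned to your system.

```javascript
// Maps current signals to a priority. Missing fields are treated as healthy,
// because comparisons against undefined evaluate to false.
function classifyAlert(s) {
  if (s.serviceDown || s.databaseUnreachable || s.errorRate > 0.10) return "P1";
  if (s.p95LatencyMs > 1000 /* assumed */ || s.diskUsage > 0.80 || s.certDaysLeft < 7) return "P2";
  if (s.errorRate > 0.01 /* assumed */ || s.slowQueries > 0 || s.dependenciesDegraded) return "P3";
  return null; // nothing to alert on
}

// Notification channel per priority, following the strategy above.
const CHANNEL = { P1: "pagerduty", P2: "slack", P3: "email" };

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the names of alert rules that have not led to an action within
// maxAgeDays: candidates for deletion or silencing.
// rules: [{ name, lastActionAt }] with lastActionAt in epoch milliseconds.
function staleAlerts(rules, now = Date.now(), maxAgeDays = 30) {
  return rules
    .filter((rule) => now - rule.lastActionAt > maxAgeDays * DAY_MS)
    .map((rule) => rule.name);
}

const priority = classifyAlert({ errorRate: 0.02, diskUsage: 0.85 });
console.log(priority, CHANNEL[priority]);
```

Running something like `staleAlerts` as a monthly report turns the rule into a habit instead of a good intention.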
Implementation order
Not everything at once. Build observability incrementally:
- Week 1-2: Structured logs + uptime monitoring. JSON logging, request IDs, set up UptimeRobot or Better Stack.
- Week 3-4: Error tracking with Sentry. Frontend + backend integration, source maps, release tracking.
- Month 2: Metrics + dashboards. Prometheus/Grafana or managed solution, the four golden signals.
- Month 3-4: Distributed tracing. OpenTelemetry integration, Jaeger or Grafana Tempo.
- Month 5+: Alerting optimization + runbooks. Refine alert rules, document incident response.
GDPR and monitoring
Logs almost always contain personal data: IP addresses, user IDs, emails in error messages. This means:
- Limit log retention (30-90 days is sufficient for most cases)
- Reduce PII — anonymize IP addresses, do not write emails to logs
- Sign a data processing agreement (DPA) before sending EU personal data to managed tools
- Prefer self-hosted when maximum data control is required
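Reducing PII and limiting retention can start with a few helpers applied before a log entry is written. The sketch below is a starting point under assumptions: zeroing the last IPv4 octet is one common truncation approach (whether it is sufficient is a question for your data protection officer), IPv6 addresses are simply replaced, the email pattern is deliberately simple, and all function names are made up for this example.

```javascript
// Truncates an IPv4 address by zeroing the last octet.
// Anything that is not a plain IPv4 address (including IPv6) is replaced.
function anonymizeIp(ip) {
  const parts = String(ip).split(".");
  const isIpv4 =
    parts.length === 4 && parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  return isIpv4 ? `${parts[0]}.${parts[1]}.${parts[2]}.0` : "redacted";
}

// Simple email pattern: catches common addresses, not every valid one.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replaces email addresses in free text, e.g. in error messages.
function scrubEmails(text) {
  return String(text).replace(EMAIL_PATTERN, "[email]");
}

// True if a log entry is older than the retention period and should be deleted.
// now is in epoch milliseconds; entryTimestamp is anything Date can parse.
function isPastRetention(entryTimestamp, now, retentionDays = 30) {
  const ageMs = now - new Date(entryTimestamp).getTime();
  return ageMs > retentionDays * 24 * 60 * 60 * 1000;
}

console.log(anonymizeIp("203.0.113.42"));
console.log(scrubEmails("Mail to jane.doe@example.com failed"));
```

Scrubbing at the logging layer means the same protection applies regardless of whether the logs end up in a self-hosted Loki or at a managed provider.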
Conclusion: Start pragmatically
For SaaS startups in the DACH region, I recommend this stack:
- Sentry (Free/Team) for error tracking
- Better Stack or Checkly for uptime monitoring
- Grafana + Prometheus on Hetzner for metrics (self-hosted)
- Grafana Loki for logs (self-hosted or Grafana Cloud)
If you do not want to handle the initial setup yourself, monitoring can be built as part of a subscription development model — step by step, without disrupting ongoing operations.
Total cost: under €100/month for a solid setup. That is a fraction of what one hour of unplanned downtime costs.