Cloud & DevOps as a Subscription
Why DevOps matters for your product
Code that is not deployed is worthless. The fastest development team in the world is bottlenecked by slow deployments, flaky pipelines, and manual infrastructure management. DevOps is not a role — it is a culture that eliminates the gap between writing code and delivering value.
What good DevOps gives you:
- Deploy in minutes, not days: Push to main, code is live. No manual steps, no "deployment Fridays"
- Catch bugs before users do: Automated tests, staging environments, canary deployments
- Scale without panic: Infrastructure scales automatically based on demand
- Recover instantly: Rollback to the previous version in seconds, not hours
- Control costs: Pay for what you use, not what you might need
See how this fits into our development process: Development as a Subscription Guide
CI/CD pipelines with GitHub Actions
Every project we deliver includes a CI/CD pipeline. No exceptions. The pipeline is the quality gate that prevents broken code from reaching production.
Our standard pipeline
Push → Lint → Type Check → Unit Tests → Integration Tests → Build → Deploy to Staging → E2E Tests → Deploy to Production
Each stage must pass before the next one runs. A failing lint check blocks deployment just like a failing test.
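The gating rule can be sketched as a small runner. This is a minimal illustration of the idea, not how GitHub Actions works internally; the `Stage` type and `runPipeline` function are hypothetical names, and the stage results are simulated.

```typescript
// Hypothetical model of a gated pipeline: a stage runs only if every earlier stage passed.
type Stage = { name: string; run: () => boolean };

type PipelineResult = {
  passed: string[];        // stages that ran and succeeded
  failedAt: string | null; // first failing stage, or null if everything passed
  skipped: string[];       // stages never started because an earlier gate failed
};

function runPipeline(stages: Stage[]): PipelineResult {
  const passed: string[] = [];
  for (const stage of stages) {
    if (!stage.run()) {
      // passed.length is the index of the failing stage; everything after it is skipped.
      const skipped = stages.slice(passed.length + 1).map((s) => s.name);
      return { passed, failedAt: stage.name, skipped };
    }
    passed.push(stage.name);
  }
  return { passed, failedAt: null, skipped: [] };
}

// Simulated run: the type check fails, so unit tests and the build never start.
const gatedRun = runPipeline([
  { name: "Lint", run: () => true },
  { name: "Type Check", run: () => false },
  { name: "Unit Tests", run: () => true },
  { name: "Build", run: () => true },
]);
```

Here `gatedRun.failedAt` is `"Type Check"` and `gatedRun.skipped` lists the stages that were blocked.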
Pipeline stages in detail
| Stage | Tool | Duration | What it catches |
|---|---|---|---|
| Lint | ESLint | 10-30s | Code style, unused imports, anti-patterns |
| Type Check | TypeScript | 15-60s | Type errors, missing properties, wrong arguments |
| Unit Tests | Vitest / Jest | 30-120s | Business logic bugs, edge cases |
| Integration Tests | Supertest | 60-180s | API contract violations, database issues |
| Build | Docker | 60-300s | Build errors, missing dependencies |
| E2E Tests | Playwright | 120-600s | User flow regressions, cross-browser issues |
| Deploy | Terraform / Helm | 60-300s | Infrastructure drift, configuration errors |
Advanced pipeline features
- Parallel jobs: Lint, type check, and unit tests run simultaneously
- Dependency caching: npm, Docker layers, and build artifacts are cached between runs
- Matrix builds: Test against multiple Node.js versions and operating systems
- Conditional deployment: Only deploy when tests pass on the target branch
- Slack notifications: Team gets notified on failure, not on success (no alert fatigue)
- Cost reports: Infrastructure cost changes are commented on the PR before merging
Docker: consistent environments everywhere
"It works on my machine" is not a valid deployment strategy. Docker ensures that your application runs identically in development, staging, and production.
Multi-stage Dockerfiles
We build Docker images with multi-stage builds to minimize image size and attack surface:
- Stage 1 (Build): Install dependencies, compile TypeScript, run tests
- Stage 2 (Production): Copy only the compiled output and production dependencies
Result: Images that are 50-80% smaller than naive builds. A typical NestJS application produces a ~150MB image instead of 600MB+.
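As a quick check of the arithmetic, the NestJS figures above work out to a 75% reduction, which sits inside the quoted 50-80% range. The helper name `sizeReductionPercent` is illustrative only.

```typescript
// Percentage by which an optimized image is smaller than the naive one.
function sizeReductionPercent(naiveMb: number, optimizedMb: number): number {
  if (naiveMb <= 0) {
    throw new RangeError("naiveMb must be positive");
  }
  return ((naiveMb - optimizedMb) / naiveMb) * 100;
}

// 600 MB naive image vs. 150 MB multi-stage image.
const nestReduction = sizeReductionPercent(600, 150); // 75
```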
Docker best practices we follow
| Practice | Why |
|---|---|
| Non-root user | Security: container processes should never run as root |
| .dockerignore | Performance: exclude node_modules, .git, test files |
| Layer ordering | Cache: dependencies before source code (cache npm install) |
| Health checks | Reliability: the orchestrator can detect and replace unhealthy containers (in Kubernetes, via liveness and readiness probes) |
| Distroless base images | Security: minimal attack surface, no shell |
| Fixed versions | Reproducibility: node:22.15-slim, not node:latest |
| Build arguments | Flexibility: environment-specific configuration at build time |
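The "Fixed versions" rule is easy to check automatically. The sketch below is one possible lint, with `isPinnedImage` as a hypothetical helper; it only rejects missing or `latest` tags and does not judge how specific a tag is (for example `node:22` would still pass).

```typescript
// Returns true when an image reference names an explicit tag or digest
// instead of relying on the implicit or explicit "latest".
function isPinnedImage(ref: string): boolean {
  // A digest (image@sha256:...) is the strictest form of pinning.
  if (ref.indexOf("@sha256:") !== -1) {
    return true;
  }
  const lastSlash = ref.lastIndexOf("/");
  const lastColon = ref.lastIndexOf(":");
  // No colon after the last path segment means no tag; a colon before it is a registry port.
  if (lastColon <= lastSlash) {
    return false;
  }
  const tag = ref.slice(lastColon + 1);
  return tag !== "" && tag !== "latest";
}

const pinnedOk = isPinnedImage("node:22.15-slim"); // true
const floating = isPinnedImage("node:latest");     // false
```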
Kubernetes: orchestration at scale
For applications that need horizontal scaling, zero-downtime deployments, and self-healing, Kubernetes is the industry standard.
When you need Kubernetes
| Scenario | Kubernetes? |
|---|---|
| Single application, low traffic | No — use a managed platform (Vercel, Railway, Fly.io) |
| 2-5 services, moderate traffic | Maybe — Docker Compose or managed Kubernetes (EKS/GKE) |
| 5+ services, variable traffic | Yes — Kubernetes with auto-scaling |
| Strict compliance requirements | Yes — self-managed Kubernetes for full control |
| Global distribution | Yes — multi-region Kubernetes clusters |
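The table can be read as a simple decision function. The sketch below encodes one interpretation; the `Workload` fields and the tie-breaking at exactly five services are assumptions, not a formal rule.

```typescript
// Hypothetical description of a workload, mirroring the scenarios in the table.
type Workload = {
  serviceCount: number;
  variableTraffic: boolean;
  strictCompliance: boolean;
  globalDistribution: boolean;
};

type Recommendation = "managed-platform" | "compose-or-managed-k8s" | "kubernetes";

function recommendPlatform(w: Workload): Recommendation {
  // Compliance and multi-region needs justify Kubernetes regardless of size.
  if (w.strictCompliance || w.globalDistribution) {
    return "kubernetes";
  }
  // Assumption: five or more services with variable traffic counts as the "5+" row.
  if (w.serviceCount >= 5 && w.variableTraffic) {
    return "kubernetes";
  }
  if (w.serviceCount >= 2) {
    return "compose-or-managed-k8s";
  }
  return "managed-platform";
}

const earlyStage = recommendPlatform({
  serviceCount: 1,
  variableTraffic: false,
  strictCompliance: false,
  globalDistribution: false,
}); // "managed-platform"
```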
Our Kubernetes stack
| Component | Tool | Purpose |
|---|---|---|
| Cluster management | EKS (AWS) / GKE (GCP) | Managed control plane |
| Ingress | Nginx Ingress / Traefik | HTTP routing, TLS termination |
| Service mesh | Istio / Linkerd (when needed) | mTLS, traffic management, observability |
| Secrets | External Secrets Operator | Sync secrets from AWS SSM / GCP Secret Manager |
| Cert management | cert-manager | Automatic TLS certificate provisioning |
| Autoscaling | HPA + Karpenter | Pod and node autoscaling |
| Storage | EBS CSI / Persistent Volumes | Stateful workloads |
Deployment strategies
| Strategy | How it works | Risk | Rollback speed |
|---|---|---|---|
| Rolling update | Replace pods gradually | Low | Minutes |
| Blue/Green | Run two versions, switch traffic | Very low | Seconds |
| Canary | Route 5% of traffic to new version, monitor, then expand | Very low | Seconds |
| Feature flags | Deploy code but control activation | None | Instant |
We default to rolling updates for most services and canary deployments for user-facing applications. Feature flags (LaunchDarkly or custom) complement deployment strategies by decoupling code deployment from feature release.
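To illustrate the canary idea, the sketch below assigns each user to a stable bucket and sends the lowest buckets to the new version. In practice the traffic split is usually configured in the ingress or service mesh rather than in application code; `bucketFor` and `routeToCanary` are hypothetical names, and the hash is a simple non-cryptographic one chosen for brevity.

```typescript
// Map a stable identifier (for example a user or session ID) to a bucket in [0, 100).
function bucketFor(id: string): number {
  let hash = 0;
  for (let i = 0; i < id.length; i++) {
    // 32-bit rolling hash; ">>> 0" keeps the value an unsigned 32-bit integer.
    hash = (hash * 31 + id.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

// A user sees the canary when their bucket falls below the rollout percentage.
function routeToCanary(id: string, canaryPercent: number): boolean {
  return bucketFor(id) < canaryPercent;
}
```

Because the bucket depends only on the ID, a user who is on the canary at 5% stays on it when the rollout expands to 25% or 50%, and setting the percentage to 0 acts as an immediate rollback.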
Infrastructure as Code with Terraform
Manual infrastructure changes are a leading cause of outages. Every piece of infrastructure we manage is defined in code, version-controlled, and applied through CI/CD.
Why Terraform?
| Feature | Terraform | CloudFormation | Pulumi |
|---|---|---|---|
| Multi-cloud | Yes | AWS only | Yes |
| Language | HCL (declarative) | YAML/JSON | TypeScript/Python |
| State management | Remote (S3, GCS) | Managed by AWS | Managed or self-hosted |
| Ecosystem | Largest provider ecosystem | AWS only | Growing |
| Drift detection | Built-in | Built-in | Built-in |
| Learning curve | Moderate | Moderate | Low (if you know TS) |
We use Terraform for multi-cloud and cross-service infrastructure. For pure AWS shops, CloudFormation or CDK can also work.
Infrastructure we manage with Terraform
- Networking: VPCs, subnets, security groups, NAT gateways, load balancers
- Compute: ECS/EKS clusters, auto-scaling groups, Lambda functions
- Databases: RDS (PostgreSQL), ElastiCache (Redis), DynamoDB
- Storage: S3 buckets, CloudFront distributions
- DNS: Route 53 / Cloud DNS records
- Monitoring: CloudWatch alarms, SNS topics, PagerDuty integrations
- Security: IAM roles, KMS keys, WAF rules
Terraform workflow
- Developer creates a PR with infrastructure changes
- `terraform plan` runs automatically and posts the diff as a PR comment
- Team reviews the plan (what will be created, changed, or destroyed)
- After approval and merge, `terraform apply` executes the changes
- State is stored remotely with locking to prevent concurrent modifications
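The review step is easier when the plan is condensed into counts. The sketch below summarizes the `resource_changes` section of the JSON that `terraform show -json <planfile>` produces; only the fields used here are typed, and `summarizePlan` is a hypothetical helper, not part of any Terraform tooling.

```typescript
// Subset of the plan JSON emitted by `terraform show -json <planfile>`.
type ResourceChange = { address: string; change: { actions: string[] } };
type PlanJson = { resource_changes?: ResourceChange[] };

type PlanSummary = { create: number; update: number; destroy: number; replace: number };

function summarizePlan(plan: PlanJson): PlanSummary {
  const summary: PlanSummary = { create: 0, update: 0, destroy: 0, replace: 0 };
  for (const rc of plan.resource_changes ?? []) {
    const has = (action: string): boolean => rc.change.actions.indexOf(action) !== -1;
    if (has("create") && has("delete")) {
      summary.replace++; // a replacement is reported as delete plus create
    } else if (has("create")) {
      summary.create++;
    } else if (has("update")) {
      summary.update++;
    } else if (has("delete")) {
      summary.destroy++;
    }
    // "no-op" and "read" entries do not change infrastructure and are ignored.
  }
  return summary;
}
```

A PR comment could then flag any plan where `destroy` or `replace` is non-zero, since those are the changes most worth a second look.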
AWS vs GCP: choosing your cloud
We work with both AWS and GCP. Here is an honest comparison to help you decide.
| Factor | AWS | GCP |
|---|---|---|
| Market share | ~32% (largest) | ~12% (third) |
| Service breadth | Most services (200+) | Fewer but well-designed |
| Kubernetes | EKS (good) | GKE (best managed K8s) |
| Serverless | Lambda (mature) | Cloud Run (simpler) |
| Database | RDS, Aurora, DynamoDB | Cloud SQL, Spanner, Firestore |
| AI/ML | SageMaker, Bedrock | Vertex AI (stronger) |
| Pricing | Complex, many knobs | Simpler, sustained discounts |
| Enterprise adoption | Dominant | Growing fast |
| Developer experience | Complex console | Cleaner console |
Our recommendation: AWS if you need the broadest service catalog or your enterprise already uses AWS. GCP if you want simpler managed services, better Kubernetes, or stronger AI/ML integration. Both are excellent choices.
Monitoring and observability
Deploying your application is only half the job. Knowing what happens after deployment is the other half.
The three pillars of observability
| Pillar | Tool | What it shows |
|---|---|---|
| Metrics | Prometheus + Grafana | System health: CPU, memory, response times, error rates |
| Logs | Loki / CloudWatch Logs | What happened: request details, errors, audit trails |
| Traces | Jaeger / AWS X-Ray | How long each step took: database queries, external calls, queue processing |
Alerting philosophy
- Alert on symptoms, not causes: Alert when response time exceeds 500ms, not when CPU hits 80%
- Actionable alerts only: Every alert must have a clear response action. If you cannot act on it, it is a metric, not an alert
- Severity levels: Critical (wake someone up) vs Warning (handle during business hours)
- Runbooks: Every alert links to a runbook with diagnosis steps and remediation
Key metrics we monitor
| Metric | Target | Alert threshold |
|---|---|---|
| Response time (p95) | < 200ms | > 500ms |
| Error rate (5xx) | < 0.1% | > 1% |
| Availability | 99.9%+ | < 99.5% in 5min window |
| Deployment frequency | Daily | N/A |
| Mean time to recovery | < 15min | > 30min |
| Container restarts | 0 | > 3 in 10min |
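The alert thresholds in the table can be expressed as a small check. In a real setup these would typically live in the monitoring system's alert rules; the sketch below only illustrates the logic, and the `Snapshot` shape and alert names are assumptions.

```typescript
// Hypothetical snapshot of the measurable metrics from the table.
type Snapshot = {
  p95ResponseMs: number;
  errorRatePercent: number;    // share of 5xx responses
  availabilityPercent: number; // measured over a 5-minute window
  containerRestarts: number;   // within the last 10 minutes
};

// Returns the names of alerts that should fire for this snapshot.
function firingAlerts(s: Snapshot): string[] {
  const alerts: string[] = [];
  if (s.p95ResponseMs > 500) alerts.push("response-time");
  if (s.errorRatePercent > 1) alerts.push("error-rate");
  if (s.availabilityPercent < 99.5) alerts.push("availability");
  if (s.containerRestarts > 3) alerts.push("container-restarts");
  return alerts;
}

// A p95 of 350 ms misses the 200 ms target but stays below the 500 ms alert threshold.
const degradedButQuiet = firingAlerts({
  p95ResponseMs: 350,
  errorRatePercent: 0.4,
  availabilityPercent: 99.8,
  containerRestarts: 1,
}); // []
```

The gap between target and threshold is deliberate: values in between show up on dashboards without paging anyone.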
Cost optimization
Cloud costs spiral out of control without active management. We build cost awareness into every infrastructure decision.
Cost optimization strategies
- Right-sizing: Match instance types to actual workload (most services are over-provisioned by 2-4x)
- Reserved instances / Savings Plans: 30-60% savings for predictable workloads
- Spot instances: 60-90% savings for fault-tolerant workloads (batch processing, CI runners)
- Auto-scaling: Scale down during off-hours, scale up during peak
- Storage tiering: Move infrequently accessed data to cheaper storage classes (S3 Glacier, Coldline)
- CDN caching: Reduce origin traffic and bandwidth costs
- Database optimization: Connection pooling, query optimization, read replicas where appropriate
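To make the discount ranges concrete, the sketch below applies them to an on-demand bill. The €1,000 figure is an arbitrary example, and the function and constant names are illustrative; actual savings depend on commitment terms and workload.

```typescript
type PercentRange = { min: number; max: number };
type CostRange = { min: number; max: number };

// Discount ranges quoted in the list above, in percent.
const DISCOUNT_PERCENT = {
  reserved: { min: 30, max: 60 },
  spot: { min: 60, max: 90 },
};

// Monthly cost after discount; the lowest cost corresponds to the largest discount.
function discountedMonthlyCost(onDemand: number, discount: PercentRange): CostRange {
  return {
    min: (onDemand * (100 - discount.max)) / 100,
    max: (onDemand * (100 - discount.min)) / 100,
  };
}

// A workload that costs 1,000 per month on demand:
const reservedEstimate = discountedMonthlyCost(1000, DISCOUNT_PERCENT.reserved); // { min: 400, max: 700 }
const spotEstimate = discountedMonthlyCost(1000, DISCOUNT_PERCENT.spot);         // { min: 100, max: 400 }
```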
Cost monitoring
- Monthly cost reports with trend analysis
- Budget alerts before costs exceed thresholds
- Per-service cost allocation (tags on every resource)
- Cost impact analysis on infrastructure PRs
How we work on your infrastructure
1. Infrastructure audit
We assess your current setup: what is over-provisioned, what is at risk, what is manually configured. Output: a prioritized remediation plan.
2. IaC migration
Existing manual infrastructure is codified in Terraform. We import existing resources without downtime.
3. CI/CD implementation
Pipeline setup for every application: build, test, deploy. Staging and production environments with promotion gates.
4. Monitoring setup
Metrics, logs, traces, alerting, and dashboards. You know the health of your system at a glance.
5. Ongoing optimization
Monthly cost reviews, security patching, dependency updates, and infrastructure improvements.
Wondering about the cost model? Compare: Freelancer vs Agency vs Subscription
Common questions about Cloud & DevOps
Do I need Kubernetes?
Most startups do not. A managed platform (Vercel for Next.js, Railway for Node.js, or ECS Fargate for Docker) is sufficient until you have 5+ services or need advanced deployment strategies. We start simple and add complexity only when the scale demands it.
How much should I spend on cloud infrastructure?
For a typical SaaS startup: €200-500/month for early stage, €1,000-3,000/month for growth stage, €5,000-20,000/month for scale stage. These numbers vary wildly based on traffic, data volume, and compliance requirements.
Can you manage infrastructure I already have?
Yes. We audit, document, codify, and optimize existing infrastructure. No need to start from scratch.
What does Cloud & DevOps as a subscription cost?
Infrastructure work is included in every plan. For dedicated DevOps focus, we recommend Growth 150 (€5,995/month) or higher. Compare: True Cost of a Developer.
Related services
- Node.js Backend: Applications we deploy and scale
- Next.js Development: Vercel or self-hosted deployment
- React Development: Static hosting and CDN deployment
- API & Integration Development: API infrastructure and monitoring
- TypeScript: Build tooling and monorepo CI/CD
- Mobile Development: CI/CD for iOS and Android builds
Cost calculator
Comparison: proreactware vs. comparable in-house capacity
- Workload: 3 items in parallel
- In-house: ~2.5 developers at €30,000 per month (salary + employer contributions + tools + office)
- Advanced 300: €9,995 per month (fixed, no recruiting or onboarding)
- Savings: €20,005/month (67%), or €240,060/year, plus recruiting costs saved (~€15,000 per position)
The calculation is based on an average total cost of €12,000/month per senior developer in Germany (€8,000 salary + ~21% employer contributions + tools + a share of recruiting, onboarding, and office costs). Actual costs vary by location and seniority.
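The comparison above can be reproduced with a few lines of arithmetic. All inputs are the figures quoted in the calculator; `compareCosts` is an illustrative helper name.

```typescript
type CostComparison = {
  inHouseMonthly: number;
  monthlySaving: number;
  savingPercent: number; // rounded to the nearest whole percent
  yearlySaving: number;
};

// All amounts in euros.
function compareCosts(
  equivalentDevs: number,
  costPerDevMonthly: number,
  subscriptionMonthly: number
): CostComparison {
  const inHouseMonthly = equivalentDevs * costPerDevMonthly;
  const monthlySaving = inHouseMonthly - subscriptionMonthly;
  return {
    inHouseMonthly,
    monthlySaving,
    savingPercent: Math.round((monthlySaving / inHouseMonthly) * 100),
    yearlySaving: monthlySaving * 12,
  };
}

// ~2.5 developers at 12,000 per month vs. Advanced 300 at 9,995 per month.
const advanced300 = compareCosts(2.5, 12000, 9995);
// inHouseMonthly: 30000, monthlySaving: 20005, savingPercent: 67, yearlySaving: 240060
```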