What is Platform Engineering?
The Bottleneck You Already Know About
Every growing engineering org has one: the person (or small team) who writes the CI pipelines, provisions the infrastructure, debugs the Kubernetes nodes, and answers Slack questions at 11pm about why staging is down. They're the "DevOps person." And they're a bottleneck.
It's not their fault. The model itself doesn't scale. When five product teams depend on one person to unblock their deploys, you get queues, context switching, and a bus factor of one. The rest of the org moves at the speed of that person's calendar.
Platform engineering is the industry's answer to this. The one-liner: platform engineering is the discipline of building and running internal developer platforms (IDPs) that let product teams self-serve their infrastructure needs. Instead of doing DevOps tasks for others, you build a product that lets others do it themselves.
The Problem: DevOps at Scale Breaks Down
The promise of DevOps was "you build it, you run it." In practice, most companies end up with one of two outcomes:
- DevOps-as-a-team — a dedicated team that handles all infra, pipelines, and operations. Product engineers throw requests over the wall. This is just ops with a modern job title.
- DevOps-as-a-culture — every engineer owns their infra. In theory, beautiful. In practice, your frontend engineer is now debugging why an EKS node went NotReady and won't recover, instead of shipping the feature they were hired to build.
Both break down past a certain scale. The first creates a bottleneck. The second creates cognitive overload.
I've seen this firsthand. We had different Helm charts for every application — slightly different values files, different templating approaches, different assumptions about health checks. Then there were the shared services: Kafka, nginx ingress, cert-manager — each managed by whoever set it up originally, with tribal knowledge as the only documentation. A new engineer joining the team needed weeks just to understand how to deploy their first service.
That's the developer experience tax. Context switching, waiting on infra tickets, inconsistent environments, and the nagging feeling that you're one bad kubectl apply away from a production incident.
The cognitive load problem is real: product engineers shouldn't need to understand Terraform state management or EKS networking internals to ship a feature. They should be able to deploy, scale, and observe their service without filing a ticket.
What Platform Engineering Actually Is
The shift is subtle but important. Instead of doing infra work for other teams, platform engineers build a product that lets other teams do it themselves. The customers aren't external users — they're your fellow engineers.
The core of that product is the Internal Developer Platform (IDP). It's not one tool. It's the combination of tooling, workflows, and abstractions that a platform team builds and maintains. A few key concepts:
- Golden paths (paved roads) — opinionated, pre-approved ways to do common things. Want to deploy a new microservice? Here's the blessed template with CI/CD, observability, and security baked in. You can go off-road, but the golden path gets you there faster and safer.
- Self-service infrastructure — deploy, scale, and provision resources without a Jira ticket. A developer should be able to spin up a database, create a new service, or configure a CDN through a CLI, UI, or config file — not a Slack message to the platform team.
- Developer portal — a single pane of glass for service ownership, documentation, API specs, and operational status. Backstage (originally from Spotify, now a CNCF project) has become the de facto standard here.
- Abstractions over raw cloud — you shouldn't need to understand VPC peering or IAM role chaining to deploy a service. The platform team encodes those best practices into reusable modules, and product teams consume the abstraction.
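To make the abstraction idea concrete, here's a hypothetical developer-facing manifest — the file name, fields, and values are all illustrative, not from any specific platform:

```yaml
# service.yaml — hypothetical golden-path manifest a product team would own.
# The platform translates this into Kubernetes manifests, IAM roles,
# DNS records, and dashboards behind the scenes.
name: payments-api
team: payments
runtime: go-1.22
replicas:
  min: 2
  max: 10
resources:
  database: postgres-small   # provisions a database via the platform's modules
  cache: redis
observability:
  alerts: default            # inherits the platform's standard alert policies
```

The product team declares intent in a dozen lines; the platform owns everything those lines expand into.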
The key word is product. Platform teams that succeed treat their platform like a product: they have users, they collect feedback, they iterate, they measure adoption. Platform teams that fail build tools nobody asked for and mandate their use.
How It Differs from DevOps and SRE
These three get conflated constantly. Here's how I think about them:
| | DevOps | SRE | Platform Engineering |
|---|---|---|---|
| Focus | Culture + practices for faster delivery | Reliability, SLOs, incident response | Building the platform other engineers use |
| Who they serve | The whole org (cultural shift) | Production systems | Internal engineering teams (as customers) |
| Primary output | Practices, automation, collaboration | Error budgets, runbooks, incident tooling | Self-service tools, golden paths, IDPs |
| Origin | Industry movement (~2009) | Google (~2003) | Evolution of DevOps at scale (~2020) |
| Mindset | "Break down silos" | "Manage reliability as an engineering problem" | "Treat infra tooling as a product" |
These aren't mutually exclusive. SRE and platform engineering often coexist — SRE focuses on keeping production reliable, platform engineering focuses on making developers productive. DevOps is the cultural foundation both build on.
The key differentiator: platform engineers treat internal teams as customers. They do user research. They track adoption metrics. They deprecate features with migration paths. It's product management applied to infrastructure.
Core Components of a Platform
A mature platform engineering setup typically covers these areas:
- Infrastructure abstraction layer — Terraform modules, Crossplane compositions, or Helm charts that encode best practices. Instead of every team writing their own RDS Terraform config, one module exists with sane defaults for security groups, backups, parameter groups, and monitoring. Teams consume it with a few input variables.
- CI/CD as a service — standardized pipelines teams can adopt rather than build themselves. A new service should get CI/CD by adding a config file, not by spending a sprint configuring GitHub Actions from scratch.
- Observability defaults — logging, metrics, and tracing baked in from day one. When you deploy via the golden path, your service automatically emits structured logs, exposes Prometheus metrics, and propagates trace context. No bolting it on after the first outage.
- Secrets and config management — Vault, AWS Secrets Manager, External Secrets Operator. The platform handles rotation, access policies, and injection into workloads. Developers reference a secret name; they never see the raw value.
- Developer portal / service catalog — Backstage, Port, or Cortex. Who owns this service? Where are the docs? What's the on-call rotation? What APIs does it expose? One place to answer all of that.
- Environment provisioning — preview environments spun up per PR, ephemeral staging environments for integration testing, one-click teardown when done. No more "staging is broken because three teams are deploying to it simultaneously."
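As a sketch of the infrastructure abstraction layer: a shared RDS module lets a team provision a database in a handful of lines while the module enforces the hard parts internally. The module source, variable names, and org name here are illustrative, not a real registry:

```hcl
# Hypothetical usage of a shared platform module. The source and inputs
# are illustrative; the point is how little the consuming team writes.
module "orders_db" {
  source = "git::https://github.com/example-org/platform-modules//rds?ref=v2.3.0"

  name   = "orders"
  engine = "postgres"
  size   = "small"    # maps to instance class, storage, and IOPS internally
  team   = "orders"   # drives tagging, alert routing, and cost reporting
}

# Security groups, parameter groups, automated backups, and monitoring
# alarms are all encoded inside the module with vetted defaults.
```

Pinning the module to a version tag also gives the platform team a clean upgrade path: ship `v2.4.0`, announce it, and let teams migrate on their own schedule.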
The Tooling Landscape
This is opinionated and not exhaustive — these are the tools I see most often in platform engineering contexts:
- IDP / Portal: Backstage (open source, CNCF-backed) is the most common. Port is a strong managed alternative.
- Infrastructure as Code: Terraform remains dominant. Crossplane is gaining traction for teams that want Kubernetes-native IaC — you define infrastructure as custom resources, and Crossplane reconciles them.
- GitOps / Delivery: ArgoCD and Flux. Define your desired state in Git, and the cluster converges to match it.
- Container orchestration: Kubernetes (EKS, GKE, or self-managed). Love it or hate it, it's the runtime most platforms standardize on.
- Observability: Prometheus + Grafana for metrics, OpenTelemetry for traces and logs. The OTel ecosystem is converging fast — if you're starting fresh, start here.
One thing worth emphasizing: tools follow the platform, not the other way around. Don't pick Backstage and then try to build a platform around it. Identify what your developers need, build the simplest thing that solves it, and then choose the tools that fit.
How to Start Small
You don't need a dedicated 10-person platform team to get started. Most platform engineering efforts begin with one or two engineers who are tired of solving the same problems repeatedly.
Step 0: Find the biggest pain point. Talk to your product engineers. Where do they lose the most time? Is it deployment? Environment setup? Debugging production issues without observability? That's where you start.
Practical starting points:
- Standardize one Terraform module. Pick the most commonly provisioned resource — an RDS instance, an ECS service, an S3 bucket — and create a single, well-documented module with sane defaults. Have every team use it instead of copy-pasting their own.
- Build one golden path. Create an opinionated GitHub Actions workflow (or whatever your CI tool is) that handles build, test, deploy, and rollback. Teams fork it and customize the parts they need. The 80% path is covered.
- Set up a basic service catalog. This doesn't have to be Backstage. A Notion page or Confluence doc that lists every service, its owner, its repo, its deploy process, and its runbook is better than nothing. You can migrate to Backstage later when the pain justifies the investment.
- Standardize your Helm charts. If you're on Kubernetes, having a shared base chart for application services and separate charts for infrastructure dependencies (Kafka, nginx, Redis) eliminates the "every app has a slightly different deployment" problem.
The goal is reducing cognitive load, not adding another tool to the stack.
Is Platform Engineering Right for Your Team?
Signals you need it:
- More than 3 teams sharing one "DevOps person" as a dependency
- New engineers take weeks to ship their first feature because of infra complexity
- You have tribal knowledge that lives in one person's head
- Teams are reinventing the same CI/CD patterns independently
- Production incidents caused by inconsistent infrastructure configurations
Signals you don't (yet):
- Single team, single product, moving fast — platform engineering is premature optimization at this stage
- You haven't outgrown a single engineer managing infra part-time
- Your deployment process works and nobody's complaining
There's a maturity curve to this: ad-hoc scripts → tribal knowledge → standardized modules → self-service tooling → fully productized platform. Most teams are somewhere in the middle. The goal isn't to jump to the end — it's to take the next step that reduces friction for your specific team.
The Mindset Shift
Platform engineering isn't a role title you slap on your existing DevOps team. It's a mindset shift: treat infrastructure and developer tooling as a product, with internal engineers as your customers. Measure success not by how many Terraform modules you wrote, but by how many teams can ship without waiting on you.
It's the discipline that scales DevOps practices without linearly scaling headcount. And for most growing engineering orgs, it's not a question of if — it's when.
I'm actively exploring this space — building the skills, experimenting with the tooling, and writing about what I learn along the way. If you're on a similar path or have opinions on any of this, reach out on LinkedIn or by email.