# Platform Engineering: How to Build an Internal Developer Platform

As software systems grow more complex - microservices, Kubernetes, multi-cloud, CI/CD pipelines, observability stacks - developers spend more time on infrastructure and less time building products. **Platform Engineering** addresses this by creating an internal platform that abstracts away complexity and gives developers self-service tools to ship faster.

Gartner predicts that by 2026, **80% of software engineering organizations** will establish platform teams. In this guide, we'll explore what Platform Engineering is, why it matters, and how to build an Internal Developer Platform from scratch.

## What is Platform Engineering?

Platform Engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations. The output is an **Internal Developer Platform (IDP)** - a layer that sits between developers and infrastructure.

```mermaid
graph TD
    subgraph "Developers"
        D1[Frontend Team]
        D2[Backend Team]
        D3[Data Team]
    end

    subgraph "Internal Developer Platform"
        Portal[Developer Portal]
        Templates[Service Templates]
        CICD[CI/CD Pipelines]
        Infra[Infrastructure Abstraction]
    end

    subgraph "Infrastructure"
        K8s[Kubernetes]
        Cloud[Cloud Services]
        DB[Databases]
        Monitor[Monitoring]
    end

    D1 --> Portal
    D2 --> Portal
    D3 --> Portal
    Portal --> Templates
    Portal --> CICD
    Portal --> Infra
    Infra --> K8s
    Infra --> Cloud
    Infra --> DB
    CICD --> Monitor
```

### Platform Engineering vs DevOps

Platform Engineering is not a replacement for DevOps - it's the next evolution. DevOps said "you build it, you run it." Platform Engineering says "we'll make building and running it effortless."

| Aspect | DevOps | Platform Engineering |
|--------|--------|---------------------|
| **Focus** | Culture and practices | Products and self-service |
| **Approach** | Every team manages infra | Platform team abstracts infra |
| **Cognitive Load** | High (each team learns everything) | Low (golden paths provided) |
| **Output** | CI/CD pipelines, IaC scripts | Internal Developer Platform |
| **Users** | All engineering | Platform team serves engineering |

## Why Platform Engineering Matters

### The Cognitive Load Problem

In a typical modern organization, a developer needs to understand:

- Git workflows and branching strategies
- CI/CD pipeline configuration
- Container building and registry management
- Kubernetes manifests and Helm charts
- Cloud provider services (AWS/GCP/Azure)
- Networking, DNS, TLS certificates
- Monitoring, logging, alerting setup
- Database provisioning and migrations
- Security policies and compliance

That's an enormous cognitive burden that takes focus away from the actual product.

### The Golden Path

Platform Engineering introduces the concept of **golden paths** - opinionated, well-supported, and documented paths for common tasks. A golden path is not a mandate; developers *can* deviate, but the platform makes the right thing the easy thing.

```mermaid
flowchart LR
    Dev[Developer] -- "Create new service" --> Portal[Portal]
    Portal -- "Select template" --> Template[Service Template]
    Template -- "Auto-provision" --> Repo[Git Repository]
    Template --> Pipeline[CI/CD Pipeline]
    Template --> Infra[Kubernetes Namespace]
    Template --> Monitor[Dashboards + Alerts]
    Repo --> Ready[Ready to Code!]
```

**Example golden path for creating a new microservice:**
1. Developer selects "New Backend Service" in the portal
2. Chooses language/framework (Node.js, Go, Python)
3. Platform auto-creates: Git repo, CI/CD pipeline, Kubernetes namespace, monitoring dashboards, and alert rules
4. Developer clones the repo and starts coding immediately

## Building an Internal Developer Platform

### Layer 1: Developer Portal

The portal is the single entry point for developers. The most popular open-source option is **Backstage** (created by Spotify, now a CNCF project).

Key features:
- **Service catalog**: Every service, its owner, documentation, and dependencies
- **Software templates**: Scaffolding for new services with best practices built in
- **Tech docs**: Documentation as code, rendered and searchable
- **Plugin ecosystem**: Extend with custom functionality

```yaml
# backstage/catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: user-service
  description: Manages user accounts and authentication
  annotations:
    github.com/project-slug: myorg/user-service
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: production
  owner: team-auth
  system: identity-platform
  dependsOn:
    - resource:postgresql-main
  providesApis:
    - user-api
```

### Layer 2: Infrastructure Abstraction

Developers shouldn't write Terraform or Kubernetes YAML directly. The platform should provide abstractions.

**Tools:**
- **Crossplane**: Kubernetes-native infrastructure provisioning
- **Terraform with modules**: Pre-built, tested infrastructure modules
- **Pulumi**: Infrastructure as real code (TypeScript, Python, Go)

```yaml
# Example: Crossplane claim for a database, backed by a platform-defined composition
apiVersion: database.example.com/v1
kind: PostgreSQLInstance
metadata:
  name: user-db
spec:
  size: small  # Abstraction: small = 2 vCPU, 4GB RAM
  version: "16"
  backup: daily
  team: auth-team
```

Instead of configuring RDS parameters, VPC subnets, security groups, and backup policies, the developer just specifies `size: small` and `backup: daily`. The platform handles the rest.

### Layer 3: CI/CD Standardization

Standardize CI/CD so teams don't each build their own pipelines.

```yaml
# .github/workflows/platform-ci.yml
# Teams just include the shared workflow
name: Build and Deploy
on:
  push:
    branches: [main]

jobs:
  pipeline:
    uses: myorg/platform-workflows/.github/workflows/standard-pipeline.yml@v2
    with:
      language: node
      deploy-target: production
    secrets: inherit
```

**Key practices:**
- Shared CI/CD templates that teams include (not copy)
- Automatic security scanning (SAST, dependency audit)
- Standardized deployment strategies (canary, blue/green)
- Automatic rollback on failed health checks

### Layer 4: Observability

Pre-configured monitoring so developers get dashboards and alerts out of the box.

- **Metrics**: Prometheus + Grafana with standard dashboards per service
- **Logging**: Structured logging with centralized collection (Loki, ELK)
- **Tracing**: Distributed tracing with OpenTelemetry
- **Alerting**: PagerDuty/Opsgenie integration with sensible defaults

```mermaid
graph LR
    Service[Your Service] -- "OpenTelemetry SDK" --> Collector[OTel Collector]
    Collector --> Prometheus[Prometheus]
    Collector --> Loki[Loki]
    Collector --> Tempo[Tempo]
    Prometheus --> Grafana[Grafana Dashboards]
    Loki --> Grafana
    Tempo --> Grafana
    Grafana --> PagerDuty[PagerDuty Alerts]
```

## Measuring Success

How do you know your platform is working? Track these metrics:

### DORA Metrics
- **Deployment frequency**: How often code reaches production
- **Lead time for changes**: Time from commit to production
- **Change failure rate**: Percentage of deployments causing failures
- **Mean time to recovery**: Time to restore service after an incident

### Platform-Specific Metrics
- **Time to first deploy**: How long from "new service" to first production deploy
- **Developer satisfaction (NPS)**: Survey your users regularly
- **Self-service ratio**: % of infrastructure requests handled without tickets
- **Golden path adoption**: % of services following the recommended path

## Common Mistakes

### 1. Building Too Much, Too Soon
Start with the biggest pain point, not a grand vision. If deployments are painful, start there. If provisioning takes weeks, start there.

### 2. Not Treating the Platform as a Product
The platform team needs a product manager, user research, and feedback loops. Developers are your customers - understand their needs.

### 3. Mandating Instead of Attracting
The best platforms are adopted voluntarily because they make developers' lives easier. If you have to mandate usage, your platform isn't good enough.

### 4. Ignoring Developer Experience
A platform with terrible UX won't be used. Invest in clear documentation, helpful error messages, and fast feedback loops.

## Getting Started

A practical roadmap for building your first IDP:

```mermaid
flowchart TD
    A[Month 1-2: Discovery] --> B[Month 3-4: MVP]
    B --> C[Month 5-6: Iterate]
    C --> D[Month 7+: Scale]

    A --> A1[Interview developers]
    A --> A2[Map pain points]
    A --> A3[Choose first golden path]

    B --> B1[Deploy Backstage]
    B --> B2[First service template]
    B --> B3[Standardized CI/CD]

    C --> C1[Gather feedback]
    C --> C2[Add infrastructure abstraction]
    C --> C3[Improve docs and onboarding]

    D --> D1[More templates and golden paths]
    D --> D2[Self-service infrastructure]
    D --> D3[Advanced observability]
```

### Minimum Viable Platform

1. **Service catalog** (Backstage) - know what exists and who owns it
2. **One service template** - golden path for your most common service type
3. **Standardized CI/CD** - shared pipeline that teams include
4. **Basic docs** - how to use the platform, where to get help

You can build this MVP in 2-3 months with a team of 2-3 engineers.

## Conclusion

Platform Engineering is not about building the perfect platform from day one. It's about incrementally reducing the cognitive load on developers so they can focus on building products. Start small, measure impact, and iterate based on developer feedback.

The organizations that invest in Platform Engineering will have a significant competitive advantage: faster delivery, happier developers, and more reliable systems.

**Resources:**
- [Team Topologies](https://teamtopologies.com/) - the book that popularized platform teams
- [Backstage](https://backstage.io/) - Spotify's open-source developer portal
- [CNCF Platforms White Paper](https://tag-app-delivery.cncf.io/whitepapers/platforms/) - community definition and best practices
- [platformengineering.org](https://platformengineering.org/) - community, events, and resources
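
To make two of the platform-specific metrics from Measuring Success concrete, here is a minimal Python sketch of how self-service ratio and golden path adoption could be computed from simple counts. The `PlatformSnapshot` type, its field names, and the sample numbers are all hypothetical; a real platform would more likely derive these counts from its ticketing system and service catalog.

```python
from dataclasses import dataclass


@dataclass
class PlatformSnapshot:
    """Counts for one reporting period. All field names are hypothetical."""
    infra_requests_total: int         # every infrastructure request made
    infra_requests_self_service: int  # requests completed without a ticket
    services_total: int               # services known to the catalog
    services_on_golden_path: int      # services built from a platform template


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when there is nothing to measure."""
    if whole == 0:
        return 0.0
    return 100.0 * part / whole


def self_service_ratio(snapshot: PlatformSnapshot) -> float:
    """% of infrastructure requests handled without tickets."""
    return percentage(snapshot.infra_requests_self_service,
                      snapshot.infra_requests_total)


def golden_path_adoption(snapshot: PlatformSnapshot) -> float:
    """% of services following the recommended path."""
    return percentage(snapshot.services_on_golden_path,
                      snapshot.services_total)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    snapshot = PlatformSnapshot(
        infra_requests_total=120,
        infra_requests_self_service=90,
        services_total=40,
        services_on_golden_path=30,
    )
    print(f"Self-service ratio: {self_service_ratio(snapshot):.1f}%")
    print(f"Golden path adoption: {golden_path_adoption(snapshot):.1f}%")
```

Tracking both values per quarter, rather than as a one-off number, makes it easier to see whether new templates and abstractions are actually shifting work away from tickets.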