Platform Overview¶
The WebGrip Organisation Public Platform is a comprehensive Kubernetes-based infrastructure platform that provides the foundation for application development, deployment, and operations across the WebGrip organization.
Platform Purpose¶
This platform serves as the organizational backbone for:
- Development Teams: Providing self-service application deployment and management
- Infrastructure Teams: Centralizing platform operations and maintenance
- Security Teams: Enforcing security policies and compliance requirements
- Operations Teams: Monitoring, alerting, and incident response capabilities
Platform Architecture¶
High-Level Architecture¶
Platform Layers¶
| Layer | Components | Purpose |
|---|---|---|
| Ingress | Traefik, cert-manager | External traffic routing, TLS termination |
| Application | User applications, Platform services | Business logic and platform capabilities |
| Platform | Monitoring, Logging, Storage | Cross-cutting platform services |
| Infrastructure | Kubernetes nodes, Networking, Security | Foundation compute and network |
Core Capabilities¶
Infrastructure as Code¶
Repository Location:
ops/helm/
All infrastructure is defined as code using Helm charts, providing:
- Reproducible Deployments: Consistent environments across development, staging, and production
- Version Control: All infrastructure changes tracked in Git
- Rollback Capability: Easy rollback to previous working configurations
- Documentation: Self-documenting infrastructure through code
Key Infrastructure Components:
- Cluster Monitoring: ops/helm/007-cluster-monitoring/
- Certificate Management: ops/helm/010-cert-manager/
- Ingress Controllers: ops/helm/030-ingress-controllers/
- CI/CD Infrastructure: ops/helm/040-gha-runners-controller/
Service Discovery & Catalog¶
Repository Location:
catalog/
Backstage-powered service catalog providing:
- Domain Organization: Business domain boundaries and ownership
- System Mapping: Technical system relationships and dependencies
- Component Registry: Service inventory with metadata and documentation
- API Documentation: Centralized API discovery and specifications
CI/CD Automation¶
Repository Location:
.github/workflows/
GitHub Actions-based automation providing:
- Application Lifecycle: Automated repo creation and bootstrapping
- Deployment Automation: Standardized deployment procedures
- Documentation Publishing: Automatic TechDocs updates
- Security Scanning: Integrated security and compliance checks
Secret Management¶
Repository Location:
ops/secrets/
SOPS and Age-based secret management providing:
- Encrypted at Rest: All secrets encrypted in repository
- Fine-grained Access: Role-based access to secret categories
- Audit Trail: All secret changes tracked in Git history
- Rotation Support: Structured approach to secret rotation
Observability¶
Repository Location:
grafana-dashboards/
Comprehensive monitoring and observability:
- Metrics Collection: Prometheus-based metrics
- Dashboard Visualization: Pre-built Grafana dashboards
- Alerting: Proactive monitoring and incident response
- Log Aggregation: Centralized application and platform logs
Platform Benefits¶
For Developers¶
- 🚀 Fast Time-to-Market: Standardized templates and deployment pipelines
- 📊 Built-in Observability: Monitoring and alerting included by default
- 🔐 Security by Default: Security policies and secret management built-in
- 📚 Self-Service Documentation: Complete platform documentation and runbooks
For Operations¶
- ⚙️ Standardized Operations: Consistent deployment and management procedures
- 🔍 Full Visibility: Comprehensive monitoring across all platform components
- 🛡️ Security Compliance: Built-in security scanning and policy enforcement
- 📈 Scalability: Auto-scaling and resource management capabilities
For Organization¶
- 💰 Cost Efficiency: Shared infrastructure and standardized tooling
- ⚡ Developer Productivity: Reduced operational overhead for development teams
- 🎯 Consistency: Standardized approaches across all projects and teams
- 📋 Governance: Clear ownership, documentation, and decision tracking
Getting Started¶
Ready to start using the platform? Choose your path:
- 👨💻 I'm a Developer
Start with the Onboarding Guide to set up your local environment and deploy your first application.
- ⚙️ I'm a Platform Engineer
Review the Cluster Architecture and Platform Components to understand the technical implementation.
- 📋 I'm a Product Manager
Explore the Service Catalog to understand the organizational structure and service ownership.
- 🛡️ I'm a Security Engineer
Review the Security Model and Security Policies to understand the platform's security posture.
Platform Metrics¶
Key platform health indicators:
| Metric | Current Status | Target |
|---|---|---|
| Platform Uptime | 99.9% | >99.5% |
| Application Deployment Time | <5 minutes | <10 minutes |
| Mean Time to Recovery (MTTR) | <15 minutes | <30 minutes |
| Developer Onboarding Time | <2 hours | <4 hours |
| Security Scan Coverage | 100% | 100% |
📊 Live Metrics: View real-time platform metrics in Grafana dashboards
Next Steps¶
- 📋 Prerequisites - Ensure you have required tools and access
- 🏗️ Cluster Architecture - Understand the underlying infrastructure
- 🔧 Platform Components - Learn about core platform services
- 📖 Operations Runbooks - Master platform operations procedures
💡 Platform Evolution: This platform follows our Architecture Decision Records (ADRs). Proposed changes should include an ADR for significant architectural decisions.