Infrastructure Overview¶
The WebGrip platform is built on a Kubernetes-based infrastructure that provides a scalable, secure, and observable foundation for application development and deployment.
Infrastructure Philosophy¶
Our infrastructure follows these core principles:
- 🏗️ Infrastructure as Code: All infrastructure is defined in version-controlled Helm charts
- 🔒 Security by Default: Security controls built into every layer of the stack
- 📊 Observable by Design: Comprehensive monitoring and logging across all components
- ⚡ Self-Service: Developers can deploy and manage applications independently
- 🔄 GitOps: All changes flow through Git workflows with proper review and automation
Platform Stack¶
Container Orchestration¶
Technology: Kubernetes
Configuration: catalog/systems/kubernetes.yaml
Kubernetes provides the foundation for container orchestration, offering:
- Pod Management: Automated deployment, scaling, and health management
- Service Discovery: Built-in DNS-based discovery and load balancing through Services
- Resource Management: CPU, memory, and storage allocation and limits
- Security Isolation: Namespace-based multi-tenancy and network policies
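As a rough illustration of namespace-based isolation combined with resource limits, the sketch below builds a Namespace and a ResourceQuota manifest as JSON that `kubectl apply -f -` can read. The namespace name and the quota values are hypothetical and are not taken from catalog/systems/kubernetes.yaml.

```python
import json


def tenant_manifests(namespace, cpu, memory):
    """Return a Namespace and a ResourceQuota capping that namespace's total limits."""
    return [
        {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": namespace}},
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
            # "limits.cpu" and "limits.memory" cap the sum of limits in the namespace.
            "spec": {"hard": {"limits.cpu": cpu, "limits.memory": memory}},
        },
    ]


def to_kubectl_json(manifests):
    """Wrap manifests in a v1 List so kubectl can read them as one document."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)


if __name__ == "__main__":
    # Hypothetical tenant; pipe the output to `kubectl apply -f -` to try it.
    print(to_kubectl_json(tenant_manifests("team-a", "4", "8Gi")))
```

In this platform the equivalent objects would more likely live in a Helm chart template; the Python form is only meant to show the shape of the manifests.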
Package Management¶
Technology: Helm
Configuration: ops/helm/
Helm manages Kubernetes application deployments through templated charts:
| Chart Category | Purpose | Location |
|---|---|---|
| 005-tainters | Node tainting and tolerations | ops/helm/005-tainters/ |
| 007-cluster-monitoring | Platform monitoring stack | ops/helm/007-cluster-monitoring/ |
| 010-cert-manager | Certificate automation | ops/helm/010-cert-manager/ |
| 020-cluster-issuers | Certificate issuers | ops/helm/020-cluster-issuers/ |
| 030-ingress-controllers | Ingress and load balancing | ops/helm/030-ingress-controllers/ |
| 040-gha-runners-controller | CI/CD infrastructure | ops/helm/040-gha-runners-controller/ |
| 045-gha-runners | Self-hosted runner instances | ops/helm/045-gha-runners/ |
| 060-grafana-stack | Observability dashboards | ops/helm/060-grafana-stack/ |
| 950-example-services | Sample applications | ops/helm/950-example-services/ |
📋 Chart Naming Convention: Charts are numbered to indicate deployment order and dependencies. Lower numbers deploy first.
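The numbering convention can be turned into a deployment order mechanically. The following is a minimal sketch, assuming every chart directory follows the `NNN-name` pattern shown in the table; the function names are illustrative and are not part of the repository.

```python
import re
from pathlib import Path

# Matches directory names such as "010-cert-manager": numeric prefix, dash, name.
CHART_DIR_PATTERN = re.compile(r"^(\d+)-(.+)$")


def deployment_order(chart_dirs):
    """Return chart directory names sorted by numeric prefix, lowest first."""
    numbered = []
    for name in chart_dirs:
        match = CHART_DIR_PATTERN.match(name)
        if match is None:
            # Skip anything that does not follow the NNN-name convention.
            continue
        numbered.append((int(match.group(1)), name))
    # Sorting on (prefix, name) keeps ties between equal prefixes deterministic.
    return [name for _, name in sorted(numbered)]


def charts_on_disk(helm_root="ops/helm"):
    """List chart directories under helm_root in deployment order."""
    root = Path(helm_root)
    return deployment_order(p.name for p in root.iterdir() if p.is_dir())
```

Comparing the prefixes as integers rather than strings means the order stays correct even if a prefix is ever written without zero padding.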
Cloud Infrastructure¶
Provider: DigitalOcean, with AWS (Amazon Web Services) integrations
Cluster: DigitalOcean Kubernetes (DOKS)
Configuration: catalog/resources/staging-doks-cluster.yaml
The platform currently runs on a DigitalOcean Kubernetes cluster with AWS integrations for:
- Identity & Access: AWS IAM integration for access control
- Container Registry: AWS ECR for container image storage
- Backup & Recovery: AWS S3 for persistent volume backups
- External Services: AWS services for extended platform capabilities
Infrastructure Components¶
Core Platform Services¶
Component Details¶
| Component | Type | Purpose | Configuration |
|---|---|---|---|
| Traefik | Ingress Controller | Load balancing and traffic routing | ops/helm/030-ingress-controllers/ingress-traefik/ |
| cert-manager | Certificate Automation | TLS certificate provisioning | ops/helm/010-cert-manager/ |
| kube-prometheus-stack | Monitoring | Metrics collection and alerting | ops/helm/007-cluster-monitoring/kube-prometheus-stack/ |
| Grafana | Observability | Dashboard visualization | ops/helm/060-grafana-stack/ |
| GitHub Actions Runners | CI/CD | Self-hosted CI/CD execution | ops/helm/040-gha-runners-controller/ |
Infrastructure Automation¶
GitOps Workflow¶
All infrastructure changes follow a GitOps workflow: changes are proposed in Git, reviewed, and then applied by automation.
Deployment Automation¶
Workflow: .github/workflows/on_source_change.yml
Infrastructure deployments are automated through GitHub Actions:
- Validation: Helm chart linting and security scanning
- Staging: Deploy to staging environment for validation
- Production: Deploy to production with manual approval gates
- Verification: Automated health checks and rollback on failure
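The control flow of these four stages can be sketched as follows. This is an illustrative model only, not the contents of on_source_change.yml: the `Stage` type, the `approve` and `rollback` callbacks, and the choice to roll back whenever any stage fails are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]        # returns True on success
    needs_approval: bool = False   # manual gate checked before the stage runs


def run_pipeline(stages: List[Stage],
                 approve: Callable[[str], bool],
                 rollback: Callable[[str], None]) -> List[str]:
    """Run stages in order and stop at the first denied approval or failure.

    Returns a log with one or two entries per stage that was reached.
    """
    log = []
    for stage in stages:
        if stage.needs_approval and not approve(stage.name):
            log.append(f"{stage.name}: approval denied")
            break
        if stage.run():
            log.append(f"{stage.name}: ok")
        else:
            log.append(f"{stage.name}: failed")
            # Assumption: any failing stage triggers a rollback.
            rollback(stage.name)
            log.append(f"{stage.name}: rolled back")
            break
    return log
```

A run in which production is approved but verification fails would log the three earlier stages as `ok`, followed by `verification: failed` and `verification: rolled back`.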
Secret Management¶
Technology: SOPS + Age
Configuration: ops/secrets/
Secrets are encrypted at rest and managed through:
- Encryption: Age-based encryption with public key distribution
- Access Control: Role-based access to decrypt specific secret categories
- Audit Trail: All secret changes tracked in Git history
- Rotation: Structured processes for secret rotation and distribution
Secret Categories:
- 007-kube-prometheus-stack-secrets: Monitoring credentials
- 010-cert-manager-secrets: Certificate authority credentials
- 030-ingress-controllers: Ingress configuration secrets
- 045-gha-runners-secrets: CI/CD runner credentials
- 060-grafana-stack: Dashboard and alerting credentials
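For authorized personnel, decrypting a single secret file might look like the sketch below. It assumes the `sops` CLI is installed and uses its `--decrypt` flag together with the `SOPS_AGE_KEY_FILE` environment variable; the file paths in the usage note are hypothetical, since the layout inside ops/secrets/ is not described here.

```python
import os
import subprocess


def sops_decrypt_command(secret_file):
    """Build the argv list for decrypting one SOPS-encrypted file to stdout."""
    return ["sops", "--decrypt", secret_file]


def decrypt_secret(secret_file, age_key_file):
    """Run sops with the Age private key supplied via SOPS_AGE_KEY_FILE."""
    env = dict(os.environ, SOPS_AGE_KEY_FILE=age_key_file)
    result = subprocess.run(
        sops_decrypt_command(secret_file),
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sops exits non-zero
    )
    return result.stdout
```

A call could look like `decrypt_secret("ops/secrets/060-grafana-stack/example.yaml", "/path/to/age-key.txt")`, where both paths are placeholders. Returning the plaintext instead of writing it to disk helps keep decrypted values out of the working tree.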
Infrastructure Requirements¶
Prerequisites¶
To work with this infrastructure, you need:
Access Requirements:
- AWS CLI configured with appropriate permissions
- kubectl access to the staging cluster
- Age key for secret decryption (for authorized personnel)
- GitHub repository access for GitOps workflows
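A quick way to confirm that the command-line tools are installed is to look them up on `PATH`. The sketch below is an assumption-laden helper, not part of the repository: the tool list is inferred from the stack described on this page (`helm`, `sops`, and `age` are not named explicitly in the requirements), and it checks only that each binary exists, not that credentials or cluster access are configured.

```python
import shutil

# Assumed tool names, inferred from the technologies this page mentions.
REQUIRED_TOOLS = ["aws", "kubectl", "helm", "sops", "age"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use calls `missing_tools()` with no arguments.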
Local Development Setup¶
🔧 Makefile Commands: All operational commands are documented in the Makefile, with targets for common tasks.
Infrastructure Monitoring¶
Health Indicators¶
The platform monitors key infrastructure health metrics:
| Metric | Source | Dashboard |
|---|---|---|
| Cluster Resource Usage | kube-state-metrics | Cluster Overview |
| Pod Health & Restarts | kubelet | Pod Status Dashboard |
| Ingress Traffic & Latency | Traefik | Ingress Dashboard |
| Certificate Expiry | cert-manager | Certificate Dashboard |
| CI/CD Runner Health | GitHub Actions Controller | CI/CD Dashboard |
Alerting¶
Critical infrastructure alerts are configured for:
- Cluster Resource Exhaustion: CPU, memory, and storage thresholds
- Component Health: Platform service availability and response times
- Security Events: Certificate expiry, authentication failures
- Performance Degradation: Latency and error rate thresholds
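The threshold logic behind such alerts can be illustrated in a few lines. In this platform the real rules presumably live in the kube-prometheus-stack configuration; the Python below only sketches the evaluation, and the metric names and threshold values in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str
    threshold: float
    fire_when_above: bool = True  # False for "too low" checks such as days until expiry


def firing(rules, metrics):
    """Return the metric names of all rules whose threshold is crossed."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            # Metric not reported; a real system might alert on absence instead.
            continue
        if rule.fire_when_above:
            crossed = value > rule.threshold
        else:
            crossed = value < rule.threshold
        if crossed:
            alerts.append(rule.metric)
    return alerts
```

For example, `Rule("cert_days_remaining", 14, fire_when_above=False)` fires when a certificate has fewer than 14 days left, while `Rule("cpu_usage_ratio", 0.9)` fires once usage rises above 90%.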
Next Steps¶
Dive deeper into specific infrastructure areas:
- 🌐 Network Architecture
Understand pod networking, service discovery, and ingress configuration
- 🔒 Security Model
Review security controls, access management, and compliance
- 📊 Resource Management
Learn about resource allocation, scaling, and capacity planning
- 🔧 Platform Components
Explore individual platform services and their configurations
🏗️ Infrastructure Evolution: Infrastructure changes follow our Architecture Decision Records (ADRs). Significant infrastructure modifications require an ADR and stakeholder review.