Observability
Purpose: Enable effective monitoring and troubleshooting of the Firefly III deployment through logging, metrics, health checks, and alerting.
Contents
- Logging Strategy
- Metrics Collection
- Health Monitoring
- Alerting
- Dashboards
- Troubleshooting Workflows
- Sources
Logging Strategy
Log Destinations:
Component | Log Location | Format | Rotation | Purpose | Source |
---|---|---|---|---|---|
Firefly III Application | `/var/www/app/storage/logs/laravel.log` | JSON | Daily | Application events, errors | "Firefly III Logging" — https://docs.firefly-iii.org/how-to/firefly-iii/installation/self-hosted/ — retrieved 2025-01-09 |
Nginx Access | stdout | Combined log format | Docker log driver | HTTP request tracking | "Nginx Configuration" — ops/docker/nginx/ — retrieved 2025-01-09 |
Nginx Error | stderr | Standard error format | Docker log driver | Web server errors | "Nginx Configuration" — ops/docker/nginx/ — retrieved 2025-01-09 |
MariaDB | `/var/log/mysql/error.log` | MySQL format | Size-based | Database errors | "MariaDB Configuration" — ops/docker/mariadb/ — retrieved 2025-01-09 |
Redis | stdout | Redis format | Docker log driver | Cache operations | "Redis Configuration" — ops/docker/redis/ — retrieved 2025-01-09 |
Structured Logging Configuration:
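To illustrate what structured (JSON) logging produces, the following minimal Python sketch renders each log record as one JSON object per line. It is not Firefly III's Laravel configuration: the logger name `firefly.demo`, the `context` field, and the other field names are assumptions chosen for the example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via extra={"context": {...}} become record attributes.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        return json.dumps(payload)


def build_logger(stream, level=logging.INFO) -> logging.Logger:
    """Attach a JSON-formatting handler to a demo logger (hypothetical name)."""
    logger = logging.getLogger("firefly.demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.debug("SQL query executed")  # dropped: below the INFO threshold
log.info("User login", extra={"context": {"user_id": 42}})
log.error("Database error")
lines = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

One JSON object per line keeps the output easy to parse for a log shipper, and the level threshold shows how DEBUG entries stay out of a production log.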
Log Level Guidelines:
Level | When to Use | Examples |
---|---|---|
DEBUG | Development debugging | SQL queries, cache operations |
INFO | Normal operations | User login, transaction creation |
WARNING | Recoverable errors | Failed external API calls, validation warnings |
ERROR | Application errors | Unhandled exceptions, database errors |
CRITICAL | System failures | Database unavailable, filesystem full |
Centralized Logging (Production):
Metrics Collection
Application Metrics:
Metric | Type | Description | Labels | Source |
---|---|---|---|---|
`firefly_transactions_total` | Counter | Total transactions created | type, user_id | "Firefly III Metrics" — Application Events — retrieved 2025-01-09 |
`firefly_active_users` | Gauge | Currently active users | - | "Firefly III Metrics" — Application Events — retrieved 2025-01-09 |
`firefly_account_balance` | Gauge | Account balances by currency | account_type, currency | "Firefly III Metrics" — Application Events — retrieved 2025-01-09 |
`firefly_budget_utilization` | Gauge | Budget spending percentage | budget_id, period | "Firefly III Metrics" — Application Events — retrieved 2025-01-09 |
`firefly_import_jobs_total` | Counter | Data import jobs | status, format | "Firefly III Metrics" — Application Events — retrieved 2025-01-09 |
`firefly_rule_executions_total` | Counter | Rule engine executions | rule_id, outcome | "Firefly III Metrics" — Application Events — retrieved 2025-01-09 |
Infrastructure Metrics:
Metric | Type | Description | Labels |
---|---|---|---|
`container_cpu_usage_percent` | Gauge | Container CPU utilization | container_name |
`container_memory_usage_bytes` | Gauge | Container memory usage | container_name |
`mysql_connections_active` | Gauge | Active database connections | - |
`redis_memory_usage_bytes` | Gauge | Redis memory consumption | - |
`http_requests_total` | Counter | HTTP requests received | method, status_code |
`http_request_duration_seconds` | Histogram | HTTP request latency | method, status_code |
Prometheus Configuration:
Custom Metrics Implementation:
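As a rough illustration of how a labeled counter such as `firefly_transactions_total` could be tracked and exposed, the sketch below implements a tiny in-memory counter and renders it in the Prometheus text exposition format. The `Counter` class here is a hypothetical stand-in written for this example, not the project's implementation or a client-library API.

```python
from collections import defaultdict


class Counter:
    """Monotonically increasing metric, keyed by its label values."""

    def __init__(self, name: str, help_text: str, label_names: tuple):
        self.name = name
        self.help_text = help_text
        self.label_names = label_names
        self._values = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        """Increase the series identified by the given label values."""
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(labels[name] for name in self.label_names)
        self._values[key] += amount

    def render(self) -> str:
        """Return the metric in Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines) + "\n"


transactions = Counter(
    "firefly_transactions_total",
    "Total transactions created",
    ("type", "user_id"),
)
transactions.inc(type="withdrawal", user_id="1")
transactions.inc(type="withdrawal", user_id="1")
transactions.inc(type="deposit", user_id="2")
output = transactions.render()
```

A `/metrics` handler would return `output` as plain text. Note that a `user_id` label creates one time series per user, so it may be worth reviewing for installations with many accounts.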
Health Monitoring
Health Check Endpoints:
Endpoint | Purpose | Response Format | Checks Performed | Source |
---|---|---|---|---|
`/health` | Basic health status | JSON | HTTP 200/503, basic connectivity | "Health Check Implementation" — src/index.js — retrieved 2025-01-09 |
`/health/detailed` | Comprehensive health | JSON | Database, Redis, filesystem, external APIs | "Health Check Implementation" — src/index.js — retrieved 2025-01-09 |
`/metrics` | Prometheus metrics | Prometheus format | Application and system metrics | "Metrics Implementation" — Custom — retrieved 2025-01-09 |
Health Check Response Format:
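One plausible shape for a detailed health response is sketched below: each component check runs independently, and the overall result maps to HTTP 200 or 503 as described in the endpoint table. The field names (`status`, `timestamp`, `checks`) and the check functions are assumptions for illustration, not the actual payload produced by `src/index.js`.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple


def build_health_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every check and return (HTTP status code, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that raises counts as a failed component
        results[name] = "ok" if ok else "failed"
    healthy = all(state == "ok" for state in results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "checks": results,
    }
    return (200 if healthy else 503), json.dumps(body)


def redis_down() -> bool:
    """Hypothetical check simulating an unreachable cache."""
    raise ConnectionError("connection refused")


status_code, body = build_health_response(
    {"database": lambda: True, "redis": redis_down}
)
parsed = json.loads(body)
```

Catching exceptions per check keeps one failing dependency from hiding the state of the others.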
Container Health Checks:
Readiness vs Liveness:
- Liveness: Basic process health (HTTP 200 response)
- Readiness: Full system health (database, cache, dependencies)
- Startup: Initial system initialization (longer timeout)
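The difference between the three probe types can be sketched as follows. The function names and the simulated `database_ready` dependency are hypothetical; a real orchestrator would also wait between startup attempts, which is omitted here.

```python
from typing import Callable, Iterable


def liveness() -> int:
    """Liveness: the process can answer at all, so no dependencies are checked."""
    return 200


def readiness(dependency_checks: Iterable[Callable[[], bool]]) -> int:
    """Readiness: 200 only when every dependency check passes, else 503."""
    return 200 if all(check() for check in dependency_checks) else 503


def startup(probe: Callable[[], int], max_attempts: int) -> bool:
    """Startup: retry the probe up to max_attempts times before giving up."""
    for _ in range(max_attempts):
        if probe() == 200:
            return True
    return False


# Hypothetical dependency that only becomes available on the third check.
attempts_seen = {"count": 0}


def database_ready() -> bool:
    attempts_seen["count"] += 1
    return attempts_seen["count"] >= 3


started = startup(lambda: readiness([database_ready]), max_attempts=5)
```

Here the service is alive from the first moment, but it only becomes ready once its dependency responds, which is why the startup probe needs the more generous budget.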
Alerting
Critical Alerts (Immediate Response):
Alert | Condition | Severity | Action Required |
---|---|---|---|
Application Down | HTTP health check fails for 3 minutes | Critical | Immediate investigation |
Database Unavailable | Database connection failures > 50% | Critical | Database recovery |
High Error Rate | Error rate > 5% for 5 minutes | High | Code/configuration review |
Memory Exhaustion | Container memory > 90% for 10 minutes | High | Resource scaling |
Warning Alerts (Next Business Day):
Alert | Condition | Severity | Action Required |
---|---|---|---|
Slow Response Times | P95 latency > 2s for 15 minutes | Medium | Performance optimization |
Disk Space Low | Storage usage > 80% | Medium | Cleanup or expansion |
High Transaction Volume | 50% above normal for 1 hour | Medium | Capacity planning |
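The "Slow Response Times" condition depends on a P95 latency. As a minimal sketch, the nearest-rank method below computes a percentile directly from raw samples; the sample values are invented for illustration. Prometheus itself typically estimates quantiles from histogram buckets rather than raw samples, so results from a real deployment may differ slightly.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Invented request latencies in seconds: 18 fast requests and 2 slow ones.
latencies = [0.2] * 18 + [2.5, 3.1]
p95 = percentile(latencies, 95)
slow = p95 > 2.0  # threshold from the "Slow Response Times" alert
```

With 20 samples the 95th percentile is the 19th smallest value, so two slow requests out of twenty are enough to push P95 past the 2 s threshold.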
Alert Manager Configuration:
Prometheus Alert Rules:
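Conditions such as "Error rate > 5% for 5 minutes" combine a threshold with a hold duration, similar in spirit to the `for` clause of a Prometheus alerting rule. The Python sketch below imitates that behavior on a list of samples; the function name `fires` and the sample data are assumptions, and this is not how Prometheus evaluates rules internally.

```python
from typing import Iterable, Tuple


def fires(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    hold_seconds: float,
) -> bool:
    """Return True once the value has stayed above threshold for hold_seconds.

    samples are (unix_timestamp, value) pairs in ascending time order.
    """
    breach_started = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started >= hold_seconds:
                return True
        else:
            breach_started = None  # condition cleared, so the timer resets
    return False


# Invented error-rate samples taken every 60 seconds.
sustained = [(0, 0.02), (60, 0.06), (120, 0.07), (180, 0.08),
             (240, 0.06), (300, 0.09), (360, 0.07)]
brief_spike = [(0, 0.06), (60, 0.08), (120, 0.01), (180, 0.09)]

sustained_alert = fires(sustained, threshold=0.05, hold_seconds=300)
spike_alert = fires(brief_spike, threshold=0.05, hold_seconds=300)
```

The hold duration is what keeps a short spike from paging anyone: only the sustained breach produces an alert.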
Dashboards
Grafana Dashboard Panels:
- Overview Dashboard:
  - Service health status
  - Request rate and latency
  - Error rate trends
  - Active user count
- Application Dashboard:
  - Transaction creation rate
  - Account balance trends
  - Budget utilization
  - Import job success rate
- Infrastructure Dashboard:
  - Container resource usage
  - Database performance metrics
  - Cache hit rates
  - Network traffic
Dashboard Configuration:
Troubleshooting Workflows
Performance Issues:
1. Check dashboard for resource utilization
2. Review slow query logs
3. Analyze request patterns
4. Check for memory leaks
5. Verify cache performance
Error Investigation:
1. Check error rate dashboard
2. Review application logs
3. Correlate with deployment events
4. Check external service status
5. Verify configuration changes
Capacity Planning:
1. Monitor growth trends
2. Analyze peak usage patterns
3. Project future requirements
4. Plan scaling activities
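The "project future requirements" step of capacity planning can be approximated with a simple linear trend. The sketch below fits a least-squares line to daily storage usage and estimates how many days remain before the 80% "Disk Space Low" threshold is reached. The usage figures are invented, and real growth is rarely perfectly linear, so treat the result as a rough planning aid.

```python
from typing import Optional, Sequence


def days_until_threshold(
    daily_usage_pct: Sequence[float],
    threshold_pct: float = 80.0,
) -> Optional[float]:
    """Project when a least-squares trend line crosses threshold_pct.

    Returns the number of days after the last sample, or None when usage
    is flat or shrinking.
    """
    n = len(daily_usage_pct)
    if n < 2:
        raise ValueError("need at least two daily samples")
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(daily_usage_pct) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, daily_usage_pct))
    variance = sum((x - mean_x) ** 2 for x in days)
    slope = covariance / variance
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - (n - 1))


# Invented storage usage over five consecutive days, in percent.
usage = [60.0, 61.0, 62.0, 63.0, 64.0]
remaining_days = days_until_threshold(usage)
```

In this example usage grows by one percentage point per day from 64%, so the projection lands 16 days out.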
Sources
- "Firefly III Installation Guide" — https://docs.firefly-iii.org/how-to/firefly-iii/installation/self-hosted/ — retrieved 2025-01-09
- "Laravel Logging Documentation" — https://laravel.com/docs/logging — retrieved 2025-01-09