Log Management: Centralised Logging and Analysis
Logs are the primary tool for understanding what your systems are doing and what went wrong when they fail. Centralised log management aggregates logs from every component of your infrastructure into a single, searchable platform, enabling efficient debugging, incident investigation, and compliance auditing.
What Gets Logged
- Application logs: Events, errors, warnings, and debug information from your application code
- Web server logs: HTTP request details such as method, URL, response code, response time, and user agent
- Infrastructure logs: OS-level events, service start/stop, authentication events
- Database logs: Slow queries, error events, connection events
- Security logs: Authentication events, access-denied responses, WAF triggers
- Deployment logs: What was deployed, when, and by whom
Log Levels
Standard log levels, in ascending order of severity, are DEBUG (detailed debugging information), INFO (normal operational events), WARN (unexpected but handled situations), ERROR (errors that affect specific operations), and FATAL (critical errors causing system failure). Production systems typically log at INFO and above; DEBUG is enabled temporarily during an investigation.
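The level threshold described above can be sketched with Python's standard logging module. This is a minimal illustration, not a prescribed setup: the configure_logger helper and the "checkout-service" name are hypothetical, and Python names the levels WARNING and CRITICAL where this page says WARN and FATAL.

```python
import logging

# Python's equivalents of DEBUG, INFO, WARN, ERROR, FATAL, lowest severity first.
# (logging.FATAL exists as an alias for logging.CRITICAL.)
LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]


def configure_logger(name: str, debug: bool = False) -> logging.Logger:
    """Hypothetical helper: INFO by default, DEBUG while investigating."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return logger


# Normal production setting: DEBUG messages are dropped.
logger = configure_logger("checkout-service")  # hypothetical service name
production_enabled = [logging.getLevelName(lvl) for lvl in LEVELS if logger.isEnabledFor(lvl)]

# Temporary setting during an investigation: DEBUG messages are emitted too.
logger = configure_logger("checkout-service", debug=True)
investigation_enabled = [logging.getLevelName(lvl) for lvl in LEVELS if logger.isEnabledFor(lvl)]
```

In practice the debug flag would usually come from an environment variable or configuration service, so the level can be raised and lowered without a code change.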
Structured Logging
Logs should be structured (JSON format) rather than written as plain-text strings. Structured logs can be indexed and queried efficiently: finding all errors for a specific user, or all requests that took longer than 2 seconds, requires structured fields. We implement structured logging in all applications with consistent field names: timestamp, level, service, request_id, user_id, duration, error.
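As a rough sketch of what this can look like, the following uses Python's standard logging and json modules to emit one JSON object per line with the field names listed above, then runs the two example queries. The JsonFormatter class, the extra "message" field, the sample values, and the choice of seconds as the unit for duration are assumptions made for illustration only.

```python
import io
import json
import logging
from datetime import datetime, timezone

# Context fields from the list above; they are attached per call via `extra=`.
CONTEXT_FIELDS = ("request_id", "user_id", "duration", "error")


class JsonFormatter(logging.Formatter):
    """Illustrative formatter: renders each record as one JSON object."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),  # assumed extra field, not in the list above
        }
        # Values passed through `extra=` become attributes on the record.
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter(service="checkout-service"))  # hypothetical service

logger = logging.getLogger("structured-example")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the sample output out of the root logger
logger.addHandler(handler)

# Sample events; duration is assumed to be in seconds.
logger.info("request completed", extra={"request_id": "req-1", "user_id": "u-42", "duration": 2.4})
logger.error("payment failed", extra={"request_id": "req-2", "user_id": "u-42",
                                      "duration": 0.3, "error": "card_declined"})
logger.info("request completed", extra={"request_id": "req-3", "user_id": "u-7", "duration": 0.1})

# Because every line is JSON, the queries are simple field filters.
entries = [json.loads(line) for line in stream.getvalue().splitlines()]
errors_for_user = [e for e in entries if e["level"] == "ERROR" and e.get("user_id") == "u-42"]
slow_requests = [e for e in entries if e.get("duration", 0) > 2]
```

A log platform would run the equivalent filters against its index rather than in application code. The point of the sketch is that both queries depend only on consistent field names, which plain-text messages cannot guarantee.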
Tools
- AWS CloudWatch Logs: Native AWS log storage and basic analysis
- Datadog Logs: Full log management with APM correlation
- ELK Stack (Elasticsearch, Logstash, Kibana): Open-source log management
- Grafana Loki: Log aggregation optimised for cost efficiency