Service Status and Diagnostics
Operational playbook for monitoring system health, diagnosing service failures, and maintaining platform uptime.
Maintaining high availability for VeriWorkly requires active monitoring of its distributed components, including the primary API, background workers, and persistent data stores. This guide outlines the procedures for health assessment and technical diagnostics.
1. Automated Health Monitoring
The most reliable mechanism for assessing system state is the backend health endpoint. This route actively probes the PostgreSQL database and the Redis connection rather than simply confirming that the API process is running.
Endpoint: GET /api/v1/health
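For a quick manual check, the endpoint can be queried with curl. The host and port below are placeholders; substitute the address of your API deployment.

```bash
# Manual probe of the health endpoint; localhost:3000 is an assumed address.
curl -i http://localhost:3000/api/v1/health
```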
Nominal Operational State (200 OK)
When all integrated systems are functioning within defined parameters, the API returns a success response:
```json
{
  "success": true,
  "message": "Server is healthy",
  "data": {
    "status": "ok",
    "database": "connected",
    "redis": "connected",
    "timestamp": "2026-04-26T10:00:00.000Z"
  }
}
```

Degraded Operational State (503 Service Unavailable)
If a dependency failure is detected, the API returns a service unavailable status:
```json
{
  "success": false,
  "message": "Service Unavailable",
  "data": {
    "status": "degraded",
    "database": "disconnected",
    "redis": "connected",
    "timestamp": "2026-04-26T10:05:00.000Z"
  }
}
```

Implementation Recommendation: Integrate this endpoint with monitoring services such as Datadog, UptimeRobot, or Better Stack to facilitate automated alerting upon state transitions.
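As a lightweight alternative to a hosted monitor, a script along these lines can be scheduled via cron to flag unhealthy responses. This is a minimal sketch: the URL and the alerting mechanism (here, a message on stderr) are assumptions to adapt to your environment.

```bash
#!/bin/sh
# Minimal health-poll sketch: logs an alert and exits non-zero when the
# endpoint does not return HTTP 200. HEALTH_URL is an assumed address.
HEALTH_URL="http://localhost:3000/api/v1/health"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$HEALTH_URL")
if [ "$STATUS" != "200" ]; then
  echo "ALERT: health check returned HTTP $STATUS at $(date -u)" >&2
  exit 1
fi
```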
2. Containerized Environment Diagnostics
In scenarios where health checks fail or containers exhibit unstable behavior (e.g., restart loops), utilize the following diagnostic commands.
Service State Assessment
Verify the operational status of all containerized services defined in the orchestration configuration:
```bash
docker compose ps
```

Resource Utilization Analysis
Document rendering via Playwright is memory-intensive. Monitor resource consumption to identify potential Out-of-Memory (OOM) conditions:
```bash
docker stats
```

Significant memory consumption (approaching 100% of the allocated limit) necessitates either additional host resources or a reduction in concurrent rendering job limits.
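For a one-shot, memory-focused snapshot rather than a live stream, docker stats accepts a Go-template format string:

```bash
# Non-streaming snapshot of per-container memory usage against allocated limits.
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
```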
3. Log Analysis and Troubleshooting
Technical diagnostics require analysis of service-specific logs. Execute these commands from the root directory containing the compose.yaml file.
Backend API Logs
```bash
docker compose logs --tail=100 -f api
```

Key Indicators: Uncaught exceptions, Prisma connectivity errors, or Zod validation failures for environment variables.
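When the live tail is too noisy, recent output can be filtered for the indicators above. The search terms are illustrative, not an exhaustive error taxonomy:

```bash
# Scan recent API logs for common failure signatures.
docker compose logs --tail=500 api | grep -iE "error|prisma|zod"
```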
Redis Service Logs
```bash
docker compose logs --tail=100 -f redis
```

Key Indicators: Memory allocation failures or persistence (RDB/AOF) errors.
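If the logs suggest memory pressure, Redis can report its own usage directly. This assumes the compose service is named redis, matching the log command above:

```bash
# Query Redis memory statistics from inside the running container.
docker compose exec redis redis-cli info memory
```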
Frontend Application Logs
```bash
docker compose logs --tail=100 -f web
```

Key Indicators: Fetch errors or internal network resolution failures (ECONNREFUSED).
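To confirm that the frontend container can actually reach the API over the internal compose network, issue a request from inside it. The service name api, port 3000, and the presence of wget in the image are all assumptions; adjust them to your compose.yaml:

```bash
# Test internal DNS resolution and connectivity from the web container to the API.
docker compose exec web wget -qO- http://api:3000/api/v1/health
```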
4. Common Operational Failure Modes
| Issue | Potential Cause | Resolution Strategy |
|---|---|---|
| Database Connectivity | Invalid DATABASE_URL or network firewall restrictions. | Verify credentials and ensure the API has network access to the PostgreSQL instance. |
| Redis Memory Exhaustion | Backed-up job queue or excessive data caching. | Increase container memory limits or optimize job processing frequency. |
| CORS Policy Violations | Misconfigured ALLOWED_ORIGINS variable. | Ensure the backend environment configuration exactly matches the public frontend origin; see the preflight check below the table. |
| Document Rendering Delay | Hanging Playwright processes or zombie Chromium instances. | Restart the API service to clear the process tree and re-initialize the rendering workers. |
| Zod Validation Failure | Missing or malformed environment variables in production. | Review the startup logs to identify the specific variable failing validation. |
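For the CORS row above, a browser preflight request can be simulated to verify that ALLOWED_ORIGINS is honored. The origin and API address are illustrative placeholders:

```bash
# Simulate a preflight request; a correctly configured backend echoes the
# origin back in the Access-Control-Allow-Origin response header.
curl -i -X OPTIONS http://localhost:3000/api/v1/health \
  -H "Origin: https://app.example.com" \
  -H "Access-Control-Request-Method: GET"
```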