# Production Checklist
Use this checklist to verify your Gaia application is ready for production. Each section covers a critical area — work through them before your first deployment and revisit them before every major release.
## How to use this checklist

Copy this page into your project's issue tracker or wiki and check off items as you complete them. Items marked as must-have are required for any production deployment; the rest are strongly recommended.
## Security
- API keys are stored as environment variables or secrets — never committed to version control.
- `.env` files are in `.gitignore` — only `.env.example` files with placeholder values are committed.
- CORS is restricted — `ALLOW_CORS_ORIGIN` is set to your specific frontend domain, not `*`.
- HTTPS is enabled — all traffic between clients and your application is encrypted via TLS.
- Backend is not directly exposed — the frontend nginx container proxies API calls; the backend port is not published to the public internet.
- Rate limiting is configured — protect the backend from abuse with a middleware or reverse-proxy rate limiter.
- Input validation on all endpoints — Pydantic models validate request bodies; query parameters are typed.
- Authentication on sensitive endpoints — session-based or token-based auth prevents unauthorized access to Gaia queries.
- Security headers are set — `X-Content-Type-Options`, `X-Frame-Options`, and `Strict-Transport-Security` via nginx or middleware.
- Dependency vulnerabilities are scanned — run `pip audit` and `npm audit` in CI.
```python
# Example: rate limiting middleware with slowapi
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/api/ask")
@limiter.limit("10/minute")  # at most 10 requests per minute per client IP
async def ask_gaia(request: Request, body: AskRequest):
    ...
```
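The security-headers item can also be handled entirely in nginx. A minimal sketch; the header values shown here are common defaults, not Gaia-specific requirements, so adjust them to your own policy:

```nginx
# Add security headers to every response
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```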
## Performance
- Connection pooling is enabled — the `GaiaClient` reuses HTTP connections via `httpx.AsyncClient` (already built into the SDK).
- Response caching for repeated queries — cache frequent dataset listings and configuration responses.
- Pagination is implemented — exhaustive search results use pagination tokens to avoid loading everything at once.
- Frontend assets are cache-busted — Vite generates hashed filenames; nginx serves them with long cache TTLs.
- Gzip compression is enabled — nginx compresses text responses.
- Request timeout is tuned — `REQUEST_TIMEOUT_SECONDS` is set appropriately (60s for standard queries, higher for exhaustive search).
- Database queries are indexed — if using SQLite for session storage, ensure frequently queried columns have indexes.
```nginx
# nginx gzip configuration
gzip on;
gzip_types text/plain text/css application/json application/javascript text/xml;
gzip_min_length 1000;
```
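The response-caching item can be prototyped without any extra dependencies. This stdlib-only sketch (the class name and TTL value are illustrative, not part of the Gaia SDK) avoids re-fetching slow-changing data such as dataset listings on every request:

```python
import time


class TTLCache:
    """Minimal time-based cache for responses that change rarely.

    Not thread-safe and unbounded in size; a sketch of the idea,
    not a production cache library.
    """

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```

A handler would check `cache.get("datasets")` first and only call the Gaia API on a miss, then `cache.set(...)` the result.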
## Reliability
- Health check endpoints exist — both backend (`/health`) and frontend (nginx default) respond to health probes.
- Graceful shutdown is handled — FastAPI shuts down cleanly on `SIGTERM`, completing in-flight requests.
- Error responses are structured — all errors return consistent JSON with `status`, `message`, and `detail` fields.
- Retry logic for transient failures — the SDK retries on `429` (rate limit) and `503` (service unavailable) with exponential backoff.
- Circuit breaker for Gaia API — if the Gaia API is down, fail fast rather than accumulating timeouts.
- Database migrations are automated — schema changes are applied on startup or via a migration tool.
- Container restart policy is set — `restart: unless-stopped` in Docker Compose ensures recovery from crashes.
```python
# Example: structured error response
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    return JSONResponse(
        status_code=500,
        content={
            "status": "error",
            "message": "An unexpected error occurred",
            # Only expose exception details when running in debug mode
            "detail": str(exc) if app.debug else None,
        },
    )
```
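The circuit-breaker item can start from something very small. This sketch (class name and thresholds are illustrative; dedicated libraries exist if you need half-open probing, per-endpoint state, and metrics) fails fast once the Gaia API has produced several consecutive errors:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive errors,
    reject calls for `reset_seconds` instead of waiting on timeouts."""

    def __init__(self, max_failures=5, reset_seconds=30.0):
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_seconds:
            # Cool-down elapsed: let one probe request through (half-open)
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()
```

The calling code checks `breaker.allow()` before each Gaia request and returns a 503 immediately when it is open, then calls `record_success()` or `record_failure()` based on the outcome.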
## Monitoring & Observability
- Structured logging is configured — use JSON-formatted logs with request ID, timestamp, and severity level.
- Request logging captures key metadata — log method, path, status code, duration, and user context for every request.
- Error tracking is integrated — Sentry, Datadog, or similar captures unhandled exceptions with full stack traces.
- Gaia API usage metrics are tracked — monitor query count, latency, token usage, and error rates.
- Alerting is configured — get notified when error rates spike, latency exceeds thresholds, or health checks fail.
- Log aggregation is set up — container logs are shipped to a centralized system (ELK, CloudWatch, Datadog).
```python
# Example: structured logging with structlog
import time

import structlog
from fastapi import FastAPI, Request

logger = structlog.get_logger()
app = FastAPI()

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = (time.perf_counter() - start) * 1000
    logger.info(
        "request_completed",
        method=request.method,
        path=request.url.path,
        status_code=response.status_code,
        duration_ms=round(duration_ms, 2),
    )
    return response
```
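The usage-metrics item can begin with in-process counters before wiring up Prometheus or StatsD. A stdlib-only sketch; the class and field names are illustrative, not part of the Gaia SDK:

```python
from collections import defaultdict


class GaiaMetrics:
    """In-process counters for Gaia API usage. In production you would
    export these via a metrics library rather than keeping them in memory."""

    def __init__(self):
        self.query_count = defaultdict(int)   # requests per endpoint
        self.error_count = defaultdict(int)   # failures per endpoint
        self.latencies_ms = []                # raw latency samples

    def record(self, endpoint, duration_ms, ok):
        self.query_count[endpoint] += 1
        if not ok:
            self.error_count[endpoint] += 1
        self.latencies_ms.append(duration_ms)

    def error_rate(self, endpoint):
        total = self.query_count[endpoint]
        return self.error_count[endpoint] / total if total else 0.0
```

Calling `metrics.record(...)` from the logging middleware above keeps request logging and metrics collection in one place.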
## Scalability
- Backend is stateless — no in-memory session state; all state is in SQLite/external database.
- Session store can be externalized — when scaling beyond one backend instance, move sessions to Redis or PostgreSQL.
- Load balancing is configured — if running multiple backend instances, a load balancer distributes traffic evenly.
- Frontend is served from CDN — static assets are deployed to a CDN for global low-latency delivery.
- Database can be upgraded — migration path from SQLite to PostgreSQL is documented for when traffic demands it.
- Container resource limits are set — CPU and memory limits prevent a single container from consuming all host resources.
```yaml
# docker-compose.yml resource limits
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
```
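Externalizing the session store is easier if the backend codes against a small interface from day one. A sketch of that idea; the class and method names are illustrative, and the Redis variant (which assumes the `redis` package) is shown for shape only:

```python
import json


class InMemorySessionStore:
    """Works for a single backend instance; state is lost on restart."""

    def __init__(self):
        self._sessions = {}

    def get(self, session_id):
        return self._sessions.get(session_id)

    def set(self, session_id, data):
        self._sessions[session_id] = data


class RedisSessionStore:
    """Drop-in replacement when scaling beyond one instance.
    `client` is a redis.Redis connection; sessions expire after `ttl_seconds`."""

    def __init__(self, client, ttl_seconds=3600):
        self.client = client
        self.ttl = ttl_seconds

    def get(self, session_id):
        raw = self.client.get(f"session:{session_id}")
        return json.loads(raw) if raw else None

    def set(self, session_id, data):
        self.client.set(f"session:{session_id}", json.dumps(data), ex=self.ttl)
```

Because both classes expose the same `get`/`set` shape, swapping the implementation is a one-line change at startup rather than a refactor.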
## Pre-Deployment Final Checks
- All environment variables are set — run the app with `Settings()` to validate configuration.
- Docker images build successfully — `docker compose build` completes without errors.
- Health checks pass — `docker compose up` shows both services as `healthy`.
- End-to-end test passes — a query from the frontend reaches Gaia and returns a response.
- SSL certificate is valid — check expiry and renewal automation.
- Backup strategy is in place — SQLite database is backed up regularly if it stores important data.
- Rollback plan is documented — you can revert to the previous version quickly if something goes wrong.
- Runbook exists — common operational tasks (restart, scale, debug) are documented for the on-call team.
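The environment-variable check above can also be scripted without importing the app. A stdlib sketch of the same fail-fast idea that `Settings()` gives you; the variable names here are illustrative, so mirror your actual settings class:

```python
import os

# Illustrative names; list the variables your Settings class requires
REQUIRED_VARS = ["GAIA_API_KEY", "ALLOW_CORS_ORIGIN"]


def validate_env(env=None):
    """Raise early if required configuration is missing or empty,
    mirroring what instantiating Settings() does in a pydantic-based app."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
```

Running this at container startup turns a confusing mid-request failure into an immediate, readable crash.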
## Quick Validation Script
Run this script to verify the most critical items programmatically:
```bash
#!/bin/bash
# validate-deployment.sh — Quick production readiness check
set -e

echo "=== Production Readiness Check ==="

# 1. Check that .env is not committed
if git ls-files --error-unmatch backend/.env >/dev/null 2>&1; then
    echo "FAIL: backend/.env is tracked by git!"
    exit 1
else
    echo "PASS: .env files not in version control"
fi

# 2. Build images
echo "Building Docker images..."
docker compose build --quiet
echo "PASS: Docker images built successfully"

# 3. Start services
echo "Starting services..."
docker compose up -d

# 4. Wait for health checks
echo "Waiting for services to become healthy..."
for i in $(seq 1 30); do
    if docker compose ps | grep -q "(healthy)" && \
       ! docker compose ps | grep -q "(health: starting)"; then
        echo "PASS: All services healthy"
        break
    fi
    if [ "$i" -eq 30 ]; then
        echo "FAIL: Services did not become healthy within 30 seconds"
        docker compose logs
        docker compose down
        exit 1
    fi
    sleep 1
done

# 5. Test health endpoint ("|| true" keeps set -e from aborting on connection errors)
STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/health || true)
if [ "$STATUS" -eq 200 ]; then
    echo "PASS: Backend health endpoint returns 200"
else
    echo "FAIL: Backend health endpoint returned $STATUS"
fi

# 6. Test frontend
STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost/ || true)
if [ "$STATUS" -eq 200 ]; then
    echo "PASS: Frontend returns 200"
else
    echo "FAIL: Frontend returned $STATUS"
fi

docker compose down
echo "=== Check complete ==="
```
## Next Steps
- Cursor IDE Setup — Accelerate future development with AI assistance.
- Environment Configuration — Review your variable setup.
- Docker Deployment — Revisit container configuration.