Skip to content

DevOps & Infrastructure (DVO)

Module Purpose: The platform's life support systems. It ensures the Pebble Orchestrator is secure, available, and recoverable. It covers automated provisioning, backups, and real-time observability.

[!IMPORTANT] Production Standard: All infrastructure is defined as code (IaC) to ensure 4-hour Recovery Time Objective (RTO).


Use Case Quick Reference

ID Title Priority
US-DVO-001 Automated User Provisioning P1
US-DVO-002 Automated Backups & Retention P1
US-DVO-003 Real-time Uptime Monitoring P1
US-DVO-004 Disaster Recovery Procedure P1

US-DVO-001: Automated User Provisioning

What It Does

Syncs users from the central Pebble Auth system to all integrated tools (like Plane.so) automatically. When a new sales rep is hired in the CRM, they instantly get access to the Kanban board without manual IT intervention.

Who: System Admin / HR Trigger
When: On User Creation


How It Works

  1. Trigger: New user account created in Pebble (Django).
  2. Action:
  3. Fires a User.created signal.
  4. Calls Plane API to create a workspace member with specific role (Member/Admin).
  5. Generates and emails an invitation link.
  6. Offboarding: Disabling Pebble account automatically revokes tokens for Plane.

US-DVO-002: Automated Backups & Retention

What It Does

Protects against data corruption or accidental deletion. It performs nightly snapshots of the entire stack and offloads them to secure, off-site storage.

Who: DevOps Automation
When: Nightly (2 AM)


How It Works

  1. Database: Runs pg_dump on PostgreSQL (Pebble + Plane).
  2. Storage: Offloads compressed dumps to S3/MinIO.
  3. Retention:
  4. Nightly backups kept for 30 days.
  5. Monthly snapshots kept for 1 year.
  6. Alerting: If backup fails to upload, sends urgent Slack ping to IT.

US-DVO-003: Real-time Uptime Monitoring

What It Does

The "Dashlight". Monitors the health of all microservices and triggers alerts before users notice a slowdown.

Who: System Watchdog
When: Continuous


How It Works

  1. Heartbeat: Polls /health endpoints of Email Listener, Classifier, and Plane.
  2. Latency: Tracks API response times (Target: < 500ms).
  3. Dashboard: Grafana displays:
  4. Green = Healthy.
  5. Yellow = Degraded (Slow).
  6. Red = Down.
  7. Escalation: If down for > 5 mins, triggers SMS to On-call Engineer.

US-DVO-004: Disaster Recovery Procedure

What It Does

The "Emergency Break". A documented and tested procedure to rebuild the entire platform from scratch in a new region/server in under 4 hours.

Who: IT Recovery Team
When: Critical System Failure


How It Works

  1. Infrastructure: Deploy using Terraform scripts to AWS/Azure/DigitalOcean.
  2. Restore: Pull latest data dump from DVO-002.
  3. Verification: Automated test suite runs to check Ingestion -> Classification flow.
  4. Cutover: Update DNS records to points to new environment.

← Back to Use Cases