This document serves as the comprehensive reference guide for the deployment architecture of the Cortex Hub AI system. It maps the journey of a request from the external internet to the backend code, details the containerized architecture, and outlines the automated deployment (CI/CD) paths, to help debug future production issues.
When an external user visits https://ai.jerxie.com, the request cascades through the following components:
```mermaid
flowchart TD
    A[External User] -->|HTTPS :443| B(Envoy Control Plane Proxy\nHost IP: 192.168.68.90 :10001)
    B -->|HTTP :80| C{Nginx Gateway\nContainer: ai_unified_frontend\nHost IP: 192.168.68.113 :8002}
    C -->|Static Files / JS / CSS| D[React Frontend Build]
    C -->|/api/v1/* Routing| E(FastAPI Backend\nContainer: ai_hub_service\nExposed: :8000 internally)
```
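The fork at the Nginx hop can be sketched in plain Python (illustrative only; the real rules live in nginx.conf location blocks, and the upstream labels below are assumptions, not the literal nginx.conf identifiers):

```python
# Sketch of the routing decision Nginx makes for each incoming request path.
def route(path: str) -> str:
    """Return which upstream handles a given request path."""
    if path.startswith("/api/v1/"):
        return "ai_hub_service:8000"  # proxied to the FastAPI backend (internal port 8000)
    return "react-static-build"       # served by Nginx from the built frontend assets

print(route("/api/v1/health"))  # -> ai_hub_service:8000
print(route("/index.html"))     # -> react-static-build
```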
1. **Envoy Proxy (192.168.68.90):**
   - Terminates TLS for the public domain (ai.jerxie.com), and passes decrypted traffic internally.
   - Its control plane is served at http://192.168.68.90:8090/.
   - Translates HTTPS to HTTP before forwarding to Nginx.
2. **Nginx Gateway (192.168.68.113 - ai_unified_frontend container):**
   - Serves the static React build and proxies /api/v1/ calls directly to the Python backend.
   - Binds host port 8002 (internal container port 80) to avoid port collisions with other host services.
   - Passes proxy headers (X-Forwarded-Proto) along so FastAPI knows the request originated as HTTPS.
3. **Uvicorn / FastAPI Backend (ai_hub_service container):**
   - Runs with --proxy-headers and --forwarded-allow-ips="*" to ensure it trusts the X-Forwarded-Proto headers injected by Nginx.

The code moves from development to production through a formalized sequence, managed by dedicated local shell scripts.
**scripts/remote_deploy.sh (The Trigger)**
- Uses rsync over SSH (sshpass) to securely copy local workspace (/app/) changes onto the production server 192.168.68.113 under a temporary /tmp/ directory.
- Excludes heavyweight artifacts (.git, node_modules, __pycache__).
- Moves the files into the project root (/home/coder/project/cortex-hub), taking care to retain system permissions.
- Executes the scripts/local_rebuild.sh script centrally on the production server.

**scripts/local_rebuild.sh (The Builder)**
- Rebuilds the stack using the configs under deployment/.
- Composes two files:
  - -f docker-compose.yml (generic common config)
  - -f deployment/jerxie-prod/docker-compose.production.yml (Jerxie-specific overrides)
- Monitors the Uvicorn startup lifecycle.

You or an agent must safely pass the authentication key into the script from the command line:
```shell
REMOTE_PASS='<PASSWORD>' bash scripts/remote_deploy.sh
```
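The two `-f` files combine by override: keys in the production file win, everything else falls through from the baseline. A minimal sketch of that merge semantic (simplified; Docker Compose's real merge rules are richer, and the base port value here is an assumption):

```python
# Simplified model of "docker compose -f docker-compose.yml -f docker-compose.production.yml".
def merge(base: dict, override: dict) -> dict:
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)  # recurse into nested mappings
        else:
            out[key] = value  # override wins for scalars and lists
    return out

base = {"services": {"ai_unified_frontend": {"ports": ["8080:80"], "restart": "unless-stopped"}}}
prod = {"services": {"ai_unified_frontend": {"ports": ["8002:80"]}}}  # Jerxie production override

merged = merge(base, prod)
print(merged["services"]["ai_unified_frontend"]["ports"])  # -> ['8002:80']
```

The baseline keys that the override does not mention (here, `restart`) survive the merge untouched.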
To maintain a clean repository and allow for generic onboarding while keeping Jerxie's specific production requirements separate, the following layout is enforced:
- docker-compose.yml: Baseline configuration. Uses local Docker volumes and generic localhost endpoints. Default for new users.
- deployment/jerxie-prod/: Contains Jerry's specific production overrides (NFS volume on 192.168.68.90, SSL/OIDC endpoints for ai.jerxie.com).
- scripts/: Centralized automation scripts for remote sync, local rebuilding, and node registration.

For any automated AI attempting to debug or push changes:

- The deployment reference (.agent/workflows/deployment_reference.md) defines the architecture rules.
- See .agent/workflows/deploy_to_production.md for explicit directory movement maps and scripting.
- Consult .agent/workflows/envoy_control_plane.md if the backend successfully mounts on the remote machine but ai.jerxie.com/api/ throws 404s/503s.

Cortex supports zero-touch maintenance for distributed agent nodes (like Mac Minis or remote VMs).
- Each agent node (agent_node/main.py) runs a background AutoUpdater thread.
- The thread polls GET /api/v1/agent/version on the Hub using the SECRET_KEY for authentication.
- If the Hub's version differs from the local VERSION file, the agent:
  1. Downloads the updated code bundle (including skills/) from GET /api/v1/agent/download.
  2. Runs bootstrap_installer.py --update-only to extract and install new dependencies.
  3. Restarts its own process in place (os.execv) to load the new code.
- Dependency installs run with --ignore-installed to prevent common Python/Anaconda metadata errors from stalling updates.

If production encounters a bug or routing fails, these are the historically primary offenders:
**Symptoms:** OAuth login fails because Dex/Auth redirects the user back to http://ai.jerxie.com/api/... instead of https://.
**Root Cause:** FastAPI believes it is serving HTTP because Nginx didn't forward the Envoy proxy scheme.
**Verification Check:**
- app/ai-hub/Dockerfile: Ensure the uvicorn invocation ends with --proxy-headers --forwarded-allow-ips "*".
- nginx.conf: Ensure proxy_set_header X-Forwarded-Proto relies on the header dynamically captured from Envoy, NOT a hard-coded $scheme string.

**Symptoms:** Normal API calls return strange transport errors, or WebSocket voice channels refuse to upgrade.
**Root Cause:** Misconfigured HTTP Upgrade logic. Setting Connection "upgrade" unconditionally on an Nginx location breaks normal HTTP REST calls.
**Verification Check:**
- Inspect the nginx.conf mappings. It must contain:

```nginx
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
```

- ...with the result passed safely into the proxy_set_header Connection $connection_upgrade; directive.

**Symptoms:** The backend loads normally, but the frontend appears dead, or remote_deploy.sh exits with code 1 silently.
**Root Cause:** Binding collisions on the production host 192.168.68.113 (e.g., trying to bind Nginx to host port 8000 when another container, python3_11, already uses it).
**Verification Check:**
- Run docker ps -a on 192.168.68.113.
- Verify the host port mapping (8002:80) inside docker-compose.yml does not overlap with any live container's port bindings.

**Symptoms:** curl -v https://ai.jerxie.com dumps a generic 404 or 503 Service Unavailable with server: envoy.
**Root Cause:** The Envoy FilterChain (Listener SNI map) doesn't resolve to a correct, valid Docker IP:Port allocation.
**Verification Check:**
- Verify the portValue in the JSON Endpoint matches the port published in docker-compose.yml (8002 vs 8000). If mismatched, format a JSON package and POST it to /add-cluster using the EnvoyControlPlane workflow.

**Symptoms:** The user attempts to log in but is immediately rejected by auth.jerxie.com (Dex). Dex logs show: level=ERROR msg="failed to parse authorization request" err="Invalid client_id (\"\")."
**Root Cause:** API routes were incorrectly capturing OIDC settings at the module's initial import time rather than at request time. Empty/default environment variables were frozen in place until a full container restart.
**Verification Check:**
- Check the ai_hub_service logs for: Initiating OIDC login. Client ID: 'cortex-server'.
- Confirm the settings singleton is accessed inside the route function (or via a helper like get_oidc_urls) to ensure dynamic resolution.

**Symptoms:** The ai.jerxie.com UI becomes unresponsive (spinning loaders or time-outs) for 1-2 minutes before suddenly recovering. No backend crashes are recorded.
**Root Cause:** SQLite on NFS. The production deployment uses an NFS-mounted volume (192.168.68.90) for the relational database. SQLite's write-ahead logging (WAL) and strict locking requirements are highly incompatible with network-filesystem latencies. Any minor network spike or high NFS load causes the Hub process to block entirely while waiting for a lock.
**Resolution:**
Move ai_hub.db to a local Docker volume or a local host path on 192.168.68.113. NFS should be reserved for static sync folders or bulky assets, never for the primary relational database.
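As an interim mitigation while the database still sits on a risky volume, lock waits can at least be bounded so the Hub fails fast instead of hanging. A minimal sketch (the file path and timeout values are illustrative, and this does not make SQLite safe on NFS):

```python
import sqlite3

# Bound how long a write waits on a lock instead of blocking indefinitely.
conn = sqlite3.connect("/tmp/ai_hub_demo.db", timeout=2.0)
conn.execute("PRAGMA busy_timeout = 2000")  # same 2-second bound, enforced per statement

# WAL relies on shared-memory sidecar files that do not work reliably over NFS;
# the rollback-journal mode is the safer choice on network filesystems.
mode = conn.execute("PRAGMA journal_mode = DELETE").fetchone()[0]
print(mode)  # -> delete

conn.execute("CREATE TABLE IF NOT EXISTS heartbeat (ts TEXT)")
conn.close()
```

With a bounded timeout, an NFS stall surfaces as a loggable `sqlite3.OperationalError: database is locked` rather than a silent multi-minute freeze.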