diff --git a/.agent/workflows/deploy_to_production.md b/.agent/workflows/deploy_to_production.md
index ae78068..1a136cb 100644
--- a/.agent/workflows/deploy_to_production.md
+++ b/.agent/workflows/deploy_to_production.md
@@ -5,7 +5,7 @@

 This workflow automates the deployment of the Cortex Hub to the production server located at `192.168.68.113`. The production server runs the application out of `/home/coder/project/cortex-hub`.

-**MAIN KNOWLEDGE POINT:** Agents and Users should refer to `docs/deployment_reference.md` to understand the full proxy and architecture layout prior to running production debugging.
+**MAIN KNOWLEDGE POINT:** Agents and Users should refer to `.agent/workflows/deployment_reference.md` to understand the full proxy and architecture layout prior to running production debugging.

 Follow these steps carefully:

diff --git a/.agent/workflows/deployment_reference.md b/.agent/workflows/deployment_reference.md
new file mode 100644
index 0000000..5ca7b2e
--- /dev/null
+++ b/.agent/workflows/deployment_reference.md
@@ -0,0 +1,112 @@
+---
+description: Reference Guide for Cortex Hub Production Architecture, proxy networking, and debugging deployments.
+---
+# Cortex Hub: Deployment & CI/CD Architecture Reference
+
+This document serves as the comprehensive reference guide for the deployment architecture of the Cortex Hub AI system. It maps the journey of a request from the external internet to the backend code, details the containerized architecture, and outlines the automated deployment (CI/CD) paths, to help debug future production issues.
+
+## 1. High-Level Architecture Flow
+
+When an external user visits `https://ai.jerxie.com`, the request cascades through the following components:
+
+```mermaid
+flowchart TD
+    A[External User] -->|HTTPS :443| B(Envoy Control Plane Proxy\nHost IP: 192.168.68.90 :10001)
+    B -->|HTTP :80| C{Nginx Gateway\nContainer: ai_unified_frontend\nHost IP: 192.168.68.113 :8002}
+
+    C -->|Static Files / JS / CSS| D[React Frontend Build]
+    C -->|/api/v1/* Routing| E(FastAPI Backend\nContainer: ai_hub_service\nExposed: :8000 internally)
+```
+
+### Components Breakdown
+
+**1. Envoy Proxy** (`192.168.68.90`):
+* **Role**: Primary Edge Proxy. Terminates external SSL (HTTPS), handles external SNI routing (matching `ai.jerxie.com`), and passes decrypted traffic internally.
+* **Control API**: Envoy configurations are manipulated dynamically at `http://192.168.68.90:8090/`.
+* **Important Note**: When Envoy forwards traffic internally, the scheme changes from `HTTPS` to `HTTP`.
+
+**2. Nginx Gateway** (`192.168.68.113` - `ai_unified_frontend` container):
+* **Role**: Application Gateway. Serves the static React frontend directly and reverse-proxies all `/api/v1/` calls to the Python backend.
+* **Network Binding**: Bound to host port `8002` (internal container port `80`) to avoid port collisions with other host services.
+* **Protocol Map**: Converts connection headers between "upgrade" (WebSockets) and "close" (Standard HTTP), and passes Envoy's original scheme (`X-Forwarded-Proto`) along so FastAPI knows the request originated as HTTPS.
+
+**3. Uvicorn / FastAPI Backend** (`ai_hub_service` container):
+* **Role**: Core application logic. Needs to be aware it is sitting behind proxies to compute accurate callback URLs for tasks like **OIDC Login**, which inherently depend on accurate external domains and schemes.
+* **Configuration**: Runs with `--proxy-headers` and `--forwarded-allow-ips="*"` to ensure it trusts the `X-Forwarded-Proto` header injected by Nginx.
+
+---
+
+## 2. CI/CD Pipeline & Deployment Scripts
+
+The code moves from development to production through a formalized sequence, managed by dedicated local shell scripts.
+
+### The Scripts
+
+1. **`remote_deploy.sh` (The Trigger)**
+    * **Where it runs**: *Locally on your dev machine*
+    * **What it does**:
+        1. Uses `rsync` over SSH (`sshpass`) to securely copy local workspace (`/app/`) changes onto the production server `192.168.68.113` under a temporary `/tmp/` directory.
+        2. Specifically excludes massive or unnecessary folders (`.git`, `node_modules`, `__pycache__`).
+        3. Overwrites the destination project folder (`/home/coder/project/cortex-hub`), taking care to retain system permissions.
+        4. Triggers the `local_deployment.sh` script over SSH on the production server.
+
+2. **`local_deployment.sh` (The Builder)**
+    * **Where it runs**: *Server 192.168.68.113*
+    * **What it does**:
+        1. Destroys the old running containers.
+        2. Triggers Docker Compose (`docker compose up -d --build --remove-orphans`) to rebuild the application context and discard deprecated container setups (e.g., when the UI shifted into Nginx).
+        3. Performs automated database migrations running parallel idempotent logic (`app/db/migrate.py`) via the `Uvicorn` startup lifecycle.
+
+### How to Release
+You or an Agent must safely pass the authentication key into the script from the command line. An Agent should **always prompt the USER** for this password before running it:
+```bash
+REMOTE_PASS='' bash /app/remote_deploy.sh
+```
+
+---
+
+## 3. Automation Logic for Agents (.agent Workflows)
+For any automated AI attempting to debug or push changes:
+* **Primary Source of Truth**: This file (`.agent/workflows/deployment_reference.md`) defines the architecture rules.
+* **Subflow - Deployment:** Look at `.agent/workflows/deploy_to_production.md` for explicit directory movement maps and scripting.
+* **Subflow - Envoy Control Map:** Refer to `.agent/workflows/envoy_control_plane.md` if the backend successfully mounts on the remote machine but `ai.jerxie.com/api/` throws 404s/503s.
+
+---
+
+## 4. Reference Troubleshooting & Known Pitfalls
+
+If production encounters a bug or routing fails, these have historically been the primary offenders:
+
+### 1. OIDC Redirection Errors (Auth Flips to HTTP)
+**Symptoms**: OAuth login fails because Dex/Auth redirects the user back to `http://ai.jerxie.com/api/...` instead of `https://`.
+**Root Cause**: FastAPI believes it's serving HTTP because Nginx didn't forward the Envoy proxy scheme.
+**Verification Check**:
+* Check `app/ai-hub/Dockerfile`: Ensure the `uvicorn` command ends with `--proxy-headers --forwarded-allow-ips "*"`.
+* Check `nginx.conf`: Ensure `proxy_set_header X-Forwarded-Proto` relies on the HTTP header dynamically captured from Envoy, NOT a hard-coded `$scheme` string.
+
+### 2. WebSocket Hanging / HTTP Request Failing
+**Symptoms**: Normal API calls return strange transport errors, or WebSocket voice channels refuse to upgrade.
+**Root Cause**: Misconfigured HTTP `Upgrade` logic. Setting `Connection "upgrade"` unconditionally on an Nginx location breaks normal HTTP REST calls.
+**Verification Check**:
+* Check `nginx.conf` mappings. It must have:
+  ```nginx
+  map $http_upgrade $connection_upgrade {
+      default upgrade;
+      '' close;
+  }
+  ```
+  ... and the result must be passed into the `proxy_set_header Connection $connection_upgrade;` field.
+
+### 3. "Port is already allocated" (Container Fails to Deploy)
+**Symptoms**: Backend loads normally, but the Frontend container silently ends up `dead` or `Exit 1` during `remote_deploy.sh`.
+**Root Cause**: Binding collisions on the production host `192.168.68.113` (e.g., trying to bind Nginx to host port `8000` when another container `python3_11` uses it).
+**Verification Check**:
+* Run `docker ps -a` on `192.168.68.113`.
+* Ensure the frontend port (`8002:80`) mapped inside `docker-compose.yml` does not overlap with the port bindings of any live container.
+
+### 4. Envoy Cluster Sync Outages
+**Symptoms**: `curl -v https://ai.jerxie.com` returns a generic 404 or `503 Service Unavailable` with `server: envoy`.
+**Root Cause**: The Envoy FilterChain (Listener SNI Map) doesn't trace back to a correct, valid Docker IP:Port allocation.
+**Verification Check**:
+* Query the Control Plane API: `curl -s http://192.168.68.90:8090/get-cluster?name=_ai_unified_server`.
+* Make sure `portValue` in the JSON endpoint matches the port published in `docker-compose.yml` (`8002` vs `8000`). If mismatched, you must format a JSON payload and `POST` it to `/add-cluster` using the Envoy Control Plane workflow (`.agent/workflows/envoy_control_plane.md`).
diff --git a/.agent/workflows/envoy_control_plane.md b/.agent/workflows/envoy_control_plane.md
index 34ac523..63f34ad 100644
--- a/.agent/workflows/envoy_control_plane.md
+++ b/.agent/workflows/envoy_control_plane.md
@@ -4,6 +4,8 @@

 This workflow describes how to interact with the custom Envoy Control Plane API to manage Envoy configurations, specifically retrieving and upserting Listeners and Clusters.

+**MAIN KNOWLEDGE POINT:** Agents and Users should refer to `.agent/workflows/deployment_reference.md` to understand the full proxy and architecture layout prior to running production debugging.
+
 ### Base URL

 The API is available internally at `http://192.168.68.90:8090`.
diff --git a/docs/deployment_reference.md b/docs/deployment_reference.md
deleted file mode 100644
index c69a2d7..0000000
--- a/docs/deployment_reference.md
+++ /dev/null
@@ -1,109 +0,0 @@
-# Cortex Hub: Deployment & CI/CD Architecture Reference
-
-This document serves as the comprehensive reference guide for the deployment architecture of the Cortex Hub AI system. It maps the journey of a request from the external internet to the backend code, details the containerized architecture, and outlines the automated deployment (CI/CD) paths, to help debug future production issues.
-
-## 1. High-Level Architecture Flow
-
-When an external user visits `https://ai.jerxie.com`, the request cascades through the following components:
-
-```mermaid
-flowchart TD
-    A[External User] -->|HTTPS :443| B(Envoy Control Plane Proxy\nHost IP: 192.168.68.90 :10001)
-    B -->|HTTP :80| C{Nginx Gateway\nContainer: ai_unified_frontend\nHost IP: 192.168.68.113 :8002}
-
-    C -->|Static Files / JS / CSS| D[React Frontend Build]
-    C -->|/api/v1/* Routing| E(FastAPI Backend\nContainer: ai_hub_service\nExposed: :8000 internally)
-```
-
-### Components Breakdown
-
-**1. Envoy Proxy** (`192.168.68.90`):
-* **Role**: Primary Edge Proxy. Terminates external SSL (HTTPS), handles external SNI routing (matching `ai.jerxie.com`), and passes decrypted traffic internally.
-* **Control API**: Envoy configurations are manipulated dynamically at `http://192.168.68.90:8090/`.
-* **Important Note**: When Envoy forwards traffic internally, the scheme changes from `HTTPS` to `HTTP`.
-
-**2. Nginx Gateway** (`192.168.68.113` - `ai_unified_frontend` container):
-* **Role**: Application Gateway. Serves the static React frontend directly and reverse-proxies all `/api/v1/` calls to the Python backend.
-* **Network Binding**: Bound to host port `8002` (internal container port `80`) to avoid port collisions with other host services.
-* **Protocol Map**: Converts connection headers between "upgrade" (WebSockets) and "close" (Standard HTTP), and passes Envoy's original scheme (`X-Forwarded-Proto`) along so FastAPI knows the request originated as HTTPS.
-
-**3. Uvicorn / FastAPI Backend** (`ai_hub_service` container):
-* **Role**: Core application logic. Needs to be aware it is sitting behind proxies to compute accurate callback URLs for tasks like **OIDC Login**, which inherently depend on accurate external domains and schemes.
-* **Configuration**: Runs with `--proxy-headers` and `--forwarded-allow-ips="*"` to ensure it trusts the `X-Forwarded-Proto` header injected by Nginx.
-
----
-
-## 2. CI/CD Pipeline & Deployment Scripts
-
-The code moves from development to production through a formalized sequence, managed by dedicated local shell scripts.
-
-### The Scripts
-
-1. **`remote_deploy.sh` (The Trigger)**
-    * **Where it runs**: *Locally on your dev machine*
-    * **What it does**:
-        1. Uses `rsync` over SSH (`sshpass`) to securely copy local workspace (`/app/`) changes onto the production server `192.168.68.113` under a temporary `/tmp/` directory.
-        2. Specifically excludes massive or unnecessary folders (`.git`, `node_modules`, `__pycache__`).
-        3. Overwrites the destination project folder (`/home/coder/project/cortex-hub`), taking care to retain system permissions.
-        4. Triggers the `local_deployment.sh` script over SSH on the production server.
-
-2. **`local_deployment.sh` (The Builder)**
-    * **Where it runs**: *Server 192.168.68.113*
-    * **What it does**:
-        1. Destroys the old running containers.
-        2. Triggers Docker Compose (`docker compose up -d --build --remove-orphans`) to rebuild the application context and discard deprecated container setups (e.g., when the UI shifted into Nginx).
-        3. Performs automated database migrations running parallel idempotent logic (`app/db/migrate.py`) via the `Uvicorn` startup lifecycle.
-
-### How to Release
-You or an Agent must safely pass the authentication key into the script from the command line. An Agent should **always prompt the USER** for this password before running it:
-```bash
-REMOTE_PASS='' bash /app/remote_deploy.sh
-```
-
----
-
-## 3. Automation Logic for Agents (.agent Workflows)
-For any automated AI attempting to debug or push changes:
-* **Primary Source of Truth**: This `docs/deployment_reference.md` file defines the architecture rules.
-* **Workflows available:** Look at `.agent/workflows/deploy_to_production.md` for explicit directory movement maps.
-* **Envoy Control Map:** Refer to `.agent/workflows/envoy_control_plane.md` if the backend successfully mounts on the remote machine but `ai.jerxie.com/api/` throws 404s/503s.
-
----
-
-## 4. Reference Troubleshooting & Known Pitfalls
-
-If production encounters a bug or routing fails, these have historically been the primary offenders:
-
-### 1. OIDC Redirection Errors (Auth Flips to HTTP)
-**Symptoms**: OAuth login fails because Dex/Auth redirects the user back to `http://ai.jerxie.com/api/...` instead of `https://`.
-**Root Cause**: FastAPI believes it's serving HTTP because Nginx didn't forward the Envoy proxy scheme.
-**Verification Check**:
-* Check `app/ai-hub/Dockerfile`: Ensure the `uvicorn` command ends with `--proxy-headers --forwarded-allow-ips "*"`.
-* Check `nginx.conf`: Ensure `proxy_set_header X-Forwarded-Proto` relies on the HTTP header dynamically captured from Envoy, NOT a hard-coded `$scheme` string.
-
-### 2. WebSocket Hanging / HTTP Request Failing
-**Symptoms**: Normal API calls return strange transport errors, or WebSocket voice channels refuse to upgrade.
-**Root Cause**: Misconfigured HTTP `Upgrade` logic. Setting `Connection "upgrade"` unconditionally on an Nginx location breaks normal HTTP REST calls.
-**Verification Check**:
-* Check `nginx.conf` mappings. It must have:
-  ```nginx
-  map $http_upgrade $connection_upgrade {
-      default upgrade;
-      '' close;
-  }
-  ```
-  ... and the result must be passed into the `proxy_set_header Connection $connection_upgrade;` field.
-
-### 3. "Port is already allocated" (Container Fails to Deploy)
-**Symptoms**: Backend loads normally, but the Frontend container silently ends up `dead` or `Exit 1` during `remote_deploy.sh`.
-**Root Cause**: Binding collisions on the production host `192.168.68.113` (e.g., trying to bind Nginx to host port `8000` when another container `python3_11` uses it).
-**Verification Check**:
-* Run `docker ps -a` on `192.168.68.113`.
-* Ensure the frontend port (`8002:80`) mapped inside `docker-compose.yml` does not overlap with the port bindings of any live container.
-
-### 4. Envoy Cluster Sync Outages
-**Symptoms**: `curl -v https://ai.jerxie.com` returns a generic 404 or `503 Service Unavailable` with `server: envoy`.
-**Root Cause**: The Envoy FilterChain (Listener SNI Map) doesn't trace back to a correct, valid Docker IP:Port allocation.
-**Verification Check**:
-* Query the Control Plane API: `curl -s http://192.168.68.90:8090/get-cluster?name=_ai_unified_server`.
-* Make sure `portValue` in the JSON endpoint matches the port published in `docker-compose.yml` (`8002` vs `8000`). If mismatched, you must format a JSON payload and `POST` it to `/add-cluster` using the Envoy Control Plane workflow.