diff --git a/.agent/workflows/deploy_to_production.md b/.agent/workflows/deploy_to_production.md index 3d52b12..f6f6515 100644 --- a/.agent/workflows/deploy_to_production.md +++ b/.agent/workflows/deploy_to_production.md @@ -9,13 +9,14 @@ 1. **Automated Secret Fetching**: The `scripts/remote_deploy.sh` script will automatically pull the production password from the GitBucket Secret Vault if the `GITBUCKET_TOKEN` is available in `/app/.env.gitbucket`. 2. **Sync**: Sync local codebase to `/tmp/cortex-hub/` on the server. -3. **Proto Regeneration**: If `ai-hub/app/protos/agent.proto` has changed, the agent must regenerate the Python stubs: - ```bash - cd /app/ai-hub/app/protos && python3 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. agent.proto - # And for the agent node - cd /app/agent-node && python3 -m grpc_tools.protoc -Iprotos --python_out=. --grpc_python_out=. protos/agent.proto - ``` - > **CAVEAT**: Because you changed the protobuf interface, all clients must be updated as well! Remember to rebuild and restart all agent nodes (like `cortex-test-1` and `cortex-test-2`) whenever the `.proto` files change so they have the latest interface stubs. +3. **Proto & Version Management**: + - If `ai-hub/app/protos/agent.proto` has changed, regenerate stubs: + ```bash + cd /app/ai-hub/app/protos && python3 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. agent.proto + cd /app/agent-node && python3 -m grpc_tools.protoc -Iprotos --python_out=. --grpc_python_out=. protos/agent.proto + ``` + - **Triggering Client Updates**: To force all active agents (like the Mac Mini) to pull new code or skills, bump the version in `/app/agent-node/VERSION`. + - **Auto-Update**: Once deployed, agents check for version mismatches every 5 minutes and will automatically re-bootstrap themselves if a newer version is detected on the Hub. 4. **Migrate & Rebuild**: Overwrite production files and run `bash scripts/local_rebuild.sh` on the server. 5. **Post-Deployment Health Check**: Perform a backend connectivity check (Python Trick). Only use `/frontend_tester` as a last resort if UI behavior is suspect. diff --git a/.agent/workflows/deployment_reference.md b/.agent/workflows/deployment_reference.md index f8d5926..c11ad5f 100644 --- a/.agent/workflows/deployment_reference.md +++ b/.agent/workflows/deployment_reference.md @@ -58,7 +58,6 @@ 3. Triggers Docker Compose with multiple layers: - `-f docker-compose.yml` (Generic common config) - `-f deployment/jerxie-prod/docker-compose.production.yml` (Jerxie-specific overrides) - - `-f deployment/test-nodes/docker-compose.test-nodes.yml` (Optional test nodes) 4. Rebuilds and starts the containers. 5. Performs automated database migrations running parallel idempotent logic via the `Uvicorn` startup lifecycle. @@ -74,7 +73,6 @@ * **`docker-compose.yml`**: Baseline configuration. Uses local Docker volumes and generic localhost endpoints. Default for new users. * **`deployment/jerxie-prod/`**: Contains Jerry's specific production overrides (NFS volume on `192.168.68.90`, SSL/OIDC endpoints for `ai.jerxie.com`). -* **`deployment/test-nodes/`**: Contains internal test node definitions for simulation. * **`scripts/`**: Centralized automation scripts for remote sync, local rebuilding, and node registration. For any automated AI attempting to debug or push changes: * **Primary Source of Truth**: This file (`.agent/workflows/deployment_reference.md`) defines the architecture rules. @@ -83,6 +81,21 @@ --- +## 5. Agent Node Auto-Update Architecture + +Cortex supports zero-touch maintenance for distributed agent nodes (like Mac Minis or remote VMs). + +### How it Works +1. **Version Monitoring**: The running Agent process (`agent_node/main.py`) runs a background `AutoUpdater` thread. +2. **Polling**: Every 300 seconds, the agent calls `GET /api/v1/agent/version` on the Hub using the `SECRET_KEY` for authentication. +3. **Trigger**: If the Hub reports a version higher than the one found in the local `VERSION` file, the agent: + - Downloads the latest code bundle (including `skills/`) from `GET /api/v1/agent/download`. + - Delegates to `bootstrap_installer.py --update-only` to extract and install new dependencies. + - Restarts itself (`os.execv`) to load the new code. +4. **Resiliency**: The installer uses `--ignore-installed` to prevent common Python/Anaconda metadata errors from stalling updates. + +--- + ## 4. Reference Troubleshooting & Known Pitfalls If production encounters a bug or routing fails, these are the historically primary offenders: @@ -119,4 +132,5 @@ **Root Cause**: The Envoy FilterChain (Listener SNI Map) doesn't trace back to a correct, valid Docker IP:Port allocation. **Verification Check**: * Query the Control Plane API: `curl -s http://192.168.68.90:8090/get-cluster?name=_ai_unified_server`. +* **Explicit Port Matching**: Envoy in production is configured to match both `ai.jerxie.com` and `ai.jerxie.com:443`. If a gRPC client (the Agent) includes the port in its authority header, Envoy must have it in its `virtual_hosts.domains` array or it will throw a 404. * Make sure `portValue` in the JSON Endpoint equates to the one published in `docker-compose.yml` (`8002` vs `8000`). If mismatched, you must format a JSON package and `POST` it to `/add-cluster` utilizing the EnvoryControlPlane workflow.