diff --git a/.agent/workflows/deploy_to_production.md b/.agent/workflows/deploy_to_production.md
index 1f4a104..1552b54 100644
--- a/.agent/workflows/deploy_to_production.md
+++ b/.agent/workflows/deploy_to_production.md
@@ -50,8 +50,14 @@
 ## Deployment Steps
 
 1. **Automated Secret Fetching**: The `scripts/remote_deploy.sh` script will automatically pull the production password from the GitBucket Secret Vault if the `GITBUCKET_TOKEN` is available in `/app/.env.gitbucket`.
-2. **Sync**: Sync local codebase to `/tmp/cortex-hub/` on the server.
-3. **Proto & Version Management**:
+2. **Environment Configuration**: Because the application defaults to an open/local configuration (Day 0 loopback mode) without OIDC, it is critical that the production `.env` file is populated with the correct authenticating variables if using SSO:
+ - `OIDC_CLIENT_ID=cortex-server`
+ - `OIDC_CLIENT_SECRET=change-me-at-runtime`
+ - `OIDC_SERVER_URL=http://localhost:8080`
+ - `OIDC_REDIRECT_URI=http://localhost:8002/api/v1/users/login/callback`
+ *(Ensure these are mapped in your production deployment toolchain or manually placed on the server under `/home/coder/project/cortex-hub/.env`)*
+3. **Sync**: Sync local codebase to `/tmp/cortex-hub/` on the server.
+4. **Proto & Version Management**:
  - If `ai-hub/app/protos/agent.proto` has changed, regenerate stubs:
  ```bash
  cd /app/ai-hub/app/protos && python3 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. agent.proto
@@ -59,8 +65,8 @@
  ```
  - **Triggering Client Updates**: To force all active agents (like the Mac Mini) to pull new code or skills, bump the version in `/app/agent-node/VERSION`.
  - **Auto-Update**: Once deployed, agents check for version mismatches every 5 minutes and will automatically re-bootstrap themselves if a newer version is detected on the Hub.
-4. **Migrate & Rebuild**: Overwrite production files and run `bash scripts/local_rebuild.sh` on the server.
-5. **Post-Deployment Health Check**: Perform a backend connectivity check (Python Trick). Only use `/frontend_tester` as a last resort if UI behavior is suspect.
+5. **Migrate & Rebuild**: Overwrite production files and run `bash scripts/local_rebuild.sh` on the server.
+6. **Post-Deployment Health Check**: Perform a backend connectivity check (Python Trick). Only use `/frontend_tester` as a last resort if UI behavior is suspect.
 
 ### Automated Command
 ```bash
diff --git a/docker-compose.yml b/docker-compose.yml
index 00e1ad3..9fd897c 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -30,15 +30,12 @@
 - HUB_API_URL=http://localhost:8000
 - HUB_PUBLIC_URL=http://localhost:8002
 - HUB_GRPC_ENDPOINT=localhost:50051
- - OIDC_CLIENT_ID=cortex-server
- - OIDC_CLIENT_SECRET=change-me-at-runtime
- - OIDC_SERVER_URL=http://localhost:8080 # Placeholder for generic setup
- - OIDC_REDIRECT_URI=http://localhost:8002/api/v1/users/login/callback
- - SUPER_ADMINS=admin@example.com
- - SECRET_KEY=default-insecure-key
+ - SUPER_ADMINS=${SUPER_ADMINS:-admin@example.com}
+ - SECRET_KEY=${SECRET_KEY:-default-insecure-key}
 - DEBUG_GRPC=true
 volumes:
 - ai_hub_data:/app/data:rw
+ - ./config.yaml:/app/config.yaml:rw
 - ./agent-node:/app/agent-node-source:ro
 - ./skills:/app/skills:ro
 - browser_shm:/dev/shm:rw
diff --git a/docs/auth_tls_journey.md b/docs/auth_tls_journey.md
new file mode 100644
index 0000000..fee1d33
--- /dev/null
+++ b/docs/auth_tls_journey.md
@@ -0,0 +1,105 @@
+# System Securitization Journey: Day 1 to Day 2
+
+This document outlines the full user journey and technical implementation plan for securing the **AI Hub** and its **Swarm Control** system. It focuses on making the initial experience (Day 1) as seamless as possible, while providing a clear path to production-grade security (Day 2) without losing access or data.
+
+## 🚨 Quick Start UX / Feasibility Blockers (Pre-Release Phase)
+
+Before proceeding with the Day 1 to Day 2 transition, we must unblock the following critical UX flaws in the "Day 0" setup.
+
+### 1. The `.env` Override Trap (Critical Failure)
+* **The Problem:** `docker-compose.yml` hardcodes environment variables like `SUPER_ADMINS` into the `environment:` block, which overrides the `.env` file generated by `setup.sh`.
+* **The Fix:** Update `docker-compose.yml` to use variable interpolation with fallbacks (e.g., `- SUPER_ADMINS=${SUPER_ADMINS:-admin@local.host}`) so the `.env` file takes precedence.
+
+### 2. The "Empty Shell" Problem (No Agent Nodes)
+* **The Problem:** AI Hub requires at least one node to be useful. If a user runs `docker-compose up`, the Hub starts, but there are zero agent nodes connected. The user is required to manually provision a node to do anything useful.
+* **The Fix:** Include a pre-configured `sandbox-node` directly in the `docker-compose.yml` that automatically performs the token handshake with the Hub on startup.
+
+### 3. The "Brain Dead" State (Missing LLM Keys)
+* **The Problem:** If a user bypasses OIDC, logs in, and goes to the Chat interface, the AI will crash because there are no LLM provider keys configured.
+* **The Fix:**
+ 1. Update `setup.sh` to optionally accept a pre-filled configuration file (`./setup.sh my_config.yaml`). The script copies this file to `./config.yaml` and mounts it into the backend via docker-compose so the provider keys are instantly available.
+ 2. Update the frontend UI to display a friendly onboarding splash screen (*"Welcome to Cortex! Please configure your first AI Provider..."*) routing them to the settings page if no active providers are found (if they didn't pass in a config file).
+
+---
+
+## 🌅 Day 0 / Day 1: Seamless Initialization (Interactive Setup script)
+
+### 1. One-Line Setup Script (`setup.sh`)
+**User Journey:**
+* The administrator runs a one-line install command published in the repository (e.g., `bash <(curl -s https://.../setup.sh)` or `./setup.sh`).
+* The script acts as an interactive wizard asking for the bare-minimum required values:
+ - **Admin Email** (maps to `SUPER_ADMINS`).
+* The script automatically generates a highly secure `SECRET_KEY` and a random initial **Admin Password**.
+* It saves all this to a `.env` file and starts the application via `docker-compose up -d`.
+* The script prints a clean summary to the terminal:
+ ```
+ ✅ AI Hub is up and running!
+ 🌐 Access the Hub UI: http://localhost:8002 (or your server IP)
+ 👤 Admin Email: admin@example.com
+ 🔑 Initial Password: [Generated Password]
+ ```
+
+### 2. Web UI Access (Local Auth Fallback)
+**User Journey:**
+* The administrator visits the URL provided by the setup script.
+* They log in using the generated local credentials. No OIDC configuration is required.
+* Once logged in, they can access the platform and can optionally change their password in their Profile settings.
+
+**Implementation TODOs:**
+* [ ] **DB Model**: Update `User` model (`app/db/models/user.py`) to include a nullable `password_hash` column.
+* [ ] **Config**: Set OIDC settings as optional and disabled by default in `Settings` (`app/config.py`), adding an `oidc_enabled: bool` flag.
+* [ ] **Backend Gen**: In `save_user` or the initialization hook, if forming the initial super admin, generate a random password, hash it, save it, and log the plain text to stdout.
+* [ ] **API Routes**: Create local login endpoints (`POST /api/v1/users/login/local` and `PUT /api/v1/users/password`).
+* [ ] **Frontend**: Redesign the Login page to show a local Username/Password form by default.
+
+### 3. Swarm Control (gRPC Insecure Mode & Missing Hostname)
+**User Journey:**
+* The user wants to test remote agent nodes (e.g., their laptop).
+* Because TLS isn't configured out of the box, the Hub's gRPC Orchestrator starts on an insecure binding (`add_insecure_port`).
+* Furthermore, because the Hub doesn't know its external DNS name/IP, the generated provisioning script defaults to pointing nodes at `localhost:50051`.
+* **The Policy:** We *allow* this behavior to enable rapid local testing. However, the Web UI clearly warns the user with two banners on the Swarm dashboard:
+ 1. ⚠️ **"Swarm operating in Insecure Mode. Traffic is not encrypted."**
+ 2. ⚠️ **"Hub External Endpoint is not configured. Node provisioning scripts will default to 'localhost' and will require manual IP adjustment if deploying to remote machines."**
+* The user can connect nodes locally, or manually substitute their LAN IP, and test the system immediately.
+
+---
+
+## 🛡️ Day 2: Production Securitization & Hostname Configuration
+**TODOs:**
+* [ ] **Config**: Add `GRPC_TLS_ENABLED: bool` to settings, defaulting to `False`.
+* [ ] **Backend**: Ensure `app/core/grpc/services/grpc_server.py` cleanly handles the insecure port bound to `GRPC_ENDPOINT` when `GRPC_TLS_ENABLED` is false.
+* [ ] **Nodes API / UI**: Expose the TLS status via an API so the frontend dashboard can render a warning banner about insecure swarm communication.
+
+---
+
+## 🛡️ Day 2: Production Securitization
+
+### 1. Transitioning to Single Sign-On (OIDC)
+**User Journey:**
+* Once the user is comfortable and ready to deploy to a team, the administrator navigates to the **Authentication Settings** page in the UI.
+* They input their OIDC provider details (Client ID, Secret, Auth URL, etc.) and toggle **Enable SSO**.
+* From then on, the login screen displays a prominent "Log in with SSO" button.
+* **Critical Phase**: When the admin logs in via SSO, the system extracts their email from the JWT payload. Since it matches the local admin account, the system safely links the `oidc_id` to the existing account. The admin does not randomly lose access or get a duplicated profile.
+
+**Implementation TODOs:**
+* [ ] **OIDC Settings API**: Create `PUT /api/v1/admin/config/oidc` to update OIDC details and write them persistently to `config.yaml`.
+* [ ] **Auth Linking**: Update `app/core/services/auth.py` (`handle_callback`) to search for an existing user by `email` before creating a new account. If found, link the OIDC `sub` to the existing local user.
+* [ ] **Frontend Login**: The login screen queries `/api/v1/auth/config` to detect if OIDC is enabled and conditionally renders the OIDC redirect button alongside the local fallback.
+
+### 2. Encrypting Swarm Control & Setting Hostnames (gRPC TLS)
+**User Journey:**
+* The admin decides to connect agent nodes located in different cloud providers or over the public internet.
+* They navigate to the **Swarm/Node Settings** page.
+* They define the `GRPC_EXTERNAL_ENDPOINT` (e.g., `ai.example.com:50051`).
+* They upload or provide paths to a valid SSL/TLS Certificate (`cert.pem`) and Private Key (`key.pem`) and toggle **Enable TLS for Swarm Control**.
+* **The Transition (Disconnect):** The UI displays a loud warning prompt: *"Changing the Endpoint or enabling TLS will drop all currently connected nodes. You will need to manually download the new configuration for each node and restart them."*
+* The Hub restarts its gRPC server bound to a secure port (`add_secure_port`).
+* Existing nodes disconnect because the insecure channel is closed or network routes change.
+* The user navigates to their nodes in the UI, clicks "Download Config Bundle", and pushes the updated secure configuration to their remote machines, completing the secured Day 2 journey.
+
+**Implementation TODOs:**
+* [ ] **Config**: Add fields for `GRPC_EXTERNAL_ENDPOINT`, `GRPC_CERT_PATH`, and `GRPC_KEY_PATH`.
+* [ ] **Backend gRPC Server**: Update `serve_grpc` in `grpc_server.py`. If `GRPC_TLS_ENABLED` is true, read the certs and use `server.add_secure_port()`.
+* [ ] **API / Settings UI**: Create an admin endpoint (`PUT /api/v1/admin/config/swarm`) to define external endpoints and TLS certificates.
+* [ ] **Node Provisioning**: Update the script generator (`_generate_node_config_yaml`) to use `GRPC_EXTERNAL_ENDPOINT` (fallback to localhost) and instruct the python client to use `grpc.ssl_channel_credentials()` when TLS is active.
+* [ ] **UI Banners**: Add the "Insecure Mode" and "Missing Hostname" warning banners to the Swarm Dashboard frontend.
diff --git a/docs/auth_tls_todo.md b/docs/auth_tls_todo.md
new file mode 100644
index 0000000..653fb2a
--- /dev/null
+++ b/docs/auth_tls_todo.md
@@ -0,0 +1,37 @@
+# Cortex Hub: Authentication & TLS Implementation Tracking
+
+This document serves as the master checklist for implementing the Day 1 / Day 2 Securitization Journey.
+
+## Phase 1: Pre-Release UX Blockers
+These tasks ensure the out-of-the-box local developer experience functions fully.
+- [x] **Infrastructure**: Fix the `.env` override trap in `docker-compose.yml`.
+- [x] **Setup Script**: Update `setup.sh` to accept optional `config.yaml` to cure the "Brain Dead" state.
+- [ ] **Infrastructure**: Include a bundled `sandbox-node` container in `docker-compose.yml` so the Hub isn't an "Empty Shell" on startup.
+
+## Phase 2: Day 1 Local Auth Fallback
+Enable local authentication using the `CORTEX_ADMIN_PASSWORD` generated by the setup script.
+- [ ] **Database Model**: Update `User` model (`app/db/models/user.py`) to include a nullable `password_hash` column.
+- [ ] **Configuration**: Update `Settings` (`app/config.py`) to make OIDC settings optional and add an `oidc_enabled: bool` flag.
+- [ ] **Backend Initialization**: If `CORTEX_ADMIN_PASSWORD` is present in the environment for the `SUPER_ADMINS` initialization, hash it and assign it to the admin account.
+- [ ] **API Routes**: Create local login endpoints (`POST /api/v1/users/login/local` to issue JWTs) and (`PUT /api/v1/users/password` for password resets).
+- [ ] **Frontend**: Redesign the Auth/Login page to display a Username/Password default form.
+
+## Phase 3: Day 1 Swarm Control (Insecure/Local Status)
+Support running the mesh over internal loopbacks but strictly warn the end-user.
+- [ ] **Backend Configuration**: Add `GRPC_TLS_ENABLED`, `GRPC_EXTERNAL_ENDPOINT` to `config.py`.
+- [ ] **Backend API**: Expose a `/api/v1/status` or equivalent endpoint providing the current TLS/Hostname state to the frontend.
+- [ ] **Frontend UI**: Add persistent "Insecure Mode" and "Missing External Hostname" warning banners to the Swarm Dashboard frontend when running in Day 1 mode.
+
+## Phase 4: Day 2 Single Sign-On (OIDC Linking)
+Allow transition to Enterprise SSO without breaking access or creating duplicate accounts.
+- [ ] **Backend Service**: Update `app/core/services/auth.py` (`handle_callback`) to search for existing local users via `email` and safely link the incoming OIDC `sub` payload.
+- [ ] **Admin API**: Create `PUT /api/v1/admin/config/oidc` for UI-based toggling and configuration of SSO parameters without restarting.
+- [ ] **Frontend Login**: Dynamically query `/api/v1/auth/config`. If enabled, render the "Log in with SSO" button instead of or alongside local Auth.
+- [ ] **Frontend Settings**: Create an Admin Settings UI panel for OIDC Configuration.
+
+## Phase 5: Day 2 Swarm Control (Encrypted Mesh)
+Secure gRPC traffic between the Hub and agent nodes with TLS.
+- [ ] **Backend gRPC Server**: Update `serve_grpc` in `app/core/grpc/services/grpc_server.py`. If `GRPC_TLS_ENABLED`, load generic server certs and call `server.add_secure_port()`.
+- [ ] **Admin API**: Create `PUT /api/v1/admin/config/swarm` for UI-based configuration of `GRPC_EXTERNAL_ENDPOINT` and SSL Cert paths.
+- [ ] **Node Provisioning**: Update the script generator `_generate_node_config_yaml` (`app/api/routes/nodes.py`) to inject the `GRPC_EXTERNAL_ENDPOINT` and toggle `grpc.ssl_channel_credentials()` for python clients.
+- [ ] **Frontend Settings**: Add a prompt to the Settings UI stating that *"Updating SSL/Endpoint will disconnect all nodes!"* upon saving.
diff --git a/setup.sh b/setup.sh
new file mode 100644
index 0000000..d5a5c20
--- /dev/null
+++ b/setup.sh
@@ -0,0 +1,69 @@
+#!/bin/bash
+# Cortex Hub Setup Script
+
+set -e
+
+echo "=========================================="
+echo "⚡ Cortex Hub - Interactive Setup Wizard"
+echo "=========================================="
+echo ""
+
+# Optional Day 0 Prefill config
+CONFIG_FILE=$1
+
+if [ -n "$CONFIG_FILE" ] && [ -f "$CONFIG_FILE" ]; then
+ echo "📄 Utilizing provided configuration file: $CONFIG_FILE"
+ cp "$CONFIG_FILE" ./config.yaml
+else
+ if [ ! -f "config.yaml" ]; then
+ touch config.yaml
+ fi
+ echo "💡 No prefill config file provided. An empty config.yaml will be initialized."
+ echo " (You can configure LLM providers later in the Web UI)."
+ echo " To prefill, run: ./setup.sh my_config.yaml"
+fi
+echo ""
+
+# 1. Prompt for Admin Email
+read -r -p "Enter Admin Email (e.g., admin@example.com): " ADMIN_EMAIL
+if [ -z "$ADMIN_EMAIL" ]; then
+ echo "❌ Error: Admin Email is required."
+ exit 1
+fi
+
+# 2. Auto-generate Secrets
+echo ""
+echo "Generating secure keys..."
+SECRET_KEY=$(openssl rand -hex 32)
+ADMIN_PASSWORD=$(openssl rand -base64 12)
+
+echo "Creating .env file..."
+cat <<EOF > .env
+SUPER_ADMINS=${ADMIN_EMAIL}
+CORTEX_ADMIN_PASSWORD=${ADMIN_PASSWORD}
+SECRET_KEY=${SECRET_KEY}
+EOF
+
+# 3. Bring up Docker Services
+echo "Starting services via Docker Compose..."
+if command -v docker-compose &> /dev/null; then
+ docker-compose up -d --build
+elif docker compose version &> /dev/null; then
+ docker compose up -d --build
+else
+ echo "❌ Error: Docker Compose is not installed."
+ echo "Please install it and run 'docker-compose up -d'"
+ exit 1
+fi
+
+echo ""
+echo "=========================================="
+echo "✅ Setup Complete!"
+echo "=========================================="
+echo ""
+echo "🌐 Access the Hub UI: http://localhost:8002 (or your server IP)"
+echo "👤 Admin Email: ${ADMIN_EMAIL}"
+echo "🔑 Initial Password: ${ADMIN_PASSWORD}"
+echo ""
+echo "Please save your password safely! You can change it later in the UI."
+echo "=========================================="
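
Review note on the `.env` step in `setup.sh`: the file ends up holding `CORTEX_ADMIN_PASSWORD` and `SECRET_KEY` in plain text, so it may be worth restricting its permissions when it is written. The following is only a sketch, assuming the script intends a heredoc redirected into `.env`. The helper name `write_env_file`, the `umask 077`, and the `chmod 600` are illustrative and not part of this PR.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): writes the three values that
# setup.sh collects into an env file readable only by the current user.
# Usage: write_env_file <path> <admin_email> <admin_password> <secret_key>
write_env_file() {
    local path="$1" email="$2" password="$3" key="$4"
    # Subshell keeps the stricter umask from leaking into the caller.
    (
        umask 077
        cat <<EOF > "$path"
SUPER_ADMINS=${email}
CORTEX_ADMIN_PASSWORD=${password}
SECRET_KEY=${key}
EOF
    )
    # umask only affects newly created files, so tighten an existing one too.
    chmod 600 "$path"
}
```

In `setup.sh` this would be called as `write_env_file .env "$ADMIN_EMAIL" "$ADMIN_PASSWORD" "$SECRET_KEY"` in place of the inline `cat`. The heredoc body and its `EOF` terminator must start in the first column, which is why they are not indented with the rest of the function.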