Harness Engineering (AI Orchestrator) is a transformative new layer within the platform that evolves our one-on-one AI interactions into a collaborative, automated, multi-agent ecosystem.
Currently, Swarm Control provides a powerful but manual one-to-one developer interface: a user initiates a session, configures nodes, issues prompts, and watches the terminal execution.
Harness Engineering takes the Swarm Control concept and fully encapsulates it. Each "session" of Swarm Control is wrapped into an autonomous entity called an Agent. Multiple Agents can run concurrently, wait in the background for events, wake up, execute tasks utilizing their dedicated Swarm features, communicate with each other, and go back to sleep—enabling infinite, collaborative execution to achieve complex user requests.
Every Agent is fundamentally an extended instance of a Swarm Control session, but with persistence, defined roles, and event-driven automation.
Each Agent's persona is defined in a dedicated .md file. By simply editing this Markdown file, developers can customize the exact behavior of the Agent (e.g., "QA Automator", "Code Reviewer", "Database Migrator").

Each Agent inherits the full power of the existing Swarm Control configuration.
Between tasks, Agents rest in an Idle state where they consume minimal resources, listening for a trigger.
To ensure the Orchestrator design is robust, it adopts key patterns from popular open-source multi-agent frameworks:
```json
{
  "handoff_id": "task_refactor_v1",
  "source_agent": "Lead_Engineer_Agent",
  "target_agent": "QA_Tester_Agent",
  "status": "SUCCESS",
  "artifacts": {
    "working_dir": "/tmp/cortex/shared/refactor_delta_01",
    "files_changed": ["src/auth.py", "tests/test_auth.py"],
    "cli_entrypoint": "pytest tests/test_auth.py"
  },
  "summary_for_target": "Refactored JWT logic. Please verify the 401 Unauthorized edge cases."
}
```

This tiny JSON object becomes the only initial context injected into the target Agent's empty session, saving an enormous number of tokens and ensuring a clean start. Through their .md templates, agents are assigned strict roles (e.g., Lead Engineer, QA Tester) and collaborate through shared state to achieve multi-step goals.

The primary UI for Harness Engineering will pivot from a traditional chat interface to an orchestration dashboard.
Each Agent appears on the dashboard as a card with a live status indicator (🟢 Active, 🟡 Idle, 🔵 Listening, 🔴 Error) and its trigger type (e.g., Cron or Webhook). Clicking on an Agent Card opens the "Drill-Down" UI. Because chat history is only part of an autonomous agent's story, this UI uses a Dual-Track Layout:
One track reuses the existing FileSystemNavigator.js React component and passes it a rootPath prop that locks its view strictly to the Agent's Workspace Jail (e.g., /tmp/cortex/agent_A/). This gives instant physical inspection of the files the agent is modifying with zero new backend infrastructure.

The other track is the live terminal, whose prompt signals the Agent's privilege level. When the Agent is operating with full node access (e.g., configuring nginx system-wide), the terminal displays the native Mesh root prompt ([root@ubuntu-server-1 /]#). If the Agent is a concurrently Jailed Worker (e.g., just running pytest in isolation), the prompt is color-coded as a jail ([Agent_A@Jail-123 /src]$). This prevents the human user from accidentally typing a system-wide command while actually inside a jailed worker session. Scrolling back through this track also reveals, for example, exactly where a pip install loop failed before the agent eventually mitigated it.

Agents operate autonomously based on conditions defined by the user in the UI, categorized into Active and Passive modes.
Active triggers are used for "Agent as a Service" background automation. They use a Fixed Automation Prompt that the agent executes on every wake-up.
Schedules use standard cron syntax (e.g., 0 * * * * for hourly).

Passive triggers wake the agent when external systems push data. They use a Predefined Default Prompt as a fallback, which can be overridden by the incoming request payload.
The default prompt can interpolate fields from the incoming payload (e.g., Issue #{{payload.id}} was created: {{payload.content}}).

As agents begin to natively Handoff tasks (passing JSON Manifests), they form a pipeline (e.g., Frontend Dev -> Backend Dev -> QA Reviewer). The UI provides a "Link View" visualizing these connections as edges between nodes. Real-time token flow and "Awaiting Dependencies" states are visualized here to help lead engineers spot pipeline bottlenecks instantly.
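The payload interpolation described above can be sketched in a few lines of Python. The `{{payload.<field>}}` syntax is taken from the examples in this document; the function name and the decision to leave unknown fields untouched are illustrative assumptions, not a confirmed API:

```python
import re

def render_default_prompt(template: str, payload: dict) -> str:
    """Fill {{payload.<field>}} placeholders from an incoming webhook payload.

    Supports dotted paths (e.g., payload.pull_request.title). Unknown fields
    are left untouched so a malformed payload never crashes the wake-up.
    """
    def substitute(match: re.Match) -> str:
        value = payload
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # leave the placeholder as-is
            value = value[part]
        return str(value)

    return re.sub(r"\{\{payload\.([\w.]+)\}\}", substitute, template)
```

For instance, `render_default_prompt("Issue #{{payload.id}} was created: {{payload.content}}", {"id": 42, "content": "crash on login"})` yields `"Issue #42 was created: crash on login"`.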
Goal: The user wants an Agent to automatically review code whenever a Pull Request is opened in their repository.
The user clicks Deploy New Agent, uploads their customized github_reviewer.md persona, and attaches it to prod-mesh-node-1. The Hub generates a unique webhook URL for the Agent (e.g., https://ai.jerxie.com/webhooks/agents/123?token=abc). The user configures a default prompt that pulls from the incoming payload (e.g., one referencing "{{payload.pull_request.title}}"). The Agent flips to 🔵 Listening status. The user pastes the URL into GitHub, and the journey is complete. The Agent will now wake up automatically when GitHub pushes traffic.

Goal: The user wants to manually command an Agent that usually runs on a schedule.
The user finds the Log_Archiver Agent on the dashboard. They want to archive logs right now instead of waiting for the cron job, so they hit the Play Button on the Agent Card, and the Hub triggers the agent with its Predefined Default Prompt ("Analyze and archive system logs"). Later, the user wants Log_Archiver to ignore syslog today and focus on nginx.log. They click the Agent Card, opening the Drill-Down UI, and type into the chat box: "Ignore syslog today, only archive nginx.log." This specific manual request overrides the default prompt for this execution only.

Since the existing application has a clean Backend API separation (likely leveraging FastAPI/Django for ai-hub and React for the frontend), we can implement this robustly while maintaining flexibility.
The "Lightweight AI Flow" principle ensures we don't reinvent the wheel. To start, an Agent is simply a database record that references an existing Session ID.
The Hub reuses the existing messaging endpoint (POST /api/v1/sessions/:id/messages) to trigger the underlying Swarm execution. The only new surface is POST /api/v1/agents/{id}/trigger, which takes a web payload and translates it into a message for that Agent's underlying Session.

New tables/collections needed:
- AgentTemplate: Path to the .md persona, default Swarm configs, default node connections.
- AgentInstance: A running version of a template, mapped 1:1 with a Session ID, tracking connection states and loop configuration.
- AgentTrigger: Configurations for hooks (url, secret) or cron schedules.

To build this smoothly, we will prioritize a "Make it Work" MVP using our existing architecture, ensuring the API contract is solid. Once validated, we seamlessly swap the engine underneath to reach our ultimate scale.
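As a sketch, the three records above might look like the following. Field names beyond those listed are assumptions, and a real build would map these to ORM models rather than plain dataclasses:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentTemplate:
    id: int
    persona_path: str                     # path to the .md persona file
    default_swarm_config: dict = field(default_factory=dict)
    default_nodes: list = field(default_factory=list)

@dataclass
class AgentInstance:
    id: int
    template_id: int
    session_id: str                       # mapped 1:1 with a Swarm session
    status: str = "idle"                  # active / idle / listening / error
    loop_config: dict = field(default_factory=dict)

@dataclass
class AgentTrigger:
    id: int
    agent_id: int
    kind: str                             # "cron" or "webhook"
    cron_expr: Optional[str] = None       # e.g. "0 * * * *"
    webhook_url: Optional[str] = None
    webhook_secret: Optional[str] = None
```

Keeping AgentInstance as a thin pointer to a Session ID is what lets the MVP reuse the existing Swarm Control machinery unchanged.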
Phase 1: The Monolithic MVP ("Make it Work")
Run the orchestrator loop with FastAPI's BackgroundTasks (zero infrastructure sprawl). Persist AgentInstance DB records, and route the background task to simply call existing endpoints: GET /nodes/{id}/terminal and POST /nodes/{id}/dispatch. Each Agent works inside its own /tmp/{agent_name}/ directory.

Phase 2: The UI Dashboard & Persona Engine
Build the AgentHarnessPage.js Card UI, and load each Agent's .md file as the system prompt for its Session. Allow users to click from the Dashboard directly into the pre-configured SwarmControlPage to visually spectate the background Agent working.

Phase 3: The Engine Migration ("Scale it Up")
Phase 4: Agent Collaboration & Advanced Resilience
Introduce a native handoff tool (handoff_to_agent(target_agent_id, json_manifest)). By enforcing a JSON Manifest schema, we guarantee token efficiency and prevent "context poisoning" between disparate agents (e.g., Coder -> QA).

By abstracting "Swarm Control" into "Agents", we modularize intelligence. The backend AI doesn't need to know whether it's chatting with a human or was triggered by GitHub; it just receives prompts and executes on the Mesh Nodes. This keeps the codebase DRY (Don't Repeat Yourself) while dramatically expanding the platform's capabilities. We can iteratively refine the .md files without touching backend Python code, providing maximum flexibility for the future.
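A sketch of how the handoff path could validate a JSON Manifest and turn it into the sole bootstrap context for the target Agent's fresh session. The field names match the manifest example earlier in this document; the helper name and prompt wording are illustrative assumptions:

```python
REQUIRED = ("handoff_id", "source_agent", "target_agent",
            "status", "artifacts", "summary_for_target")

def build_handoff_context(manifest: dict) -> str:
    """Validate the manifest, then render the only initial message the
    target Agent receives, keeping the injected context tiny."""
    missing = [key for key in REQUIRED if key not in manifest]
    if missing:
        raise ValueError(f"manifest missing fields: {missing}")
    art = manifest["artifacts"]
    return (
        f"Handoff {manifest['handoff_id']} from {manifest['source_agent']} "
        f"({manifest['status']}).\n"
        f"Working dir: {art['working_dir']}\n"
        f"Changed: {', '.join(art['files_changed'])}\n"
        f"Verify via: {art['cli_entrypoint']}\n"
        f"Task: {manifest['summary_for_target']}"
    )
```

Rejecting malformed manifests up front is what enforces the schema and keeps "context poisoning" out of the target Agent's clean session.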
Unlike an interactive chat, where a human can instantly see and correct an AI error, background Orchestrator loops require robust self-healing and failure-containment mechanisms to prevent runaway infrastructure or billing disasters. We engineer resilience at five layers:
Autonomous loops can easily get stuck retrying broken code, rapidly burning provider tokens and tripping rate limits (HTTP 429).
Every Agent run carries a hard iteration cap (e.g., Max_Iterations: 20). If an Agent fails to complete its objective within 20 continuous tool calls, the Hub trips a Circuit Breaker: the Agent halts, flips to 🔴 Error (Suspended), and instantly alerts the Dashboard for manual human intervention via the conversational drill-down. Network drops or OpenAI 502s are handled gracefully via strict exponential backoff (Tenacity).

If the ai-hub Docker container restarts, or the Python worker runs out of RAM midway through a task, the DB will still say the agent is 🟢 Active, but no background thread is actually running it.
A background "Zombie Sweeper" detects these orphaned records and resets the agent to 🟡 Idle. Because our Agents are stateless, the next worker simply reads the chat history and seamlessly resumes the loop exactly where it left off.

An Agent in an infinite loop will rapidly exceed the LLM's 100k+ token window if it appends every thought and terminal output linearly.
Instead, the Hub maintains a Rolling Memory Summary: on each wake-up, the context is rebuilt as [System Persona] + [Condensed Scratchpad] + [Last 10 Actions], ensuring the Agent never crashes from token bloat.

If multiple autonomous Agents are attached to the same physical node simultaneously, their shell commands might collide (e.g., Agent A deletes /tmp/data while Agent B is trying to zip it).
Naive path locks are unsafe (a simple cd .. could escape a path lock). Instead, the MVP exclusively uses Global Node Locks: only one Agent can orchestrate a specific Mesh Node at a time. To achieve true concurrency later, we will use Workspace Jails. An Agent will be strictly confined by the Node's Sandbox policy to write only to its assigned runtime directory (e.g., /tmp/cortex/agent_A/). If an Agent needs to modify a global system config (like /etc/nginx/), it must explicitly escalate and halt all other Agents via a Global Node Lock.

The current Swarm UI relies on in-memory WebSockets to display live terminal output. A background Agent might poll the node, but if the logs stream fast enough, crucial output could be dropped from RAM before the polling cycle hits.
Instead, each session's terminal output is persisted to a concrete log file on the node (e.g., ~/.cortex/logs/{session_id}.log). The orchestrator natively instructs the Agent to read these files for analysis rather than sniffing the live websocket buffer, guaranteeing zero data loss.

If an Agent crashes halfway through downloading a dataset or creating a database table, the "Zombie Sweeper" will reset it and the Agent will retry. Without precautions, the Agent will immediately crash again, because the git clone or CREATE TABLE command will throw a "Resource Already Exists" error.
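Retry paths therefore need to be idempotent. A minimal sketch of the kind of guard required, assuming a git checkout step; the helper name is illustrative, and the same principle applies to SQL via CREATE TABLE IF NOT EXISTS:

```python
import os
import subprocess

def ensure_repo(repo_url: str, dest: str) -> str:
    """Idempotent clone: safe to re-run after a Zombie Sweeper reset.

    If the checkout already exists, skip the clone instead of letting
    `git clone` die with a 'destination path already exists' error.
    """
    if os.path.isdir(os.path.join(dest, ".git")):
        return "already-present"
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    return "cloned"
```

Wrapping every setup command in a check-before-act guard like this lets the Agent resume a half-finished task without tripping over its own earlier work.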
While Rolling Memory Summary (Mechanism C) keeps the Agent below token limits, summarization inherently destroys precise detail. If an agent spent 5 turns fixing a massive regex on line 124 of a 5,000-line file, the summarizer might condense it to: "Agent reviewed file and fixed regex." Because the exact path and line number are lost from context, the Agent will suffer from "dementia" and have to re-discover its own work repeatedly in long loops.
To counter this, each Agent keeps a plain .txt file on the node as its own hippocampus. It actively writes exact variables, paths, and immediate next steps to this physical file, so that even if the Hub summarizes its chat history, its literal working memory is safely preserved and natively readable by the Agent on every loop tick.

Running the orchestrator loop (compiling prompts, calling LLMs, interpreting outputs) for 10+ sub-agents directly inside the main ai-hub API server is fundamentally unscalable. A single Python FastAPI backend will quickly become I/O and CPU bound.
To achieve infinite, horizontal scale for the Orchestrator as usage grows, we strictly adhere to a decoupled Worker Fleet Architecture (Brains in the Cloud, Smart Hands on the Edge).
Rather than running the orchestrator loop inside the main ai-hub API server, we split the architecture:
The "Brains" are Orchestrator Workers (FastAPI BackgroundTasks in the MVP, scaling up to Celery workers for Enterprise instances). These separate containers exclusively run the long-lived while loops, make heavy LLM API calls, and parse prompts. The "Smart Hands", the agent-node clients, remain drastically lightweight and run no LLM logic locally.

To ensure the Worker Fleet stays fast and doesn't buckle under network traffic:
Rather than shipping entire files over the network, the Worker calls node-side tools (e.g., remote_semantic_grep). The Worker instructs the Node to run the heavy file parsing locally on its own CPU, and exactly 3 lines of dense, matched text are returned over the wire to the Worker.
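The contract of such a tool can be shown with a local stand-in: filtering happens where the data lives, and only the dense matches cross the wire. remote_semantic_grep is the design's tool name; this plain substring version is a deliberate simplification of whatever "semantic" matching the real tool performs:

```python
def node_side_grep(file_text: str, query: str, max_results: int = 3) -> list:
    """Runs on the Node's own CPU; the Worker receives at most
    `max_results` matching lines instead of the whole file."""
    matches = [line for line in file_text.splitlines() if query in line]
    return matches[:max_results]
```

Capping the response at 3 lines mirrors the design's bandwidth budget: the Worker gets exactly enough matched text to reason over, never the raw file.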