diff --git a/docs/features/harness_engineering/co_pilot_agent_design.md b/docs/features/harness_engineering/co_pilot_agent_design.md
index 18521ef..d54d8d5 100644
--- a/docs/features/harness_engineering/co_pilot_agent_design.md
+++ b/docs/features/harness_engineering/co_pilot_agent_design.md
@@ -127,8 +127,22 @@
 ---
 
 ## 7. Implementation Checklist
-- [ ] Add `co_worker_enabled` toggle to `AgentInstance` DB model.
-- [ ] Implement `pre_run` and `post_run` hooks in `AgentExecutor`.
-- [ ] Add UI slider for `rework_threshold`.
-- [ ] Develop the logic to purge/write/read the `evaluation.md` in the workspace.
-- [ ] Create a "Self-Improvement Loop" that re-dispatches the Main Agent with Co-Worker feedback.
+
+### 🟢 Stage 1: Data & Models (Foundation)
+- [ ] **DB Model Update**: Modify `AgentInstance` DB model to include `co_worker_enabled`, `rework_threshold` (0-100), and `max_reworks` (int).
+- [ ] **Workspace Mirroring**: Implement `.cortex/` path creation and JSON `history.log` initialization in the Agent's filesystem jail.
+
+### 🟢 Stage 2: Orchestration Logic (The Engine)
+- [ ] **Pre-Run Analytic Turn**: implement a hook in `agent_loop.py` to prompt the Co-Pilot to generate a request-specific `rubric.md`.
+- [ ] **Stage 2A (Stateless Evaluation)**: Call Co-Pilot with stripped context to generate an objective numerical score.
+- [ ] **Stage 2B (Delta Discovery)**: Call Co-Pilot with score-anonymized history to generate gap feedback for the Main Agent.
+- [ ] **Rework Loop**: Modify `AgentExecutor` to handle recursive re-triggering based on the score vs. threshold comparison.
+
+### 🟢 Stage 3: UI & Monitoring (Dashboard)
+- [ ] **Config UI**: Update `DeployAgentModal.tsx` and `AgentDrillDown` settings with HSL threshold sliders.
+- [ ] **Evaluation Tab**: Build a dedicated tab in `AgentDrillDown` to stream `feedback.md` and `history.log`.
+- [ ] **Live Status Badges**: Add "Evaluating..." and "Last Quality Score" indicators to the agent management dashboard.
+
+### 🟢 Stage 4: Testing & Safety
+- [ ] **Blind Context Audit**: Verify that the Co-Pilot in Stage 2A receives zero knowledge of previous rounds.
+- [ ] **Loop Breaker Test**: Ensure `max_reworks` correctly stops an infinite implementation loop.
diff --git a/docs/features/harness_engineering/co_pilot_task_list.md b/docs/features/harness_engineering/co_pilot_task_list.md
new file mode 100644
index 0000000..8630592
--- /dev/null
+++ b/docs/features/harness_engineering/co_pilot_task_list.md
@@ -0,0 +1,40 @@
+# Task List: Co-Pilot Agent Harness Implementation
+
+This document tracks the progress of the autonomous evaluation and self-improvement loop for the Cortex Hub agents.
+
+## 🟢 Stage 1: Data & Models (Foundation)
+- [ ] **DB Model Update**: Modify the backend `AgentInstance` model (PostgreSQL/MongoDB as applicable) to include:
+    - `co_worker_enabled`: (Boolean) Default: `False`.
+    - `rework_threshold`: (Integer) Range 0-100. Default: `80`.
+    - `max_rework_count`: (Integer) Default: `3`.
+- [ ] **Workspace Mirroring**:
+    - [ ] Create `.cortex/` directory in the agent's unique jail during initialization.
+    - [ ] Implement `history.log` append logic (JSON format).
+
+## 🟢 Stage 2: Orchestration Logic (The Engine)
+- [ ] **Request-Specific Rubric Generator**:
+    - [ ] Implement a pre-execution hook in `agent_loop.py`.
+    - [ ] Prompt the Co-Pilot to generate a task-specific `rubric.md`.
+- [ ] **Dual-Stage Post-Run Hook**:
+    - [ ] **Stage 2A (Blind Rating)**: Implement gRPC/Executor logic to call the Co-Pilot with a stripped context.
+    - [ ] **Stage 2B (Delta Analysis)**: Implement context-aware gap discovery (Score-Anonymized).
+- [ ] **Recursive Execution Logic**:
+    - [ ] Logic in `AgentExecutor` to recursively re-trigger if `Score < Threshold` and `Reworks < Max`.
+
+## 🟢 Stage 3: User Interface (Dashboard)
+- [ ] **Agent Config Tab**:
+    - [ ] Add the "Co-Worker Settings" section to `DeployAgentModal.tsx`.
+    - [ ] Implement HSL-styled sliders for threshold and count.
+- [ ] **Evaluation Tab (`AgentDrillDown`)**:
+    - [ ] Create a real-time markdown renderer for `.cortex/feedback.md`.
+    - [ ] Build a "Rework History" component that visualizes `history.log` JSON data.
+- [ ] **Status Badges**:
+    - [ ] Display "Evaluating..." state on the agent card during post-run turns.
+    - [ ] Show a permanent "Quality Score" badge (Green/Yellow/Red) derived from the last log entry.
+
+## 🟢 Stage 4: Reliability & Testing
+- [ ] **Integration Tests**:
+    - [ ] Test: A task that fails on attempt 1, reworks, and passes on attempt 2.
+    - [ ] Test: A task that reaches `max_reworks` and stops even if score is still low.
+- [ ] **Bias Validation**:
+    - [ ] Audit logs to ensure Stage 2A truly receives zero context of previous rounds.