# Code Review Report: Feature 14 — Global Work Pool & Task Discovery

This report performs a deep-dive audit of the Hub's "claim-based" task distribution engine within `pool.py`, focusing on **12-Factor App Methodology**, **Distributed Reliability**, and **Task Lifecycle Safety**.

---

## 🏗️ 12-Factor App Compliance Audit

| Factor | Status | Observation |
| :--- | :--- | :--- |
| **VI. Processes** | 🔴 **Major Issue** | **Volatile Task Memory**: The `GlobalWorkPool` stores unclaimed tasks in-memory only (`self.available`, Line 8). If the Hub process restarts (deployment/update/crash), all pending global work is lost. This makes the system unreliable for scheduled or background "autonomous" tasks. |
| **VIII. Concurrency** | ✅ **Success** | **Thread Safety**: Access to the work pool is correctly gated by a `threading.Lock()` (Line 7), ensuring that no two nodes can claim the same task simultaneously during intensive broadcast bursts. |

---

## 🔍 File-by-File Diagnostic

### 1. `app/core/grpc/core/pool.py`
The Hub's "Job Board" where tasks are listed for Mesh nodes to claim.

> [!CAUTION]
> **Task Loss Hazard (No Visibility Timeout)**
> Line 36: `return True, self.available.pop(task_id)["payload"]`
> The `claim` method is globally destructive. Once a node claims a task, it is immediately removed from the pool's memory.
> 
> **The Problem**: If the node claims the task but crashes before completing it (power failure, network loss), the task is **lost forever**. The Hub provides no mechanism to re-assign or time-out "active" tasks that never report back.
> 
> **Fix**: Replace the simple `pop` with a status-based registry (`available` vs `in_progress`). Implement a "Visibility Timeout" (e.g., 5 minutes) where tasks move back to `available` if the claiming node's gRPC stream terminates without success.
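
The fix above can be sketched as a two-map registry with a reaper. This is a minimal illustration, not the actual `pool.py` API: the class name `VisibleWorkPool` and the `claim`/`ack`/`reap` methods are assumed names, and the real implementation would hook `reap` into the gRPC stream-termination path.

```python
import threading
import time

# Hypothetical sketch of a claim/ack cycle with a visibility timeout.
# Class and method names are illustrative, not the actual pool.py API.
class VisibleWorkPool:
    def __init__(self, visibility_timeout=300):
        self._lock = threading.Lock()
        self.available = {}      # task_id -> {"payload": ...}
        self.in_progress = {}    # task_id -> payload + node_id + claim time
        self.visibility_timeout = visibility_timeout

    def claim(self, task_id, node_id):
        # Move the task to in_progress instead of destructively popping it.
        with self._lock:
            if task_id not in self.available:
                return False, None
            task = self.available.pop(task_id)
            self.in_progress[task_id] = {
                "payload": task["payload"],
                "node_id": node_id,
                "claimed_at": time.monotonic(),
            }
            return True, task["payload"]

    def ack(self, task_id, node_id):
        # Called on successful completion; only now is the task gone for good.
        with self._lock:
            entry = self.in_progress.get(task_id)
            if entry and entry["node_id"] == node_id:
                del self.in_progress[task_id]
                return True
            return False

    def reap(self):
        # Return timed-out in-progress tasks to the available pool.
        now = time.monotonic()
        with self._lock:
            expired = [tid for tid, e in self.in_progress.items()
                       if now - e["claimed_at"] > self.visibility_timeout]
            for tid in expired:
                e = self.in_progress.pop(tid)
                self.available[tid] = {"payload": e["payload"]}
            return expired
```

With this shape, a node crash simply means the task sits in `in_progress` until `reap()` runs, at which point another node can claim it.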

**Identified Problems**:
*   **Hardcoded Cleanup TTL**: The 3600-second (1 hour) task expiration (Line 18) is hardcoded. It may be too aggressive for low-priority agent tasks that target a specific node, since that node could be offline for maintenance longer than the TTL.
*   **Lack of Priority Support**: The pool is a flat dictionary. In a large mesh, "Admin" tasks or "Security Patches" should be prioritizable over standard background "RAG ingestion" tasks.
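
The priority gap could be closed with a min-heap ordering instead of a flat dictionary. The sketch below is illustrative only; `PriorityWorkPool`, `submit`, and `claim_next` are assumed names, and the priority scale (lower number = more urgent) is a design choice, not something `pool.py` currently defines.

```python
import heapq
import itertools

# Illustrative priority-ordered pool using a min-heap.
# Lower number = higher priority (e.g. 0 = admin/security patch,
# 10 = background RAG ingestion). Names are assumptions, not pool.py's API.
class PriorityWorkPool:
    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal-priority tasks stay FIFO.
        self._counter = itertools.count()

    def submit(self, task_id, payload, priority=10):
        heapq.heappush(self._heap, (priority, next(self._counter), task_id, payload))

    def claim_next(self):
        # Pop the most urgent task, or None if the pool is empty.
        if not self._heap:
            return None
        _, _, task_id, payload = heapq.heappop(self._heap)
        return task_id, payload
```

A numeric `priority` column would serve the same purpose if the pool is later backed by a database, as recommended below in the summary.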

---

## 🛠️ Summary Recommendations

1.  **Implement Persistence**: Migrate the `available` tasks map to a Redis `HASH` or SQLite `tasks` table to ensure job persistence across Hub reboots.
2.  **Add Visibility Timeouts**: Track which `node_id` is currently processing a task and implement a reaper task to re-enqueue any tasks that exceed a specific processing TTL.
3.  **Task Priority**: Update `available` to be a `PriorityQueue` or implement a numeric `priority` field to ensure critical mesh commands are processed first.
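
Recommendations 1 and 3 combine naturally in a single table. The following is a minimal SQLite sketch under assumptions: the table name, column names, and `claim_next` helper are hypothetical, chosen only to show how persistence and priority ordering fit together.

```python
import sqlite3

# Minimal persistence sketch. Schema is illustrative, not the real Hub schema.
# Tasks survive Hub restarts because state lives in SQLite, not process memory.
def open_pool_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            task_id    TEXT PRIMARY KEY,
            payload    TEXT NOT NULL,
            priority   INTEGER NOT NULL DEFAULT 10,
            status     TEXT NOT NULL DEFAULT 'available',
            node_id    TEXT,
            claimed_at REAL
        )
    """)
    return db

def claim_next(db, node_id):
    # Atomically pick the highest-priority available task and mark it claimed.
    with db:  # single transaction: select + update commit together
        row = db.execute(
            "SELECT task_id, payload FROM tasks WHERE status = 'available' "
            "ORDER BY priority ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute(
            "UPDATE tasks SET status = 'in_progress', node_id = ?, "
            "claimed_at = strftime('%s','now') WHERE task_id = ?",
            (node_id, row[0]),
        )
        return row
```

A reaper would then be a periodic `UPDATE ... SET status = 'available' WHERE status = 'in_progress' AND claimed_at < <cutoff>`, which also survives Hub restarts.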

---

**This concludes Feature 14. I have persisted this report to `/app/docs/reviews/feature_review_work_pool.md`. Shall I implement a basic task-reaper to prevent task loss during node crashes?**
