# Code Review Report: Feature 15 — Task Journaling & Memory Sandboxing

This report performs a deep-dive audit of the task-tracking state machine and memory protection layer within `journal.py`, focusing on **12-Factor App Methodology**, **Memory Management**, and **Agent Reliability**.

---

## 🏗️ 12-Factor App Compliance Audit

| Factor | Status | Observation |
| :--- | :--- | :--- |
| **VI. Processes** | 🔴 **Major Issue** | **Volatile Journal State**: The `TaskJournal` stores all active task metadata and stream buffers in-memory (`self.tasks`, Line 20). If the Hub process restarts (deployment/crash), all currently running agent sub-tasks lose their "result hook." The AI Agent will continue waiting indefinitely for a result that disappeared from the Hub's memory. This state must be synchronized to a persistent store (SQLite/Redis). |
| **IX. Disposability** | ✅ **Success** | **Robust Memory Sandboxing**: The Hub's "Head + Tail" buffer strategy (`_trim_stream`, Line 41) is a best-in-class implementation for agentic systems. It prevents the Hub from OOM-crashing during accidental massive stdout bursts while preserving the critical initial context and final status needed by the AI. |
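The Factor VI fix above can be sketched with a small SQLite mirror of the in-memory registry. This is an illustrative sketch, not the actual `TaskJournal` API: the class, table, and column names (`PersistentTaskIndex`, `tasks`, `status`) are assumptions introduced here for the example.

```python
import sqlite3

# Hypothetical sketch: mirror task registration into SQLite so a restarted
# Hub can discover and re-attach to in-flight tasks. Names are illustrative.
class PersistentTaskIndex:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "task_id TEXT PRIMARY KEY, node_id TEXT, status TEXT)"
        )

    def register(self, task_id: str, node_id: str) -> None:
        # Called alongside the in-memory registration in TaskJournal.
        self.conn.execute(
            "INSERT OR REPLACE INTO tasks VALUES (?, ?, 'running')",
            (task_id, node_id),
        )
        self.conn.commit()

    def pending(self):
        # After a Hub reboot, every row still marked 'running' is a
        # candidate for re-attachment to its assigned node.
        return self.conn.execute(
            "SELECT task_id, node_id FROM tasks WHERE status = 'running'"
        ).fetchall()
```

With this index in place, a restarted Hub can iterate `pending()` on boot and re-subscribe to each node's result stream instead of silently dropping the result hooks.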

---

## 🔍 File-by-File Diagnostic

### 1. `app/core/grpc/core/journal.py`
The Hub's short-term memory for tracking asynchronous node execution.

> [!TIP]
> **Performance: Thread Safety vs. Throughput**
> Line 19: `self.lock = threading.Lock()`
> The journal uses a single global lock for all task updates (thought logs, stdout chunks, result fulfillment). For a mesh of 100+ nodes streaming build logs, this lock will become a significant point of contention.
> **Fix**: Shard the task registry (e.g., 16 separate dictionaries with their own locks) based on the `task_id` hash to improve concurrent update performance.
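The sharding fix proposed in the tip might look like the following sketch. The 16-way split and the `append_stdout` method are assumptions for illustration; only the lock-per-shard idea comes from the recommendation above.

```python
import threading

NUM_SHARDS = 16  # assumed shard count from the recommendation above

class ShardedJournal:
    """Illustrative sketch: one lock per shard instead of one global lock."""

    def __init__(self):
        self._shards = [
            {"lock": threading.Lock(), "tasks": {}} for _ in range(NUM_SHARDS)
        ]

    def _shard(self, task_id: str) -> dict:
        # Stable within a process run; a stable hash (e.g. crc32) would be
        # needed if shard assignment must survive restarts.
        return self._shards[hash(task_id) % NUM_SHARDS]

    def append_stdout(self, task_id: str, chunk: str) -> None:
        shard = self._shard(task_id)
        with shard["lock"]:  # contends only with tasks in the same shard
            shard["tasks"].setdefault(task_id, []).append(chunk)
```

Updates for tasks that hash to different shards now proceed in parallel, so a burst of stdout from one node no longer serializes the entire mesh.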

**Identified Problems**:
*   **Result Polling Latency**: The `cleanup` task (Line 216) removes completed results after only 120 seconds. If a calling service (such as the UI or a background RAG aggregator) fails to poll within that window due to network latency, the result is lost permanently.
*   **Lack of Disk Spilling**: The journal is purely RAM-based. While the head+tail buffer limits individual tasks to ~40KB, 1,000 concurrent tasks still consume ~40MB. For high-volume agent clusters, a "Spill-to-Disk" strategy for inactive task buffers would be safer.
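For context, the "Head + Tail" strategy praised in the compliance table can be sketched as follows. The 20KB/20KB split is an assumption (the report only states a ~40KB total per task), and `trim_stream` is a standalone stand-in for the actual `_trim_stream` method.

```python
# Assumed split of the ~40KB per-task budget cited above.
HEAD_LIMIT = 20_000
TAIL_LIMIT = 20_000

def trim_stream(buffer: str, chunk: str) -> str:
    """Keep the first HEAD_LIMIT and last TAIL_LIMIT bytes of a stream,
    dropping the middle so one noisy task cannot OOM the Hub."""
    buffer += chunk
    if len(buffer) <= HEAD_LIMIT + TAIL_LIMIT:
        return buffer
    marker = "\n... [stream truncated] ...\n"
    # The initial context (head) and the final status (tail) are what the
    # AI agent needs; the middle of a massive stdout burst is expendable.
    return buffer[:HEAD_LIMIT] + marker + buffer[-TAIL_LIMIT:]
```

The same function is a natural seam for the suggested spill-to-disk extension: instead of discarding the middle, an inactive task's buffer could be flushed to a temp file and the marker replaced with a pointer to it.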

---

## 🛠️ Summary Recommendations

1.  **Persistent Task Index**: Record the `task_id` and assigned `node_id` in the backend Database upon registration to enable "Re-attachment" logic after a Hub reboot.
2.  **Sharded Locking**: Move from a single global lock to a sharded lock architecture to support high-frequency token streaming from massive agent clusters.
3.  **Configurable Stream Limits**: Move the 40KB hardcoded stream limits to `app/config.py` to allow tuning for specific AI model context windows.
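Recommendation 3 aligns with 12-Factor III (Config): read the limits from the environment with the current hardcoded values as defaults. This is a hypothetical fragment for `app/config.py`; the variable and env-var names are assumptions.

```python
import os

# Hypothetical config entries; names are illustrative. Defaults preserve
# the current hardcoded behavior (~40KB stream cap, 120s result retention).
STREAM_HEAD_LIMIT = int(os.environ.get("JOURNAL_STREAM_HEAD_LIMIT", 20_000))
STREAM_TAIL_LIMIT = int(os.environ.get("JOURNAL_STREAM_TAIL_LIMIT", 20_000))
RESULT_RETENTION_SECONDS = int(os.environ.get("JOURNAL_RESULT_TTL", 120))
```

Operators running models with large context windows could then raise the stream caps per deployment without touching code.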

---

**This concludes Feature 15. I have persisted this report to `/app/docs/reviews/feature_review_task_journal.md`. All major gRPC core components have now been audited. Shall I proceed to the final review of the mesh "Assistant" and the STT/TTS providers?**
