# Code Review Report: Feature 11 — Node Mesh Registry & Event Bus

This report is a deep-dive audit of the node registration and event bus logic within `node_registry.py`, focusing on **12-Factor App Methodology**, **Memory Management**, and **gRPC Concurrency Safety**.

---

## 🏗️ 12-Factor App Compliance Audit

| Factor | Status | Observation |
| :--- | :--- | :--- |
| **VI. Processes** | 🔴 **Major Issue** | **Ephemeral History Volatility**: Each `LiveNodeRecord` stores a `terminal_history` list (Lines 79, 437) of the last 150 terminal interaction chunks. When the Hub restarts, this history is purged. This causes a "Context Amnesia" effect where an AI agent cannot "remember" what was displayed on a node's terminal in a previous session, even if the node itself stayed online. This state should be moved out of the process into an external backing store (Redis/SQLite) that survives Hub restarts. |
| **IX. Disposability** | ✅ **Success** | **Bounded Backpressure**: The Hub implemented a custom `BoundedThreadPoolExecutor` (Line 24) to provide natural backpressure when gRPC queues are full. This prevents the Hub from crash-looping due to Out-Of-Memory (OOM) errors during massive file-sync waves. |
| **XI. Logs** | 🟡 **Warning** | **Debug Noise**: Every heartbeat logs a `debug` message (Line 367). In a mesh of 100+ nodes, this generates thousands of lines per minute. While technically correct as "debug", it makes searching for actual issues in the debug logs nearly impossible. |

---

## 🔍 File-by-File Diagnostic

### 1. `app/core/services/node_registry.py`
The Hub's central nervous system for mesh coordination.

> [!CAUTION]
> **Memory Fragmentation Hazard (Rolling History)**
> Line 437: `node.terminal_history = node.terminal_history[-150:]`
> The terminal history is stored as a list of strings. Every time a new chunk arrives, the list is appended to and then re-sliced, allocating a fresh list object on each rotation. For large shell outputs (build logs, `cat`ing large files), this can lead to significant memory fragmentation and garbage-collection overhead in the Hub process.
> **Fix**: Use a `collections.deque(maxlen=150)` for the `terminal_history` buffer to ensure O(1) rotation and efficient memory management.
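
A minimal sketch of the proposed fix. The constant name and helper are illustrative (not taken from `node_registry.py`); the point is that a `deque` with `maxlen` evicts the oldest chunk on append, so no re-slicing or reallocation is needed:

```python
from collections import deque

# Assumed constant matching the 150-chunk window described above.
TERMINAL_HISTORY_LIMIT = 150

# Replaces the list-splice pattern `history = history[-150:]`.
terminal_history: deque = deque(maxlen=TERMINAL_HISTORY_LIMIT)

def append_chunk(history: deque, chunk: str) -> None:
    """Append a terminal chunk; the deque drops the oldest entry in O(1)."""
    history.append(chunk)
```

Because the deque is bounded at construction time, callers never need to trim it, which also removes the risk of forgetting the trim on one code path.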

**Identified Problems**:
*   **Zombie Thread Fallback**: Line 107 spawns a raw `threading.Thread(daemon=True)` for message delivery if the executor is not ready. In high-load startup scenarios, this could lead to an explosion of unmanaged threads (Thread Leaks).
*   **Stale Status Window**: Line 133 considers a node "stale" after only 30 seconds. In edge computing environments with high-latency VPNs (e.g. WireGuard over 5G), this threshold may be too aggressive, causing UI "flashing" as nodes oscillate between online and stale.
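
One way to bound the raw-thread fallback described in the first bullet is to gate spawning on a semaphore, so the fallback degrades gracefully instead of leaking threads. This is a hedged sketch; the names (`MAX_FALLBACK_THREADS`, `deliver_with_bounded_fallback`) are assumptions, not the Hub's actual API:

```python
import threading

# Illustrative cap on concurrent fallback delivery threads (assumed value).
MAX_FALLBACK_THREADS = 8
_fallback_slots = threading.BoundedSemaphore(MAX_FALLBACK_THREADS)

def deliver_with_bounded_fallback(deliver, message) -> bool:
    """Spawn a fallback delivery thread only if a slot is free.

    Returns False when all slots are taken, so the caller can queue,
    retry, or drop the message instead of spawning an unmanaged thread.
    """
    if not _fallback_slots.acquire(blocking=False):
        return False

    def _run():
        try:
            deliver(message)
        finally:
            _fallback_slots.release()  # always free the slot, even on error

    threading.Thread(target=_run, daemon=True).start()
    return True
```

The same effect could be achieved by routing the fallback through a second, small `BoundedThreadPoolExecutor`; the key property is that the number of concurrent fallback threads has a hard ceiling.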

---

## 🛠️ Summary Recommendations

1.  **Switch to Deque**: Adopt `collections.deque` for `terminal_history` to eliminate list-slicing performance degradation.
2.  **Externalize History**: Move the terminal buffer to a Redis `LPUSH/LTRIM` structure if persistent context across Hub restarts is required.
3.  **Configurable Thresholds**: Move the "Stale" timeout (30s) and "Flapping" window (60s) to `app/config.py` to allow tuning for different network environments.
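
For recommendation 3, the thresholds could be exposed as environment-tunable settings in the style of a 12-factor config module. This is a sketch under assumed names; the actual layout of `app/config.py` is not shown in this review:

```python
import os

# Hypothetical settings; variable names and defaults are assumptions.
# Defaults mirror the hard-coded values flagged above (30s stale, 60s flapping).
NODE_STALE_TIMEOUT_S = float(os.environ.get("NODE_STALE_TIMEOUT_S", "30"))
NODE_FLAPPING_WINDOW_S = float(os.environ.get("NODE_FLAPPING_WINDOW_S", "60"))
```

Reading the values from the environment keeps per-deployment tuning (e.g. a longer stale window for 5G/WireGuard meshes) out of the codebase, consistent with Factor III (Config).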

---

**This concludes Feature 11. I have persisted this report to `/app/docs/reviews/feature_review_node_registry_bus.md`. All major backend components have now been audited. Which of these infrastructure improvements should I implement first?**
