This report performs a deep-dive audit of the node registration and event bus logic within `node_registry.py`, focusing on 12-Factor App Methodology, Memory Management, and gRPC Concurrency Safety.
| Factor | Status | Observation |
|---|---|---|
| VI. Processes | 🔴 Major Issue | Ephemeral History Volatility: Each `LiveNodeRecord` stores a `terminal_history` list (Lines 79, 437) of the last 150 terminal interaction chunks. Because the list lives only in Hub process memory, it is purged whenever the Hub restarts. This causes a "Context Amnesia" effect where an AI agent cannot "remember" what was displayed on a node's terminal in a previous session, even if the node itself stayed online. This state should be persisted to an external backing store (Redis/SQLite). |
| IX. Disposability | ✅ Success | Bounded Backpressure: The Hub implemented a custom BoundedThreadPoolExecutor (Line 24) to provide natural backpressure when gRPC queues are full. This prevents the Hub from crash-looping due to Out-Of-Memory (OOM) errors during massive file-sync waves. |
| XI. Logs | 🟡 Warning | Debug Noise: Every heartbeat logs a debug message (Line 367). In a mesh of 100+ nodes, this generates thousands of lines per minute. While technically correct as "debug", it makes searching for actual issues in the debug logs nearly impossible. |
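
The persistence suggested for Factor VI could look roughly like the sketch below, which uses SQLite from the Python standard library. It is a sketch under stated assumptions, not the Hub's implementation: the table name and the helpers `open_history_store`, `append_chunk`, and `load_history` are hypothetical, and only the 150-chunk window comes from the audit.

```python
import sqlite3

MAX_HISTORY = 150  # same window size as the in-memory terminal_history


def open_history_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical history store. Use a file path to survive restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS terminal_history ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " node_id TEXT NOT NULL,"
        " chunk TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_history_node"
        " ON terminal_history (node_id, id)"
    )
    return conn


def append_chunk(conn: sqlite3.Connection, node_id: str, chunk: str) -> None:
    """Insert one chunk and trim the node's history to the newest MAX_HISTORY rows."""
    with conn:  # commits both statements together, rolls back on error
        conn.execute(
            "INSERT INTO terminal_history (node_id, chunk) VALUES (?, ?)",
            (node_id, chunk),
        )
        conn.execute(
            "DELETE FROM terminal_history WHERE node_id = ? AND id NOT IN ("
            " SELECT id FROM terminal_history WHERE node_id = ?"
            " ORDER BY id DESC LIMIT ?)",
            (node_id, node_id, MAX_HISTORY),
        )


def load_history(conn: sqlite3.Connection, node_id: str) -> list:
    """Return the stored chunks for a node, oldest first."""
    rows = conn.execute(
        "SELECT chunk FROM terminal_history WHERE node_id = ? ORDER BY id",
        (node_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Usage: after a simulated burst of 200 chunks only the newest 150 remain.
conn = open_history_store()
for i in range(200):
    append_chunk(conn, "node-a", f"chunk-{i}")
history = load_history(conn, "node-a")
```

With a file path instead of `:memory:`, a restarted Hub could call `load_history` when a node reconnects and rebuild its in-memory buffer. Note that a `sqlite3` connection is bound to the thread that created it by default, so a gRPC server with worker threads would need one connection per thread or a dedicated writer.
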

`app/core/services/node_registry.py`

The Hub's central nervous system for mesh coordination.

> [!CAUTION]
> **Memory Fragmentation Hazard (Rolling History)**
>
> Line 437: `node.terminal_history = node.terminal_history[-150:]`
>
> The terminal history is stored as a list of strings. Every time a new chunk arrives, the list is appended to and then sliced, which allocates a new list of up to 150 entries on each update. For large shell outputs (build logs, `cat`-ing large files), this can lead to significant memory fragmentation and garbage collection overhead in the Hub process.
>
> **Fix**: Use a `collections.deque(maxlen=150)` for the `terminal_history` buffer to ensure O(1) rotation and efficient memory management.
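
The `deque` fix above can be illustrated with a minimal sketch. `LiveNodeRecordSketch` and `record_chunk` are hypothetical stand-ins, not the real `LiveNodeRecord` API; only the 150-entry window is taken from the audited code.

```python
from collections import deque

MAX_HISTORY = 150  # window size from the audited slice [-150:]


class LiveNodeRecordSketch:
    """Hypothetical stand-in for LiveNodeRecord; only the history buffer is modeled."""

    def __init__(self) -> None:
        # A deque with maxlen drops the oldest entry automatically on append,
        # so no slice (and no new list allocation) is needed per chunk.
        self.terminal_history = deque(maxlen=MAX_HISTORY)

    def record_chunk(self, chunk: str) -> None:
        self.terminal_history.append(chunk)


# Usage: 200 appends leave exactly the newest 150 chunks in the buffer.
node = LiveNodeRecordSketch()
for i in range(200):
    node.record_chunk(f"chunk-{i}")
```

One caveat for the migration: a `deque` supports indexing but not slicing, so any caller that currently slices `terminal_history` would need to wrap it in `list(...)` first.
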
Identified Problems:
- A fallback spawns a `threading.Thread(daemon=True)` for message delivery if the executor is not ready. In high-load startup scenarios, this could lead to an explosion of unmanaged threads (Thread Leaks).
- Use `collections.deque` for `terminal_history` to prevent list-slicing performance degradation.
- Use a Redis `LPUSH`/`LTRIM` structure if persistent context across Hub restarts is required.
- … `app/config.py` to allow tuning for different network environments.

This concludes Feature 11. I have persisted this report to `/app/docs/reviews/feature_review_node_registry_bus.md`. All major backend components have now been audited. Which of these infrastructure improvements should I implement first?
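
On the thread-fallback item under Identified Problems, one possible mitigation is to create the executor eagerly and route every delivery through a bounded `submit`, so there is never a "not ready" path that spawns raw threads. The sketch below is an assumption about how such a wrapper could look, not the Hub's actual `BoundedThreadPoolExecutor`; the class name `BoundedExecutorSketch` and the `max_pending` parameter are hypothetical.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class BoundedExecutorSketch:
    """Hypothetical wrapper: submit() blocks once max_pending tasks are in flight."""

    def __init__(self, max_workers: int, max_pending: int) -> None:
        # Created eagerly, so callers never need a raw-thread fallback.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(self, fn, *args, **kwargs) -> Future:
        # Blocks the caller when all slots are taken (natural backpressure).
        self._slots.acquire()
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()  # do not leak a slot if submission fails
            raise
        # Free the slot when the task finishes, whether it succeeded or raised.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self, wait: bool = True) -> None:
        self._pool.shutdown(wait=wait)


# Usage: 8 tasks pass through a pool that allows at most 4 in flight.
executor = BoundedExecutorSketch(max_workers=2, max_pending=4)
futures = [executor.submit(pow, 2, n) for n in range(8)]
results = [f.result() for f in futures]
executor.shutdown()
```

The thread count stays fixed at `max_workers` regardless of load, at the cost of blocking producers when the pool is saturated.
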