| 2026-04-30 |

Self-recovery: dispatch dict, disable updater in Docker, fix watchdog semantics
...
1. Replace if/elif dispatch chain with _DISPATCH dict (engines/node.py)
Adding a proto message kind now requires exactly one entry in _DISPATCH —
impossible to add a callback slot without the routing case. Missing kinds
emit WARNING immediately. Extractor is colocated with the route so the
wrong-object bug (Bug 1) cannot recur.
2. Honour CORTEX_DISABLE_AUTO_UPDATE env var (updater.py)
In Docker, updates are delivered via image rebuilds. The old behaviour
called sys.exit(0) to spawn a bootstrapper, which Docker restart:always
turned into an infinite boot loop. Setting CORTEX_DISABLE_AUTO_UPDATE=1
in the container env prevents this entirely.
3. Watchdog ticks unconditionally in health reporter (node.py)
Previously the watchdog tick was skipped when the hub was unreachable,
causing os._exit(1) after 300s of any disconnect — even during normal
gRPC reconnect backoff. Now the watchdog proves the reporter thread is
alive regardless of hub state; it only fires if the thread itself deadlocks.
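A minimal sketch of the _DISPATCH-dict pattern described in point 1, with the extractor colocated next to its route. The handler names and the SimpleNamespace stand-in for the proto ServerMessage are hypothetical, not the actual engines/node.py code:

```python
# Sketch of the _DISPATCH pattern: kind -> (extractor, handler).
# SimpleNamespace stands in for generated proto messages; handler names
# are illustrative assumptions, not the real engines/node.py API.
import logging
from types import SimpleNamespace

log = logging.getLogger(__name__)

class Node:
    def __init__(self):
        self.events = []
        # Colocating the extractor with the route means a handler can
        # never receive the wrong object (the Bug 1 class).
        self._DISPATCH = {
            "policy": (lambda m: m.policy, self._on_policy_update),
            "work_pool_update": (lambda m: m.work_pool_update, self._on_work_pool),
        }

    def on_message(self, message):
        entry = self._DISPATCH.get(message.kind)
        if entry is None:
            # Unhandled kinds warn immediately instead of being silently dropped.
            log.warning("no dispatch entry for message kind %r", message.kind)
            return
        extract, handler = entry
        handler(extract(message))

    def _on_policy_update(self, policy):
        self.events.append(("policy", policy.mode))

    def _on_work_pool(self, update):
        self.events.append(("work_pool", update))

node = Node()
node.on_message(SimpleNamespace(kind="policy", policy=SimpleNamespace(mode="strict")))
print(node.events)  # [('policy', 'strict')]
```

Because routing and extraction live in one table entry, adding a message kind without wiring its handler is impossible by construction, and an unknown kind degrades to a WARNING rather than a silent drop.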
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Harden mesh dispatch against silent failures
...
Three structural improvements to prevent future silent drops:
1. Add else-warning in on_message() dispatch chain — any unhandled proto
message kind now logs a WARNING in production instead of being silently
swallowed. This would have surfaced the missing work_pool_update route
immediately.
2. Fix on_ready to use getattr(message, kind) — same safe extraction pattern
now applied consistently to all dispatch branches. Prevents Bug 1 class
from recurring if _on_mesh_ready ever starts using the argument.
3. Remove redundant _health_thread_started boolean from send_health() guard —
_start_health_stream() already gates on thread.is_alive(), so the outer
boolean was a stale fast-path that could mislead future readers back into
the original thread-restart bug.
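A sketch of the liveness gate from point 3: the restart guard asks the thread itself whether it is alive, rather than trusting a separate _health_thread_started boolean that can go stale after close(). Apart from _start_health_stream, the names here are illustrative assumptions:

```python
# Hypothetical sketch: thread.is_alive() as the single source of truth
# for whether the health stream needs (re)starting. The boolean flag it
# replaces could go stale across reconnects.
import threading
import time

class HealthReporter:
    def __init__(self):
        self._health_thread = None
        self.ticks = 0

    def _report_loop(self):
        # Stand-in for the real health-report loop: a few quick ticks.
        for _ in range(3):
            self.ticks += 1
            time.sleep(0.01)

    def _start_health_stream(self):
        # Gate on liveness of the thread itself, not a side flag.
        if self._health_thread is not None and self._health_thread.is_alive():
            return False  # already running, nothing to do
        self._health_thread = threading.Thread(target=self._report_loop, daemon=True)
        self._health_thread.start()
        return True

r = HealthReporter()
assert r._start_health_stream() is True    # first start
assert r._start_health_stream() is False   # live thread: no duplicate
r._health_thread.join()
assert r._start_health_stream() is True    # thread exited: restart works
```

With a single liveness check there is no second piece of state to reset in close(), which is exactly how the original reconnect bug arose.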
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Fix node disconnect: health stream restart, policy dispatch, work_pool routing
...
Three bugs causing nodes to drop offline and stay offline after disconnect:
1. on_policy passed full ServerMessage instead of sub-message — caused
Dispatch Error: mode on every connection (AgentNode._on_policy_update
accesses policy.mode directly)
2. _health_thread_started never reset in close() — health stream could not
restart after reconnect, so hub watchdog eventually timed out the node
3. work_pool_update had no dispatch case in on_message() — _handle_work_pool
was dead code, causing hub to flood nodes with unclaimed task updates
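A hypothetical reproduction of Bug 1 above: handing the full ServerMessage envelope to _on_policy_update, which reads policy.mode directly. SimpleNamespace stands in for the generated proto classes:

```python
# Illustration of Bug 1 (assumed shapes, not the real proto classes):
# the handler expects the policy sub-message, not the envelope around it.
from types import SimpleNamespace

class AgentNode:
    def _on_policy_update(self, policy):
        # Reads the sub-message field directly; the envelope has no .mode.
        return policy.mode

server_message = SimpleNamespace(policy=SimpleNamespace(mode="strict"))
node = AgentNode()

# Buggy dispatch: passing the envelope raises AttributeError on .mode,
# surfacing as "Dispatch Error: mode" on every connection.
try:
    node._on_policy_update(server_message)
except AttributeError as e:
    print("dispatch error:", e)

# Fixed dispatch: extract the sub-message before calling the handler.
print(node._on_policy_update(server_message.policy))  # strict
```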
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
| 2026-04-28 |
Fix Windows agent offline issues: add run_loop.bat wrapper and fix Sandbox policy handshake
|
| 2026-04-25 |
Fix integration tests, deadlocks, and race conditions in coworker flow
Antigravity AI
|
refactor done
yangyang xie
|