diff --git a/.agent/workflows/troubleshooting.md b/.agent/workflows/troubleshooting.md
index f9d1dc5..55d5898 100644
--- a/.agent/workflows/troubleshooting.md
+++ b/.agent/workflows/troubleshooting.md
@@ -118,3 +118,27 @@
 # Tail agent errors locally
 tail -n 100 ~/.cortex/agent.err.log
 ```
+
+## 5. Mac Mini Specific Agent Debugging
+**Symptoms**:
+- Terminal hangs immediately after a large file sync (e.g., 10MB test file).
+- Commands like `ls` or `cd` work once, then the PTY bridge deadlocks.
+- macOS-specific: symlinked paths (e.g., `/tmp` -> `/private/tmp`) cause relative path mismatches.
+
+**Troubleshooting Steps & Checklist**:
+1. **Identify PTY Deadlock**: Run `lsof -p <PID> | grep ptmx`. If more than 2-3 PTYs are open while idle, or the number of PTYs doesn't match the number of active terminal tabs, the reader thread has likely crashed.
+2. **Resolve Symlink Mismatches**: macOS exposes `/tmp` as a symlink to `/private/tmp`. Always resolve both the watch root and the modified file with `os.path.realpath()` in the `Watcher` so that `relpath` calculations are consistent between them.
+3. **Check Queue Saturation**: If the orchestrator's `task_queue` (default size 100) fills up during a sync, unrelated PTY threads block waiting for space. Raise `maxsize` to `10000` in `node.py` to absorb 10MB+ sync bursts.
+4. **Investigate Deadlocks with SIGQUIT**: Send `kill -3 <PID>` to the agent. This dumps all active Python thread stacks to `stdout` (viewable in `agent.out.log`), showing exactly where the TTY reader is blocked. This relies on the agent having a SIGQUIT handler installed (for example via `faulthandler.register`); without one, SIGQUIT terminates the process.
+
+**Diagnostic Commands**:
+```bash
+# Check file descriptors for PTY leaks on macOS
+lsof -p $(pgrep -f "agent_node/main.py") | grep ptmx
+
+# Force a thread dump to logs for deadlock investigation
+kill -3 $(pgrep -f "agent_node/main.py")
+
+# Verify the absolute path resolution that the agent sees
+/opt/anaconda3/bin/python3 -c "import os; print(os.path.realpath('/tmp/cortex-sync'))"
+```
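
The symlink step can be illustrated with a small sketch. It is not the `Watcher` code from this repository: `relative_to_root` and `demo` are hypothetical names, and a temporary symlink stands in for `/tmp` -> `/private/tmp` so the example behaves the same on any POSIX system.

```python
import os
import tempfile


def relative_to_root(root: str, changed_path: str) -> str:
    """Resolve symlinks on both sides before computing the relative path."""
    real_root = os.path.realpath(root)
    real_changed = os.path.realpath(changed_path)
    return os.path.relpath(real_changed, real_root)


def demo() -> tuple:
    """Return (naive, resolved) relative paths for a root reached via a symlink."""
    with tempfile.TemporaryDirectory() as base:
        base = os.path.realpath(base)
        real_dir = os.path.join(base, "private_tmp")   # plays the role of /private/tmp
        link_dir = os.path.join(base, "tmp_link")      # plays the role of /tmp
        os.makedirs(os.path.join(real_dir, "cortex-sync"))
        os.symlink(real_dir, link_dir)

        # The root is configured through the symlink, while the change event
        # reports the file through the real path.
        root_via_link = os.path.join(link_dir, "cortex-sync")
        file_via_real = os.path.join(real_dir, "cortex-sync", "a.txt")
        open(file_via_real, "w").close()

        naive = os.path.relpath(file_via_real, root_via_link)
        resolved = relative_to_root(root_via_link, file_via_real)
        return naive, resolved


naive, resolved = demo()
print("naive:   ", naive)      # climbs out of the root with ".." segments
print("resolved:", resolved)   # path relative to the real root
```

Resolving only one side is not enough: the mismatch disappears only when the root and the modified file are both passed through `os.path.realpath()` before `os.path.relpath()` is called.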
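
The queue saturation step can be reproduced in isolation. This sketch assumes `task_queue` behaves like a standard `queue.Queue`, which the text above does not confirm; `count_rejected` is a hypothetical helper that uses `put_nowait` so a full queue surfaces as a count instead of a blocked thread.

```python
import queue


def count_rejected(maxsize: int, burst: int) -> int:
    """Enqueue `burst` items with no consumer running; return how many did not fit."""
    q = queue.Queue(maxsize=maxsize)
    rejected = 0
    for item in range(burst):
        try:
            # A plain q.put(item) would block here once the queue is full,
            # which is the behavior that stalls unrelated producer threads.
            q.put_nowait(item)
        except queue.Full:
            rejected += 1
    return rejected


# A burst of 250 tasks against the default size versus the raised size.
print(count_rejected(maxsize=100, burst=250))
print(count_rejected(maxsize=10000, burst=250))
```

A larger `maxsize` only delays saturation. If bursts can exceed it, calling `put` with a `timeout` turns a silent stall into a `queue.Full` exception that can be logged.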
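
For `kill -3` to produce a thread dump, the Python process needs a SIGQUIT handler. One way to install it is the standard-library `faulthandler` module, sketched below. Whether the agent already does this is not stated here; `enable_thread_dumps` and `dump_self_once` are hypothetical names, and the destination stream is an assumption (`faulthandler` writes to `sys.stderr` unless another file is passed).

```python
import faulthandler
import os
import signal
import sys
import tempfile
import time


def enable_thread_dumps(stream=None) -> bool:
    """Register a SIGQUIT handler that writes every thread's stack to `stream`."""
    if not hasattr(faulthandler, "register"):
        return False  # faulthandler.register is not available on Windows
    if stream is None:
        stream = sys.stderr
    # The stream must stay open for as long as the handler is registered.
    faulthandler.register(signal.SIGQUIT, file=stream, all_threads=True)
    return True


def dump_self_once() -> str:
    """Send SIGQUIT to this process and return the captured dump text."""
    with tempfile.TemporaryFile(mode="w+") as fh:
        if not enable_thread_dumps(fh):
            return ""
        try:
            os.kill(os.getpid(), signal.SIGQUIT)
            time.sleep(0.2)  # give the handler time to finish writing
        finally:
            faulthandler.unregister(signal.SIGQUIT)
        fh.seek(0)
        return fh.read()
```

In a long-running agent, `enable_thread_dumps()` would be called once at startup with a stream that the log files capture. `dump_self_once()` exists only to show the handler firing without a second terminal.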