diff --git a/.agent/workflows/troubleshooting.md b/.agent/workflows/troubleshooting.md index 2dd9bdc..f9d1dc5 100644 --- a/.agent/workflows/troubleshooting.md +++ b/.agent/workflows/troubleshooting.md @@ -90,3 +90,31 @@ ```bash echo 'your-password' | sudo -S ls -la /var/lib/docker/volumes/cortex-hub_ai_hub_data/_data/mirrors/session-YOUR_SESSION ``` + +## 4. Unresponsive Terminal (Zombie PTY Processes on Local Nodes) +**Symptoms**: +- Agent node shows tasks being received in its logs. +- Executing bash commands via the AI mesh terminal hangs indefinitely or outputs nothing. +- The `ThreadPoolExecutor` or PTY bridge is gridlocked because of a previous broken shell session. + +**Troubleshooting Steps & Checklist**: +1. Check the active Python process list to locate the running `agent-node/main.py`. +2. Inspect the agent stdout and stderr logs locally (e.g. `~/.cortex/agent.out.log` and `~/.cortex/agent.err.log`) for any errors or blocked threads. +3. If the process is hung on a zombie child shell, force-kill (`kill -9`) the main agent process AND all its recursive child processes (to properly clean up spawned PTY shells). +4. Restart the agent using `bash run.sh` and redirect logs appropriately. +*Note: The agent node logic (`agent-node/src/agent_node/main.py`) has been updated in `v1.0.62` to automatically harvest and kill child PTY processes when clearing orphaned instances on boot, preventing this state across restarts.* + +### Useful Diagnostic Commands (Local Node) +```bash +# Locate the agent process ID +pgrep -f "agent_node/main.py" + +# Kill main process and all child PTYs (Force restart) +mac_pid=$(pgrep -f "agent_node/main.py") && kill -9 $mac_pid + +# Restart agent gracefully with output redirected to background logs +cd ~/.cortex/agent-node && bash run.sh > ~/.cortex/agent.out.log 2> ~/.cortex/agent.err.log & + +# Tail agent errors locally +tail -n 100 ~/.cortex/agent.err.log +```