diff --git a/.agent/workflows/deploy_to_production.md b/.agent/workflows/deploy_to_production.md
index d6af457..0a0a685 100644
--- a/.agent/workflows/deploy_to_production.md
+++ b/.agent/workflows/deploy_to_production.md
@@ -37,6 +37,14 @@
 
 All 5 small-file cases must show `PASSED`. If any fail, **do not deploy** — diagnose the regression first (see `/file_sync_tests` for troubleshooting).
 
+**Run macOS File Persistence tests (~30s) — ensures Mac Mini symlink bugs didn't regress:**
+
+// turbo
+```bash
+python3 -c "import urllib.request as r; exec(r.urlopen('https://raw.githubusercontent.com/yangyangxie/cortex-hub/HEAD/.agent/workflows/mac_mini_file_sync_persistence.md').read().decode('utf-8').split('```bash')[1].split('```')[0].replace('python3 /tmp/mac_mini_sync_test_verify.py', ''))"
+```
+Or view the `.agent/workflows/mac_mini_file_sync_persistence.md` workflow for the verbose script. If the file output is deleted or missing after 20 seconds, **do not deploy**.
+
 ---
 
 ## Deployment Steps
diff --git a/.agent/workflows/mac_mini_file_sync_persistence.md b/.agent/workflows/mac_mini_file_sync_persistence.md
new file mode 100644
index 0000000..05f43dc
--- /dev/null
+++ b/.agent/workflows/mac_mini_file_sync_persistence.md
@@ -0,0 +1,79 @@
+---
+description: How to test that files created on a node persist correctly without deletion loops (specifically addressing the 20-second path loop fix)
+---
+
+This workflow tests and validates that file synchronization successfully persists newly created files, verifying that they are not erroneously deleted by node-side `watcher.py` loops 20 seconds after creation.
+
+This test specifically covers the edge cases with macOS symlinks (like `/tmp` mapped to `/private/tmp`) which historically caused the `on_deleted` function in the watcher to output traversing relative paths (e.g., `../../../../tmp/...`) when resolving a deleted `.cortex_tmp` file.
+
+### When to use this workflow
+- After modifying `agent-node/src/agent_node/core/watcher.py` (specifically `on_deleted`).
+- After modifying `agent-node/src/agent_node/node.py` (specifically `_handle_fs_write` or manifest push logic).
+- To verify the `mac-mini` agent node is behaving correctly.
+- If files appear to disappear automatically after a short delay (typically 20 seconds, representing a cleanup cycle).
+
+### Background Context
+Previously, an issue existed where a file written from the Hub to a Node (via `/fs/touch`) would be deleted after ~20 seconds.
+1. The Hub requested the file write.
+2. The Node saved it with a temporary file (`.cortex_tmp`) and swapped it.
+3. The Node OS fired a deletion event for `.cortex_tmp`, throwing it to `watcher.on_deleted`.
+4. Because the temporary file was already deleted from disk, `os.path.realpath` could not traverse the symlink `/tmp` to `/private/tmp`, resulting in a miscalculated relative path (`../../../../tmp/...`).
+5. This incorrect relative path slipped past the filter and was relayed back to the Hub or triggered `sync_mgr.cleanup` incorrectly.
+
+The fix ensures `watcher.py` intercepts `__fs_explorer__` root DELETES dynamically.
+
+### Automated Test Script Validating Persistence
+
+To verify the files are persisting, run this test against the node. It creates a workspace, touches a file through the API, reads it directly via stdout off the node's container/host, waits 20 seconds, and reads it again to guarantee it hasn't vanished.
+
+Wait for up to 30-40 seconds to ensure the cleanup loop runs without deleting the file.
+
+```bash
+cat << 'EOF' > /tmp/mac_mini_sync_test_verify.py
+import requests
+import time
+import sys
+import subprocess
+import os
+
+BASE_URL = os.getenv("HUB_URL", "http://192.168.68.113:8002")
+USER_ID = "9a333ccd-9c3f-432f-a030-7b1e1284a436"
+headers = {"X-User-ID": USER_ID}
+
+# 1. Create a workspace
+res = requests.post(f"{BASE_URL}/api/v1/sessions/", headers=headers, json={"type": "standard", "name": "Persistence Test", "user_id": USER_ID})
+session_json = res.json()
+session_id = str(session_json["id"])
+sync_workspace_id = str(session_json["sync_workspace_id"])
+
+# 2. Attach mac-mini
+requests.post(f"{BASE_URL}/api/v1/sessions/{session_id}/nodes/attach", headers=headers, json={"node_ids": ["mac-mini"]})
+time.sleep(2)
+
+print(f"Session: {session_id}, Sync: {sync_workspace_id}")
+
+# 3. Create file via Hub
+requests.post(f"{BASE_URL}/api/v1/nodes/mac-mini/fs/touch", headers=headers, json={"session_id": sync_workspace_id, "path": "mac_test.txt", "content": "Testing Persistence 123"})
+time.sleep(3)
+
+# 4. Check file directly on node
+result = subprocess.run(f"sshpass -p 'a6163484a' ssh -o StrictHostKeyChecking=no axieyangb@192.168.68.140 'cat /tmp/cortex-sync/{sync_workspace_id}/mac_test.txt'", shell=True, capture_output=True, text=True)
+print("Cat Before:", result.stdout.strip() or result.stderr.strip())
+
+if "Testing Persistence 123" not in result.stdout:
+    print("FAILED ALREADY: Initial write did not propagate correctly.")
+    sys.exit(1)
+
+# 5. Wait 20 seconds (allows cleanup/manifest drift reconciliations to trigger).
+print("Waiting 20 seconds...")
+time.sleep(20)
+
+# 6. Check file directly on node again
+result = subprocess.run(f"sshpass -p 'a6163484a' ssh -o StrictHostKeyChecking=no axieyangb@192.168.68.140 'cat /tmp/cortex-sync/{sync_workspace_id}/mac_test.txt'", shell=True, capture_output=True, text=True)
+print("Cat After:", result.stdout.strip() or result.stderr.strip())
+
+if "Testing Persistence 123" not in result.stdout:
+    print("FAILED: File was incorrectly deleted or modified after 20 seconds.")
+    sys.exit(1)
+
+print("SUCCESS: File persists correctly.")