cortex-hub / .agent / workflows / file_sync_tests.md

---
description: How to run the file sync integration tests against the production server to verify mesh file propagation is working correctly.
---

# File Sync Integration Tests

Tests live in `ai-hub/integration_tests/test_file_sync.py`.
They verify real end-to-end file sync across nodes and the Hub mirror.

## Prerequisites

- Production server: `192.168.68.113`
- Credentials: user `axieyangb`, password `a6163484a`
- Nodes `test-node-1` and `test-node-2` must be online (`status: online` in the `agent_nodes` table)
- The test file must be present inside the container at `/tmp/test_file_sync.py`
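
Before running anything remote, it can help to verify the node-status precondition programmatically. The sketch below queries the `agent_nodes` table mentioned above; the exact schema (`name`, `status` columns) is an assumption and may differ from the real Hub database, so treat this as illustrative. It is demonstrated against an in-memory SQLite database:

```python
import sqlite3

def offline_nodes(conn: sqlite3.Connection, nodes: list[str]) -> list[str]:
    """Return the subset of `nodes` not marked 'online' in agent_nodes.

    NOTE: the agent_nodes schema used here (name, status) is assumed,
    not confirmed against the production database.
    """
    placeholders = ",".join("?" for _ in nodes)
    cur = conn.execute(
        f"SELECT name FROM agent_nodes WHERE name IN ({placeholders}) "
        "AND status = 'online'",
        nodes,
    )
    online = {row[0] for row in cur}
    return [n for n in nodes if n not in online]

# Demo on an in-memory DB standing in for /app/data/ai_hub.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_nodes (name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO agent_nodes VALUES (?, ?)",
    [("test-node-1", "online"), ("test-node-2", "offline")],
)
print(offline_nodes(conn, ["test-node-1", "test-node-2"]))  # → ['test-node-2']
```

If the returned list is non-empty, the tests will be skipped (see Troubleshooting below).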

## Key Environment Variables

| Variable | Purpose | Value |
| --- | --- | --- |
| `SYNC_TEST_BASE_URL` | Hub API base URL (inside container) | `http://127.0.0.1:8000` |
| `SYNC_TEST_USER_ID` | User ID with access to test nodes | `9a333ccd-9c3f-432f-a030-7b1e1284a436` |
| `SYNC_TEST_NODE1` | First test node (optional, defaults to `test-node-1`) | `test-node-1` |
| `SYNC_TEST_NODE2` | Second test node (optional, defaults to `test-node-2`) | `test-node-2` |

**Why this user ID?** The nodes (`test-node-1`, `test-node-2`) are registered under group `4a2e9ec4-4ae9-4ab3-9e78-69c35449ac94`. Only users in that group pass the `_require_node_access` check. `9a333ccd-...` is the admin user (`axieyangb@gmail.com`), who belongs to that group. If you create a new test user, you must add them to that group first.
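
A minimal sketch of the group gate that `_require_node_access` presumably implements: the user's group ID must match the group the node is registered under. The data structures and function body here are illustrative, not the Hub's actual models:

```python
# Group the test nodes are registered under (from the note above).
NODE_GROUP = "4a2e9ec4-4ae9-4ab3-9e78-69c35449ac94"

# Illustrative user records; the real Hub reads these from its database.
users = {
    "9a333ccd-9c3f-432f-a030-7b1e1284a436": {
        "email": "axieyangb@gmail.com", "group_id": NODE_GROUP,
    },
    "new-test-user": {
        "email": "new@example.com", "group_id": "some-other-group",
    },
}

def require_node_access(user_id: str, node_group: str = NODE_GROUP) -> bool:
    """True only if the user belongs to the node's group.

    Hypothetical reconstruction of _require_node_access: a mismatch is
    what produces the 403 described in Troubleshooting below.
    """
    user = users.get(user_id)
    return user is not None and user["group_id"] == node_group

assert require_node_access("9a333ccd-9c3f-432f-a030-7b1e1284a436")
assert not require_node_access("new-test-user")  # must join the group first
print("access check ok")
```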


## Step 1 — Deploy the test file to the container

Run this whenever the test file changes (it does NOT require a full deploy):

```bash
sshpass -p "a6163484a" scp -o StrictHostKeyChecking=no \
  /app/ai-hub/integration_tests/test_file_sync.py \
  axieyangb@192.168.68.113:/tmp/test_file_sync.py \
&& sshpass -p "a6163484a" ssh -o StrictHostKeyChecking=no axieyangb@192.168.68.113 \
  "echo 'a6163484a' | sudo -S docker cp /tmp/test_file_sync.py ai_hub_service:/tmp/test_file_sync.py"
```

## Step 2 — Run all 8 tests (all-in-one)

// turbo

```bash
sshpass -p "a6163484a" ssh -o StrictHostKeyChecking=no axieyangb@192.168.68.113 \
  "echo 'a6163484a' | sudo -S docker exec \
    -e SYNC_TEST_BASE_URL=http://127.0.0.1:8000 \
    -e SYNC_TEST_USER_ID=9a333ccd-9c3f-432f-a030-7b1e1284a436 \
    ai_hub_service \
    python3 -m pytest /tmp/test_file_sync.py -v -s -p no:warnings 2>&1"
```

Expected runtime: ~2 minutes (~1 min for small-file tests + ~1 min for large-file tests).


## Run only small file tests (faster, ~60s)

// turbo

```bash
sshpass -p "a6163484a" ssh -o StrictHostKeyChecking=no axieyangb@192.168.68.113 \
  "echo 'a6163484a' | sudo -S docker exec \
    -e SYNC_TEST_BASE_URL=http://127.0.0.1:8000 \
    -e SYNC_TEST_USER_ID=9a333ccd-9c3f-432f-a030-7b1e1284a436 \
    ai_hub_service \
    python3 -m pytest /tmp/test_file_sync.py::TestSmallFileSync -v -s -p no:warnings 2>&1"
```

## What the tests cover

| Case | Scenario | File Size |
| --- | --- | --- |
| 1 | Write from test-node-1 → test-node-2 + Hub mirror receive it | Small |
| 2 | Write from server (via node-1) → all client nodes receive it | Small |
| 3 | Delete from server → all client nodes purged | Small |
| 4 | Delete from test-node-2 → Hub mirror + test-node-1 purged | Small |
| 5 | Write 20 MB from test-node-1 → test-node-2 + Hub mirror (with SHA-256 verify) | 20 MB |
| 6 | Write 20 MB from server → all client nodes receive it | 20 MB |
| 7 | Delete large file from server → all nodes purged | 20 MB |
| 8 | Delete large file from test-node-2 → Hub mirror + test-node-1 purged | 20 MB |
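
Case 5 verifies large-file integrity with SHA-256. A minimal sketch of the chunked hashing a test like this would typically perform (the chunk size and the demo file below are illustrative, not taken from the actual test file):

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1 << 20  # hash 1 MiB at a time instead of loading 20 MB at once

def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()

# Demo: the chunked digest of a sample file matches a one-shot digest,
# so the same function can compare source and synced copies on two nodes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * CHUNK_SIZE))
    sample = tmp.name
with open(sample, "rb") as f:
    one_shot = hashlib.sha256(f.read()).hexdigest()
assert sha256_of_file(sample) == one_shot
print("hashes match")
```

Comparing digests (rather than byte-for-byte contents over SSH) keeps the verification cheap for 20 MB payloads.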

## Troubleshooting

### Tests are SKIPped

The `requires_nodes` marker skips all tests if the Hub is unreachable. Check:

1. Is `ai_hub_service` running? `docker ps | grep ai_hub_service`
2. Is the Hub healthy? `docker logs --tail 20 ai_hub_service`

### Create session failed: `{"detail":"Not Found"}`

The API path is wrong — ensure `SESSIONS_PATH = "/api/v1/sessions"` in the test file (production uses the `/api/v1/` prefix set by the `PATH_PREFIX` env var).
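
This failure mode comes down to how the base URL and path are joined. A sketch of a prefix-safe join, assuming the `SYNC_TEST_BASE_URL` / `SESSIONS_PATH` names from this doc (the helper function itself is hypothetical, not the test file's actual code):

```python
import os

# Defaults mirror the values documented above.
BASE_URL = os.environ.get("SYNC_TEST_BASE_URL", "http://127.0.0.1:8000")
SESSIONS_PATH = "/api/v1/sessions"  # must carry the /api/v1/ production prefix

def sessions_url(base: str = BASE_URL, path: str = SESSIONS_PATH) -> str:
    """Join base URL and path without doubling or dropping a slash.

    Hypothetical helper: if path omits the /api/v1/ prefix, the Hub
    answers {"detail":"Not Found"} as described above.
    """
    return base.rstrip("/") + "/" + path.lstrip("/")

print(sessions_url())  # → http://127.0.0.1:8000/api/v1/sessions
```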

### Attach nodes failed (403 / 404)

The `SYNC_TEST_USER_ID` doesn't have access to the nodes. Verify group membership:

```bash
sshpass -p "a6163484a" ssh -o StrictHostKeyChecking=no axieyangb@192.168.68.113 \
  "echo 'a6163484a' | sudo -S docker exec ai_hub_service python3 -c \
  'import sqlite3; conn = sqlite3.connect(\"/app/data/ai_hub.db\"); cur = conn.cursor(); \
   cur.execute(\"SELECT id, email, group_id FROM users WHERE id = \\\"YOUR_USER_ID\\\"\"); \
   print(cur.fetchall())'"
```

### Case 3 or 4 (delete propagation) fails

This means the `broadcast_delete` in `assistant.py` or the `on_deleted` filter in `watcher.py` regressed. Check:

- `watcher.py` `on_deleted` must filter `.cortex_tmp` / `.cortex_lock` paths
- `grpc_server.py` DELETE handler must guard against temp-file paths
- `assistant.py` must have a `broadcast_delete()` method
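
The temp-path filter in the checklist can be sketched as follows. The suffix names come from the checklist above; the function name and exact matching rule are illustrative, not `watcher.py`'s actual code:

```python
# Sync-internal marker suffixes that must never trigger delete propagation
# (from the checklist above; the matching logic below is an assumption).
TEMP_MARKERS = (".cortex_tmp", ".cortex_lock")

def should_broadcast_delete(path: str) -> bool:
    """Return False for sync-internal temp/lock files so their deletion
    is never propagated to other nodes as a real file delete."""
    return not any(marker in path for marker in TEMP_MARKERS)

assert should_broadcast_delete("/data/report.pdf")
assert not should_broadcast_delete("/data/report.pdf.cortex_tmp")
assert not should_broadcast_delete("/data/.cortex_lock")
print("filter ok")
```

If this filter regresses, deleting a transient temp file on one node can purge the real file everywhere — which is exactly what cases 3 and 4 exist to catch.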

### File appears on node-1 but not node-2

`self.memberships` in `assistant.py` doesn't include both nodes for the session. Check the `broadcast_file_chunk` log lines for the destinations list.
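
Conceptually, the destinations list should be the session's memberships minus the originating node. The sketch below is a hypothetical reconstruction of that rule (the names `memberships` and `source` are illustrative), showing how a missing membership silently drops a node from the broadcast:

```python
def destinations(memberships: set[str], source: str) -> set[str]:
    """Nodes that should receive a chunk written on `source`.

    Hypothetical sketch: every session member except the writer.
    """
    return memberships - {source}

# With both test nodes and the Hub mirror registered, a write on node-1
# fans out to node-2 and the mirror.
full = {"test-node-1", "test-node-2", "hub-mirror"}
assert destinations(full, "test-node-1") == {"test-node-2", "hub-mirror"}

# If test-node-2 is missing from memberships, it never receives the file —
# the symptom described above, with no error raised anywhere.
partial = {"test-node-1", "hub-mirror"}
assert destinations(partial, "test-node-1") == {"hub-mirror"}
print("destinations ok")
```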