Backend Modularity & Extensibility Refactor Plan

🎯 Objective

Refactor the Cortex Hub backend to improve maintainability, scalability, and developer experience. The current implementation suffers from bloated routing files ("Fat Routers") and mixed concerns between the API, business logic, and infrastructure layers.

🔍 Current State Analysis

🚩 Critical Hotspots (Bloated Files)

File	Size	Key Issues
`app/api/routes/nodes.py`	~1,335 lines	Mixes CRUD, gRPC dispatch logic, WebSocket streaming, and provisioning script generation.
`app/api/routes/user.py`	~1,114 lines	Contains OIDC flows, complex preference masking/inheritance, and provider health verification.
`app/api/routes/sessions.py`	~573 lines	Carries session state management that belongs in a service.
`app/core/services/tool.py`	~477 lines	Monolithic tool implementation; new tools cannot be added without touching core files.
`app/db/models.py`	18 KB	All database entities in a single file; slows down development and increases merge conflicts.

🛠 Architecture Violations

Leaking Concerns: Database queries and complex business logic live directly inside FastAPI route handlers.
Scalability Barriers: The node registry and WebSocket state are kept in process memory, so the backend cannot scale horizontally until that state moves to a shared store such as Redis.
Hard to Test: Large functions with many dependencies make unit testing cumbersome.

🏗 Implemented Target Architecture

We have moved towards a Clean Architecture / Domain-Driven approach while maintaining 12-Factor App principles.

1. Database & Models Split

Move from the single app/db/models.py file to a package with one module per domain:

app/db/models/
- __init__.py (re-exports all models, so existing imports from app.db.models keep working)
- user.py
- node.py
- session.py
- audit.py

2. Service Layer Extraction (Domain Logic)

Extract logic from routers into dedicated, testable services:

AuthService: OIDC logic, token validation, user onboarding.
MeshService: Node registration, health tracking, gRPC dispatching logic.
PreferenceService: Complex LLM/TTS/STT preference resolution and masking.
SessionService: Lifecycle management of chat sessions.
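As an illustration of what an extracted service could look like, here is a minimal sketch of a SessionService that depends only on a small repository interface. Everything beyond the SessionService name is an assumption made for the example: the ChatSession fields, the SessionRepository protocol, the in-memory repository, and the create/archive methods are hypothetical and not taken from the Cortex Hub codebase.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Protocol


@dataclass
class ChatSession:
    """Hypothetical session entity; the real model lives in app/db/models."""
    id: str
    user_id: str
    created_at: datetime
    archived: bool = False


class SessionRepository(Protocol):
    """Persistence boundary. A real implementation would wrap the database."""

    def add(self, session: ChatSession) -> None: ...
    def get(self, session_id: str) -> ChatSession | None: ...


class InMemorySessionRepository:
    """Test double that stands in for the database-backed repository."""

    def __init__(self) -> None:
        self._rows: dict[str, ChatSession] = {}

    def add(self, session: ChatSession) -> None:
        self._rows[session.id] = session

    def get(self, session_id: str) -> ChatSession | None:
        return self._rows.get(session_id)


class SessionNotFound(Exception):
    """Raised when a session does not exist or belongs to another user."""


class SessionService:
    """Owns session lifecycle rules; knows nothing about HTTP or FastAPI."""

    def __init__(self, repo: SessionRepository) -> None:
        self._repo = repo

    def create(self, user_id: str) -> ChatSession:
        session = ChatSession(
            id=uuid.uuid4().hex,
            user_id=user_id,
            created_at=datetime.now(timezone.utc),
        )
        self._repo.add(session)
        return session

    def archive(self, session_id: str, user_id: str) -> ChatSession:
        session = self._repo.get(session_id)
        # Treat "not yours" the same as "missing" to avoid leaking existence.
        if session is None or session.user_id != user_id:
            raise SessionNotFound(session_id)
        session.archived = True
        return session
```

Because the service takes its repository as a constructor argument, a unit test can pass InMemorySessionRepository and exercise the lifecycle rules without a database or a running API.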

3. Slim Routers

Routers should only:

Define the endpoint and tags.
Handle input validation (Pydantic).
Call the appropriate Service.
Return the response.
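The four responsibilities above can be sketched without any framework code. The example below is deliberately dependency-free: a dataclass stands in for the Pydantic request model, and a plain function stands in for the FastAPI route handler, returning a (status, body) pair. RenameNodeRequest, FakeMeshService, rename_node, and rename_node_handler are hypothetical names invented for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RenameNodeRequest:
    """Stand-in for a Pydantic request model (hypothetical)."""
    display_name: str

    def __post_init__(self) -> None:
        if not self.display_name.strip():
            raise ValueError("display_name must not be empty")


class FakeMeshService:
    """Stub for the service layer; the real MeshService would own node logic."""

    def __init__(self) -> None:
        self.names = {"node-1": "old-name"}

    def rename_node(self, node_id: str, display_name: str) -> dict:
        if node_id not in self.names:
            raise KeyError(node_id)
        self.names[node_id] = display_name
        return {"id": node_id, "display_name": display_name}


def rename_node_handler(node_id: str, payload: dict, service) -> tuple[int, dict]:
    """Thin handler: validate, delegate, translate the result. No business logic."""
    # 1. Input validation (Pydantic would do this in the real router).
    try:
        request = RenameNodeRequest(**payload)
    except (TypeError, ValueError) as exc:
        return 422, {"detail": str(exc)}
    # 2. Call the appropriate service.
    try:
        node = service.rename_node(node_id, request.display_name)
    except KeyError:
        return 404, {"detail": "node not found"}
    # 3. Return the response.
    return 200, node
```

In the actual routers the decorator, tags, and dependency injection would come from FastAPI; the point of the sketch is only that the handler body stays a few lines long once the logic lives in the service.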

4. Template & Utility Decoupling

Move large string constants (provisioning scripts, READMEs) out of the route modules and into template files:

app/core/templates/provisioning/
- bootstrap.py.j2
- run.sh.j2
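A rough sketch of the idea, rendering a script from a file on disk instead of an inline string constant. The .j2 files listed above would be rendered with Jinja2; to stay dependency-free this example swaps in the standard library's string.Template, whose placeholders are written ${name} rather than Jinja2's {{ name }}. The render_template helper, the run.sh.tmpl file name, the agent command, and the hub_url and node_id variables are all assumptions for illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from string import Template


def render_template(template_dir: Path, name: str, **context: str) -> str:
    """Load a template file and substitute its placeholders.

    Raises KeyError if the template references a variable that was not
    supplied, so an incomplete provisioning script is never produced.
    """
    text = (template_dir / name).read_text(encoding="utf-8")
    return Template(text).substitute(context)


# Usage: write a throwaway template, then render it.
with tempfile.TemporaryDirectory() as tmp:
    template_dir = Path(tmp)
    (template_dir / "run.sh.tmpl").write_text(
        "#!/bin/sh\nexec agent --hub ${hub_url} --node ${node_id}\n",
        encoding="utf-8",
    )
    script = render_template(
        template_dir,
        "run.sh.tmpl",
        hub_url="https://hub.example",
        node_id="node-1",
    )
```

Keeping the script text in its own file means a change to the provisioning flow shows up as a diff to the template rather than to nodes.py.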

5. Plugin-based Tool System

Refactor app/core/services/tool.py to use a dynamic registry:

app/core/tools/
- base.py (Interface defining a tool)
- registry.py (Auto-loader for tools)
- definitions/ (Individual tool files like file_system.py, browser.py, etc.)
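One possible shape for this layout, shown in a single file for brevity. The sketch assumes tools register themselves when their class is defined and that the auto-loader simply imports every module in the definitions package. The Tool base class, its name attribute, load_tools, get_tool, and EchoTool are hypothetical; the plan does not specify the actual interface.

```python
from __future__ import annotations

import importlib
import pkgutil
from abc import ABC, abstractmethod
from types import ModuleType
from typing import Any, ClassVar


class Tool(ABC):
    """base.py: interface every tool implements (hypothetical)."""

    name: ClassVar[str]
    registry: ClassVar[dict[str, type[Tool]]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Only classes that declare their own name are registered, so
        # abstract intermediate base classes are skipped.
        name = cls.__dict__.get("name")
        if not name:
            return
        if name in Tool.registry:
            raise ValueError(f"duplicate tool name: {name}")
        Tool.registry[name] = cls

    @abstractmethod
    def run(self, **kwargs: Any) -> Any:
        """Execute the tool with keyword arguments."""


def load_tools(package: ModuleType) -> None:
    """registry.py: import each module in `package` so its tools register."""
    prefix = f"{package.__name__}."
    for info in pkgutil.iter_modules(package.__path__, prefix=prefix):
        importlib.import_module(info.name)


def get_tool(name: str) -> Tool:
    """Instantiate a registered tool by name."""
    tool_cls = Tool.registry.get(name)
    if tool_cls is None:
        raise LookupError(f"unknown tool: {name}")
    return tool_cls()


class EchoTool(Tool):
    """definitions/echo.py: a trivial example tool."""

    name = "echo"

    def run(self, **kwargs: Any) -> Any:
        return kwargs.get("text", "")
```

With this arrangement, adding a tool means adding one module under definitions/ that subclasses Tool; at startup the application would call load_tools on that package once, and nothing in the core files needs to change.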

📅 Execution Phases

Phase 1: Physical Decomposition (Infrastructure)

Split app/db/models.py into app/db/models/*.py.
If schemas.py has grown large, split it into domain-specific schema modules.
Move script constants from nodes.py to a templates directory.

Phase 2: Domain Extraction (The "Slimming")

Nodes Refactor: Extract MeshService. Move _require_node_access and _node_to_user_view into it.
User Refactor: Extract AuthService and PreferenceService. Move OIDC callback logic and preference masking to services.
Session Refactor: Extract SessionService.

Phase 3: Advanced Decoupling (Extensibility)

Implement the Plugin-based Tool System.
Standardize error handling and response wrapping.
Ensure all configuration is read from environment variables, per the 12-Factor config principle (avoid hardcoded defaults in code where possible).
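For the configuration item, a minimal sketch of a settings object that reads everything from the environment and fails fast when a required variable is absent, rather than falling back to a default baked into the code. The variable names (DATABASE_URL, OIDC_CLIENT_ID, GRPC_PORT) and the Settings and ConfigError classes are assumptions chosen for illustration.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


class ConfigError(RuntimeError):
    """Raised at startup when the environment is incomplete or invalid."""


@dataclass(frozen=True)
class Settings:
    database_url: str
    oidc_client_id: str
    grpc_port: int

    # Hypothetical variable names; none of these carry a default in code.
    REQUIRED = ("DATABASE_URL", "OIDC_CLIENT_ID", "GRPC_PORT")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> Settings:
        # Report every missing variable at once instead of one per restart.
        missing = [key for key in cls.REQUIRED if not env.get(key)]
        if missing:
            raise ConfigError(
                "missing required environment variables: " + ", ".join(missing)
            )
        try:
            grpc_port = int(env["GRPC_PORT"])
        except ValueError as exc:
            raise ConfigError("GRPC_PORT must be an integer") from exc
        return cls(
            database_url=env["DATABASE_URL"],
            oidc_client_id=env["OIDC_CLIENT_ID"],
            grpc_port=grpc_port,
        )
```

Accepting the environment as a mapping argument keeps the loader easy to test: a test passes a plain dict, while the application calls Settings.from_env() once at startup.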

✅ Success Criteria

No routing file exceeds 400 lines.
No business logic remains in app/api/routes.
New tools/skills can be added by dropping a file into a folder.
All database models are modularized.
Unit test coverage improves, because service logic can now be tested without the HTTP layer.
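The 400-line limit is easy to check mechanically. The sketch below shows one way such a check could be written so it can run in CI; the oversized_route_files function and the MAX_ROUTE_LINES constant are hypothetical names, and the limit counts physical lines, including blanks and comments.

```python
from __future__ import annotations

from pathlib import Path

MAX_ROUTE_LINES = 400  # limit taken from the success criteria


def oversized_route_files(
    routes_dir: Path, limit: int = MAX_ROUTE_LINES
) -> dict[str, int]:
    """Return {relative path: line count} for Python files over the limit."""
    oversized: dict[str, int] = {}
    for path in sorted(routes_dir.rglob("*.py")):
        with path.open(encoding="utf-8") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > limit:
            oversized[path.relative_to(routes_dir).as_posix()] = line_count
    return oversized
```

A CI step could call oversized_route_files(Path("app/api/routes")) and fail the build when the returned dict is non-empty.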