cortex-hub / docs / refactors / skill_folder_framework.md

Skill Framework Refactoring Proposal (Folder-based Architecture)

1. Problem Statement: The Monolithic String Anti-Pattern

Currently, the Cortex skills system stores AI capabilities as single database rows (e.g., system_prompt, preview_markdown, config). The UI consists of large text areas where users paste monolithic prompts containing all custom scripts and reference data.

Shortcomings:

  • Context Window Bloat: Pasting 1,000+ line scripts directly into system_prompt exhausts the LLM’s context window and degrades its reasoning.
  • Static Functionality: Skills cannot encapsulate executable code securely; any logic has to live inside the prompt text itself, cluttering it.
  • Divergence from State-of-the-Art: Modern AI orchestration frameworks define tools/skills as discrete file structures or sandboxed resources, not string values in a relational database.

Industry Validation

Our research confirms that the industry standard has moved away from monolithic prompting:

  1. OpenHands (formerly OpenDevin): Operates using a Runtime Sandbox (Docker container). It grants agents execute and execute_ipython_shell actions. Global skills and repository guidelines are stored as markdown files (like AGENTS.md) and Python scripts within the workspace, which are read or executed only when triggered, rather than injected into the system prompt upfront.
  2. OpenAI Assistants API: Utilizes a Sandboxed Code Interpreter. Instead of pasting data and scripts into the system instructions, developers upload files (Python scripts, CSVs) which are mounted to /mnt/data/ within the sandbox. The LLM writes small wrapper scripts to execute or read these files dynamically.
  3. Anthropic Model Context Protocol (MCP): Separates "Resources" (lazily loaded file URIs) from "Tools" (executables). The agent decides when to read a resource URI rather than having the server push the entire file context into the conversation automatically.

2. The Solution: "Skills as Folders" (Lazy Loading Architecture)

The skill definition paradigm must shift from database forms to file trees. A skill should represent a containerized environment of rules, references, and executable assets.

Proposed Structure of a Skill:

A given skill (e.g., mesh-file-explorer) would be managed just like a Git repository folder containing:

/skills/mesh-file-explorer/
├── SKILL.md                 # Core instructions & meta-rationale (What the LLM reads first)
├── scripts/                 # Executable runtimes to lazy load (e.g., node.js CLI tools, Python scrapers)
│   ├── run_explorer.py     
│   └── helper.sh
├── examples/                # Example usages or inputs (few-shot prompting material)
│   └── successful_logs.txt
└── artifacts/               # Binary plugins or reference files

The "Lazy Loading" Advantage

The primary benefit of this folder structure is Lazy Context Injection.

  1. The LLM starts only with the metadata: The agent is given a brief summary of the skill via SKILL.md or a standard system tool describing the folder's purpose.
  2. On-Demand Context: The agent is given dedicated tools such as view_skill_artifact or execute_plugin_script. If the LLM determines it needs to run a web scraper, it calls scripts/run_scraper.py.
  3. Reduction in Tokens: The 1,000+ line Python scraper is never loaded into the conversation prompt. Only its execution results or help output are printed to the agent context.
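The three steps above can be sketched in a few lines. This is a minimal illustration, not the Cortex implementation: the LazySkill class name is hypothetical, the tool names (view_skill_artifact, execute_plugin_script) are taken from the proposal, and it assumes plugin scripts are Python files run via the current interpreter.

```python
import subprocess
import sys
from pathlib import Path


class LazySkill:
    """Hypothetical wrapper: only SKILL.md is loaded up front;
    every other asset stays on disk until the agent asks for it."""

    def __init__(self, root: str):
        self.root = Path(root)

    def summary(self) -> str:
        # The only content injected into the agent's context at startup.
        return (self.root / "SKILL.md").read_text()

    def view_skill_artifact(self, rel_path: str) -> str:
        # On-demand read, with a guard against escaping the skill folder.
        target = (self.root / rel_path).resolve()
        if self.root.resolve() not in target.parents:
            raise PermissionError(f"{rel_path} escapes the skill folder")
        return target.read_text()

    def execute_plugin_script(self, rel_path: str, *args: str) -> str:
        # The 1,000-line script itself never enters the prompt;
        # only its stdout is returned to the agent context.
        result = subprocess.run(
            [sys.executable, str(self.root / rel_path), *args],
            capture_output=True, text=True, timeout=60, check=True,
        )
        return result.stdout
```

The key property is that `summary()` is the only call made eagerly; the other two methods run solely when the model emits the corresponding tool call.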

3. Implementation Roadmap

Phase 1: Storage and Backend Overhaul

  • File System Virtualization: Transition from storing skills as large text columns (system_prompt) to a virtualized file-system mapping. Skills can be saved to a network drive, synced through the agent-node mesh, or abstracted behind object storage (S3/MinIO) or a virtual-file-system database design.
  • REST APIs (Virtual File Explorer): Replace the flat /skills CRUD with a hierarchy:
    • GET /skills/:id/tree (Fetch folder hierarchy)
    • GET /skills/:id/files/:path (Read asset contents)
    • POST /skills/:id/files/:path (Upload/Create code scripts inside a skill)
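As a sketch of what the GET /skills/:id/tree handler might serve, the helper below walks a skill folder and builds a nested payload. The response shape ({"name", "type", "children", "size"}) is an assumption for illustration, not a settled API contract.

```python
from pathlib import Path


def skill_tree(root: str) -> dict:
    """Builds a hypothetical JSON payload for GET /skills/:id/tree
    by recursively walking the skill's folder on disk."""

    def node(path: Path) -> dict:
        if path.is_dir():
            return {
                "name": path.name,
                "type": "dir",
                # Sorted for a stable, reproducible tree order.
                "children": [node(child) for child in sorted(path.iterdir())],
            }
        return {"name": path.name, "type": "file", "size": path.stat().st_size}

    return node(Path(root))
```

Whichever storage backend Phase 1 lands on (network drive, S3/MinIO, virtual FS), the handler only needs to reproduce this nesting; the frontend file tree in Phase 2 can render it directly.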

Phase 2: Frontend "Skill Studio" (IDE-like UI)

Replace the current simple forms ("Engineering Mode") with a "Skill Editor Workspace" modeled after lightweight web IDEs (such as VS Code for the Web or Git repository interfaces).

  • Left Panel: File tree showing SKILL.md, scripts/, artifacts/.
  • Center Canvas: Code editor (e.g. Monaco / CodeMirror) to edit the currently selected file.
  • Asset Uploads: Support for drag-and-dropping Python code, shell scripts, or CSV reference files straight into the skill.

Phase 3: Agentic API (Tool Adaptation)

  • New Standard Tools for the Agent: Inject a system tool to let the agent explore available folders and execute skill artifacts.
  • When an agent equips a "Skill", the system mounts that specific skill's /scripts directory directly into the Agent's sandbox PATH environment variable, making tool invocation native and seamless in bash.
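The PATH-mounting step can be illustrated with a small helper. This is a sketch under assumptions: the function name is hypothetical, and it shows the mechanism (prepending the skill's scripts/ directory to the child process's PATH) rather than the full sandbox lifecycle.

```python
import os
import subprocess
from pathlib import Path


def run_in_skill_sandbox(skill_root: str, command: list[str]) -> str:
    """Runs a command with the skill's scripts/ directory prepended to PATH,
    so scripts like helper.sh or run_explorer.py resolve by bare name."""
    scripts_dir = Path(skill_root) / "scripts"
    env = os.environ.copy()
    # "Mount" the skill's executables: prepend, so they win name lookups.
    env["PATH"] = f"{scripts_dir}{os.pathsep}{env.get('PATH', '')}"
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

In a real sandbox this environment would be set once when the agent equips the skill, after which bash tool calls inherit it and invocation feels native, as the bullet above describes.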

4. Summary of Value

Through this refactoring, skills graduate from "Large Prompts" to "Software Packages". This creates an ecosystem where developers can drop a complex Docker network or Python repository into a skill folder, and the Cortex LLM can dynamically inspect and execute those resources as needed without exceeding context limits.