diff --git a/docs/features/harness_engineering/harness_engineering_execution_plan.md b/docs/features/harness_engineering/harness_engineering_execution_plan.md
index a5e31a7..832e350 100644
--- a/docs/features/harness_engineering/harness_engineering_execution_plan.md
+++ b/docs/features/harness_engineering/harness_engineering_execution_plan.md
@@ -26,8 +26,8 @@
 
 ### Task 0.4: Context Truncation (Head & Tail Preservation)
 
 If `history=session.messages` is passed natively to the AI orchestrator, an autonomous loop will rapidly exceed the 128k token API limit and crash with an `HTTP 400 ContextWindowExceededError`. However, blindly slicing the last 20 messages destroys the Agent's foundational mission prompt at the start of the chat.
-- **Action:** Refactor `chat_with_rag` to aggressively chunk the message array. Provide the AI with the **Head** (The initial 3 messages containing its core directive) and the **Tail** (The most recent 10-15 messages containing its immediate working memory/errors).
-- **Action:** Before hitting the API, completely remove the vast middle section of the array to guarantee the Agent never exceeds the API limit, while still retaining absolute knowledge of *why* it is working.
+- **Action:** Refactor `chat_with_rag` to chunk the message array aggressively. Provide the central AI with the **Head** (the initial 3 messages containing its core directive) and the **Tail** (the most recent 10-15 messages containing its immediate working memory and errors).
+- **Action (The Summarizer):** Instead of blindly deleting the middle section and causing amnesia, dispatch the "middle" messages to a fast, cheap sub-agent (e.g., `gpt-4o-mini` or `claude-haiku`) with a strict prompt to compress them into a dense *"Timeline of Past Actions"* paragraph. Replace the middle of the array with this single string. This guarantees the API token limit is never breached while giving the main Agent a seamless chronological record.
 
 ---
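
The head/tail split with a summarized middle described in the diff above could be sketched as follows. This is a minimal illustration, not the actual `chat_with_rag` implementation: the `truncate_history` function, the `HEAD_KEEP`/`TAIL_KEEP` constants, and the caller-supplied `summarize` callable (which would wrap the cheap sub-agent call) are all hypothetical names introduced here.

```python
from typing import Callable

HEAD_KEEP = 3   # initial messages carrying the core directive (assumed value)
TAIL_KEEP = 12  # recent working memory / errors (within the 10-15 range above)

def truncate_history(messages: list[dict],
                     summarize: Callable[[list[dict]], str]) -> list[dict]:
    """Keep the Head and Tail; compress the middle into one summary message.

    `summarize` is a caller-supplied function (hypothetical) that would send
    the middle slice to a fast, cheap sub-agent and return a dense
    "Timeline of Past Actions" paragraph.
    """
    if len(messages) <= HEAD_KEEP + TAIL_KEEP:
        return messages  # short enough; nothing to compress

    head = messages[:HEAD_KEEP]
    tail = messages[-TAIL_KEEP:]
    middle = messages[HEAD_KEEP:-TAIL_KEEP]

    # Replace the entire middle section with a single summary message so the
    # total message count (and token usage) stays bounded.
    summary_msg = {
        "role": "system",
        "content": f"Timeline of Past Actions: {summarize(middle)}",
    }
    return head + [summary_msg] + tail
```

Because `summarize` is injected, the sub-agent call (or a no-op stub during tests) can be swapped in without touching the truncation logic.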