
Kickoff Plan: AI Model Hub Service

This document outlines the plan for developing a central "hub" service that routes requests to various Large Language Models (LLMs) and uses PostgreSQL for metadata storage alongside FAISS for similarity search on vector data.


1. High-Level Architecture

The service will consist of three main components:

  1. API Server: A web server that exposes endpoints to receive user prompts and return model responses. This will be the main entry point for all client applications.

  2. LLM Router/Orchestrator: A core logic layer responsible for deciding which LLM (Gemini, DeepSeek, etc.) should handle a given request. It will also manage interactions with PostgreSQL and FAISS.

  3. Vector Database (FAISS + PostgreSQL): A two-layered database system:

    • FAISS: Stores vectors (numerical representations of text). Handles high-performance similarity search.
    • PostgreSQL: Stores metadata such as conversation IDs, document titles, timestamps, and other relational data.
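
A minimal sketch of how the two layers could pair up. The schema, the shared-ID scheme, and the embedding dimension are illustrative assumptions, not decisions made in this plan:

```python
import faiss  # pip install faiss-cpu

EMBEDDING_DIM = 384  # assumption: must match whatever embedding model is chosen

# FAISS holds only the vectors. IndexIDMap lets us assign our own integer IDs,
# which can double as primary keys for the metadata rows in PostgreSQL.
index = faiss.IndexIDMap(faiss.IndexFlatL2(EMBEDDING_DIM))

# Illustrative PostgreSQL schema for the relational half:
METADATA_SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    vector_id       BIGINT PRIMARY KEY,   -- same ID passed to FAISS add_with_ids
    title           TEXT,
    conversation_id TEXT,
    created_at      TIMESTAMPTZ DEFAULT now()
);
"""
```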

2. Technology Stack

  • API Framework: FastAPI (Python) – High-performance, easy to learn, with automatic interactive documentation, ideal for testing and development.

  • LLM Interaction: LangChain (or a similar abstraction library) – Simplifies communication with different LLM APIs by providing a unified interface (see the sketch after this list).

  • Vector Database:

    • FAISS: High-performance similarity search for vectors.
    • PostgreSQL: Stores metadata for vectors, such as document IDs, user data, timestamps, etc. Used for filtering, organizing, and managing relational data.
  • Deployment: Docker – Containerizing the application for portability, enabling straightforward deployment to any machine on the local network.
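
As a taste of the unified interface mentioned above, a hedged sketch using LangChain's chat-model wrappers. The model ID is illustrative, and each provider integration lives in its own pip package:

```python
# pip install langchain-google-genai  (provider integrations are separate packages)
from langchain_google_genai import ChatGoogleGenerativeAI

# All LangChain chat models share the same .invoke() interface, so the router
# can swap providers later without changing call sites.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # needs GOOGLE_API_KEY set
print(llm.invoke("Say hello in one sentence.").content)
```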


3. Development Roadmap

Phase 1: Core API and Model Integration (1-2 weeks)

  • [X] Set up a basic FastAPI server.
  • [X] Create a /chat endpoint that accepts user prompts.
  • [X] Implement basic routing logic to forward requests to one hardcoded LLM (e.g., Gemini).
  • [X] Connect to the LLM's API and return the response to the user.
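
A minimal sketch of the Phase 1 endpoint. The request model and response shape are assumptions; the plan only fixes the /chat path and the single hardcoded model:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_google_genai import ChatGoogleGenerativeAI

app = FastAPI()
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # hardcoded for Phase 1

class ChatRequest(BaseModel):
    prompt: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    # Phase 1: no routing, no memory -- forward the prompt, return the text.
    reply = llm.invoke(req.prompt)
    return {"response": reply.content}
```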

Phase 2: PostgreSQL and FAISS Integration (2-3 weeks)

  • [ ] Integrate PostgreSQL for metadata storage (document IDs, timestamps, etc.).
  • [ ] Integrate FAISS for vector storage and similarity search.
  • [ ] On each API call, embed the user prompt and the model's response into vectors.
  • [ ] Store the vectors in FAISS and store associated metadata in PostgreSQL (such as document title, conversation ID).
  • [ ] Perform a similarity search using FAISS before sending a new prompt to the LLM, and include relevant history stored in PostgreSQL as context.
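
A sketch of the Phase 2 write and read paths. The embedding model (sentence-transformers, 384-dim), the table name, and the connection string are assumptions; the plan does not pin an embedding provider:

```python
import numpy as np
import faiss
import psycopg2
from sentence_transformers import SentenceTransformer  # assumed embedding model

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim, illustrative choice
index = faiss.IndexIDMap(faiss.IndexFlatL2(384))
conn = psycopg2.connect("dbname=cortex_hub")  # illustrative connection string

def store_exchange(vector_id: int, text: str, conversation_id: str, title: str) -> None:
    # FAISS gets the vector, PostgreSQL the metadata, both under the same ID.
    vec = embedder.encode([text]).astype("float32")
    index.add_with_ids(vec, np.array([vector_id], dtype="int64"))
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (vector_id, title, conversation_id) VALUES (%s, %s, %s)",
            (vector_id, title, conversation_id),
        )

def similar_history(prompt: str, k: int = 3) -> list[tuple]:
    # Search FAISS first, then pull the matching metadata rows from PostgreSQL.
    vec = embedder.encode([prompt]).astype("float32")
    _, ids = index.search(vec, k)
    found = [i for i in ids[0].tolist() if i != -1]  # -1 pads short result sets
    with conn.cursor() as cur:
        cur.execute(
            "SELECT vector_id, title, conversation_id FROM documents WHERE vector_id = ANY(%s)",
            (found,),
        )
        return cur.fetchall()
```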

Phase 3: Multi-Model Routing & RAG (1-2 weeks)

  • [ ] Abstract LLM connections to easily support multiple models (Gemini, DeepSeek, etc.).
  • [ ] Add logic to the /chat endpoint to allow clients to specify which model to use.
  • [ ] Create a separate endpoint (e.g., /add-document) to upload text files.
  • [ ] Implement a RAG pipeline:
    • When a prompt is received, search FAISS for relevant vector matches and retrieve metadata from PostgreSQL.
    • Pass the relevant document chunks along with the prompt to the selected LLM.
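
One way the multi-model routing could look: a registry mapping client-supplied names to LangChain chat models. The registry keys, the default model, and the prompt template are assumptions, and the retrieval step is stubbed where the Phase 2 lookup would plug in:

```python
from langchain_google_genai import ChatGoogleGenerativeAI
# from langchain_deepseek import ChatDeepSeek  # assumed future integration

# Adding a provider becomes one new registry entry, not new endpoint logic.
MODELS = {
    "gemini": ChatGoogleGenerativeAI(model="gemini-1.5-flash"),
    # "deepseek": ChatDeepSeek(model="deepseek-chat"),
}

def retrieve_context(prompt: str) -> str:
    # Placeholder for the FAISS + PostgreSQL lookup sketched under Phase 2.
    return ""

def answer(prompt: str, model_name: str = "gemini") -> str:
    llm = MODELS[model_name]  # KeyError -> map to a 400 in the API layer
    context = retrieve_context(prompt)
    return llm.invoke(f"Context:\n{context}\n\nQuestion: {prompt}").content
```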

Phase 4: Refinement and Deployment (1 week)

  • [ ] Develop a simple UI (optional, could use FastAPI's built-in docs).
  • [ ] Write Dockerfiles for the application.
  • [ ] Add configuration management for API keys and other settings.
  • [ ] Implement basic logging and error handling.
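
For the configuration item, one hedged option is pydantic-settings, which reads values from environment variables or a .env file. The field names and default are assumptions:

```python
# pip install pydantic-settings
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")  # optional local overrides

    # Field names map to environment variables (GOOGLE_API_KEY, DATABASE_URL).
    google_api_key: str
    database_url: str = "postgresql://localhost/cortex_hub"  # illustrative default

settings = Settings()
```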

4. PostgreSQL + FAISS Workflow

  • Storing Vectors: When a document is added, its vector representation is stored in FAISS. Metadata such as document titles, timestamps, and user IDs are stored in PostgreSQL.

  • Querying: For a user query, embed the query into a vector. Use FAISS to perform a similarity search and retrieve the nearest vectors. Query PostgreSQL for metadata (e.g., title, author) related to the relevant vectors.

  • Syncing Data: Keep the metadata in PostgreSQL synchronized with the vectors in FAISS, e.g., by sharing a single ID between the FAISS index and the PostgreSQL primary key and by adding or deleting entries in both stores together, so retrieval stays accurate and consistent.
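
The syncing bullet is the easiest place for drift, so here is a sketch of a paired delete, assuming the shared-ID scheme from the earlier sketches (index, connection string, and table name are the same illustrative assumptions):

```python
import faiss
import numpy as np
import psycopg2

index = faiss.IndexIDMap(faiss.IndexFlatL2(384))  # as in the Phase 2 sketch
conn = psycopg2.connect("dbname=cortex_hub")      # illustrative connection string

def delete_document(vector_id: int) -> None:
    # Remove the vector and its metadata under the same ID so the two stores
    # never disagree about what exists.
    index.remove_ids(np.array([vector_id], dtype="int64"))
    with conn, conn.cursor() as cur:
        cur.execute("DELETE FROM documents WHERE vector_id = %s", (vector_id,))
```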