Kickoff Plan: AI Model Hub Service
This document outlines the plan for developing a central "hub" service that routes requests to various Large Language Models (LLMs) and uses PostgreSQL for metadata storage alongside FAISS for similarity search on vector data.
1. High-Level Architecture
The service will consist of three main components:
API Server: A web server that exposes endpoints to receive user prompts and return model responses. This will be the main entry point for all client applications.
LLM Router/Orchestrator: A core logic layer that decides which LLM (Gemini, DeepSeek, etc.) should handle a given request and manages interactions with PostgreSQL and FAISS (see the sketch after this list).
Vector Database (FAISS + PostgreSQL): A two-layer storage system:
- FAISS: Stores embedding vectors (numerical representations of text) and handles high-performance similarity search.
- PostgreSQL: Stores metadata such as conversation IDs, document titles, timestamps, and other relational data.
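A minimal sketch of the router abstraction, assuming a simple in-memory registry keyed by model name (the `LLMClient` protocol and `LLMRouter` class are hypothetical names for this plan, not an existing library):

```python
from typing import Protocol


class LLMClient(Protocol):
    """Anything that can turn a prompt into a completion."""

    def generate(self, prompt: str) -> str: ...


class LLMRouter:
    """Maps model names to clients; the API server talks only to this."""

    def __init__(self) -> None:
        self._clients: dict[str, LLMClient] = {}

    def register(self, name: str, client: LLMClient) -> None:
        self._clients[name] = client

    def route(self, model: str, prompt: str) -> str:
        if model not in self._clients:
            raise ValueError(f"Unknown model: {model!r}")
        return self._clients[model].generate(prompt)
```

Keeping the API server ignorant of concrete clients makes adding a new model a one-line `register` call.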
2. Technology Stack
API Framework: FastAPI (Python) – High-performance, easy to learn, with automatic interactive documentation, ideal for testing and development.
LLM Interaction: LangChain (or a similar abstraction library) – Simplifies communication with different LLM APIs by providing a unified interface (see the sketch after this list).
Vector Database:
- FAISS: High-performance similarity search for vectors.
- PostgreSQL: Stores metadata for vectors, such as document IDs, user data, timestamps, etc. Used for filtering, organizing, and managing relational data.
Deployment: Docker – Containerizes the application for portability, so it can be deployed easily on any machine on the local network.
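As referenced above, a sketch of the unified interface LangChain provides, assuming the `langchain-google-genai` and `langchain-openai` packages; the model IDs and the OpenAI-compatible DeepSeek endpoint are assumptions to verify against current docs:

```python
import os

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

# Both clients expose the same .invoke() interface, which is what lets
# the router treat every model uniformly.
gemini = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # reads GOOGLE_API_KEY
deepseek = ChatOpenAI(
    model="deepseek-chat",
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible API
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

print(gemini.invoke("Hello!").content)
```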
3. Development Roadmap
Phase 1: Core API and Model Integration (1-2 weeks)
- [X] Set up a basic FastAPI server.
- [X] Create a `/chat` endpoint that accepts user prompts (see the sketch after this list).
- [X] Implement basic routing logic to forward requests to one hardcoded LLM (e.g., Gemini).
- [X] Connect to the LLM's API and return the response to the user.
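A minimal sketch of the Phase 1 server, with the LLM call stubbed out (the `call_llm` helper is a placeholder for the actual Gemini client):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="AI Model Hub")


class ChatRequest(BaseModel):
    prompt: str


class ChatResponse(BaseModel):
    reply: str


def call_llm(prompt: str) -> str:
    # Phase 1 hardcodes a single model here (e.g., a Gemini client).
    raise NotImplementedError


@app.post("/chat", response_model=ChatResponse)
def chat(request: ChatRequest) -> ChatResponse:
    return ChatResponse(reply=call_llm(request.prompt))
```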
Phase 2: PostgreSQL and FAISS Integration (2-3 weeks)
- [ ] Integrate PostgreSQL for metadata storage (document IDs, timestamps, etc.).
- [ ] Integrate FAISS for vector storage and similarity search.
- [ ] On each API call, embed the user prompt and the model's response into vectors.
- [ ] Store the vectors in FAISS and store associated metadata in PostgreSQL (such as document title, conversation ID).
- [ ] Before sending a new prompt to the LLM, perform a FAISS similarity search and include the relevant history retrieved from PostgreSQL as context.
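A sketch of the storage half of this phase, assuming `sentence-transformers` for embeddings and `psycopg2` for PostgreSQL; the `messages` table and connection string are hypothetical:

```python
import faiss
import numpy as np
import psycopg2
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim vectors
index = faiss.IndexIDMap(faiss.IndexFlatL2(384))
conn = psycopg2.connect("dbname=hub")  # hypothetical DSN


def store_message(conversation_id: str, text: str) -> None:
    """Embed the text, record metadata in PostgreSQL, index the vector in FAISS."""
    vector = embedder.encode([text]).astype("float32")
    with conn, conn.cursor() as cur:
        # The messages table is hypothetical; its SERIAL primary key doubles
        # as the FAISS vector ID so the two stores stay linked.
        cur.execute(
            "INSERT INTO messages (conversation_id, content) VALUES (%s, %s) RETURNING id",
            (conversation_id, text),
        )
        row_id = cur.fetchone()[0]
    index.add_with_ids(vector, np.array([row_id], dtype="int64"))
```

Using the PostgreSQL primary key as the FAISS vector ID is the design choice that makes the later lookup (and the syncing described in Section 4) straightforward.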
Phase 3: Multi-Model Routing & RAG (1-2 weeks)
- [ ] Abstract LLM connections to easily support multiple models (Gemini, DeepSeek, etc.).
- [ ] Add logic to the `/chat` endpoint to allow clients to specify which model to use.
- [ ] Create a separate endpoint (e.g., `/add-document`) to upload text files.
- [ ] Implement a RAG pipeline:
- When a prompt is received, search FAISS for relevant vector matches and retrieve metadata from PostgreSQL.
- Pass the relevant document chunks along with the prompt to the selected LLM.
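A sketch of the retrieval step, reusing `embedder`, `index`, and `conn` from the Phase 2 sketch and `router` from the architecture sketch; the prompt template and `k=3` default are assumptions:

```python
def answer_with_context(model: str, prompt: str, k: int = 3) -> str:
    """Retrieve the k nearest stored chunks and prepend them to the prompt."""
    query_vec = embedder.encode([prompt]).astype("float32")
    _, ids = index.search(query_vec, k)  # FAISS returns (distances, ids)
    hits = [i for i in ids[0].tolist() if i != -1]  # -1 pads missing results
    with conn.cursor() as cur:
        cur.execute("SELECT content FROM messages WHERE id = ANY(%s)", (hits,))
        context = "\n".join(row[0] for row in cur.fetchall())
    augmented = f"Context:\n{context}\n\nQuestion: {prompt}"
    return router.route(model, augmented)
```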
Phase 4: Refinement and Deployment (1 week)
- [ ] Develop a simple UI (optional, could use FastAPI's built-in docs).
- [ ] Write Dockerfiles for the application.
- [ ] Add configuration management for API keys and other settings (see the sketch after this list).
- [ ] Implement basic logging and error handling.
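A sketch of environment-based configuration for the secrets above; the variable names are assumptions:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Central place for secrets and tunables, loaded once at startup."""

    gemini_api_key: str
    deepseek_api_key: str
    postgres_dsn: str


def load_settings() -> Settings:
    # Fail fast at startup if a required key is missing.
    return Settings(
        gemini_api_key=os.environ["GEMINI_API_KEY"],
        deepseek_api_key=os.environ["DEEPSEEK_API_KEY"],
        postgres_dsn=os.environ.get("POSTGRES_DSN", "dbname=hub"),
    )
```

Environment variables also map cleanly onto Docker (`docker run -e GEMINI_API_KEY=...` or an `--env-file`), which fits the deployment plan above.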
4. PostgreSQL + FAISS Workflow
Storing Vectors: When a document is added, its vector representation is stored in FAISS. Metadata such as document titles, timestamps, and user IDs are stored in PostgreSQL.
Querying: For a user query, embed the query into a vector. Use FAISS to perform a similarity search and retrieve the nearest vectors. Query PostgreSQL for metadata (e.g., title, author) related to the relevant vectors.
Syncing Data: Ensure that metadata in PostgreSQL stays synchronized with vectors in FAISS, e.g., by using each row's PostgreSQL primary key as the FAISS vector ID, so retrieval stays accurate and consistent.
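One way to keep the two stores consistent, assuming the shared-ID scheme from the Phase 2 sketch, is to delete from both in a single code path, treating PostgreSQL as the source of truth:

```python
def delete_document(doc_id: int) -> None:
    """Remove a record from PostgreSQL and its vector from FAISS together."""
    with conn, conn.cursor() as cur:
        cur.execute("DELETE FROM messages WHERE id = %s", (doc_id,))
    # IndexIDMap supports removal by the same IDs used at insert time.
    index.remove_ids(np.array([doc_id], dtype="int64"))
```

Since FAISS indexes live in process memory, persisting them with `faiss.write_index` at shutdown (or rebuilding them from PostgreSQL at startup) lets the two stores be re-synchronized after a restart.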