This document outlines the plan for developing a central "hub" service that routes requests to various Large Language Models (LLMs) and uses PostgreSQL for metadata storage alongside FAISS for similarity search on vector data.
The service will consist of three main components:
API Server: A web server that exposes endpoints to receive user prompts and return model responses. This will be the main entry point for all client applications.
LLM Router/Orchestrator: A core logic layer responsible for deciding which LLM (Gemini, DeepSeek, etc.) should handle a given request. It will also manage interactions with PostgreSQL and FAISS.
Vector Database (FAISS + PostgreSQL): A two-layered storage system in which FAISS holds the vector embeddings used for similarity search and PostgreSQL holds the associated metadata (document titles, timestamps, user IDs).
The proposed technology stack:
API Framework: FastAPI (Python) – High-performance, easy to learn, with automatic interactive documentation, ideal for testing and development.
LLM Interaction: LangChain (or a similar abstraction library) – Simplifies communication with different LLM APIs by providing a unified interface; a short routing sketch follows this list.
Vector Database: FAISS for fast in-memory similarity search over embeddings, paired with PostgreSQL for persistent metadata storage.
Deployment: Docker – Containerizing the application for portability, ensuring easy deployment across any machine within the local network.
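To illustrate the unified-interface idea, here is a minimal routing sketch. It assumes the langchain-google-genai and langchain-deepseek integration packages and API keys supplied via environment variables; the model identifiers and the MODELS mapping are placeholders, not final choices.

```python
# Sketch of a model router built on LangChain's common chat-model interface.
# Assumes langchain-google-genai and langchain-deepseek are installed and
# that GOOGLE_API_KEY / DEEPSEEK_API_KEY are set in the environment.
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_deepseek import ChatDeepSeek

# Map short, client-facing names to concrete chat models.
# The model identifiers below are examples, not final choices.
MODELS = {
    "gemini": ChatGoogleGenerativeAI(model="gemini-1.5-flash"),
    "deepseek": ChatDeepSeek(model="deepseek-chat"),
}

def route_prompt(prompt: str, model_name: str = "gemini") -> str:
    """Send the prompt to the requested model and return its text reply."""
    if model_name not in MODELS:
        raise ValueError(f"Unknown model: {model_name}")
    # Every LangChain chat model exposes the same .invoke() call, so the
    # orchestrator never touches provider-specific SDKs directly.
    reply = MODELS[model_name].invoke(prompt)
    return reply.content
```

Because each provider sits behind the same interface, adding another model later only means adding one more entry to the mapping.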
Planned API endpoints:
A /chat endpoint that accepts user prompts and returns the model's response.
An extension of /chat that lets clients specify which model should handle the request.
A document-upload endpoint (e.g., /add-document) for uploading text files.
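As a rough sketch of the /chat endpoint, the FastAPI app below reuses the hypothetical route_prompt() helper from the routing sketch above; the module name, field names, and default model are illustrative only.

```python
# Minimal FastAPI app exposing the /chat endpoint described above.
# route_prompt() comes from the routing sketch; the module name is hypothetical.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from llm_router import route_prompt  # hypothetical module holding the router

app = FastAPI(title="LLM Hub")

class ChatRequest(BaseModel):
    prompt: str
    model: str = "gemini"  # lets clients choose which model handles the request

class ChatResponse(BaseModel):
    model: str
    answer: str

@app.post("/chat", response_model=ChatResponse)
def chat(req: ChatRequest) -> ChatResponse:
    try:
        answer = route_prompt(req.prompt, req.model)
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc))
    return ChatResponse(model=req.model, answer=answer)
```

During development the app can be run with uvicorn, and FastAPI's automatic docs at /docs provide the interactive testing mentioned in the stack section.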
Storing Vectors: When a document is added, its vector representation is stored in FAISS, while metadata such as the document title, timestamp, and user ID is stored in PostgreSQL (see the storage sketch after these notes).
Querying: For a user query, embed the query into a vector. Use FAISS to perform a similarity search and retrieve the nearest vectors. Query PostgreSQL for metadata (e.g., title, author) related to the relevant vectors.
Syncing Data: Ensure that metadata in PostgreSQL is synchronized with vectors in FAISS for accurate and consistent retrieval.
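The sketch below shows one way to keep the two layers aligned, assuming a documents table in PostgreSQL, a placeholder embed() helper, and an example vector dimension; none of these names are fixed yet.

```python
# Sketch of the two-layer store: FAISS holds vectors, PostgreSQL holds metadata.
# A shared integer id ties a FAISS vector to its PostgreSQL row. The embed()
# helper, the documents table, the DSN, and DIM are placeholders.
import faiss
import numpy as np
import psycopg2

DIM = 384  # assumed embedding dimension
index = faiss.IndexIDMap(faiss.IndexFlatL2(DIM))
conn = psycopg2.connect("dbname=llm_hub user=postgres")  # placeholder DSN

def embed(text: str) -> np.ndarray:
    """Placeholder: return a (1, DIM) float32 embedding of the text."""
    raise NotImplementedError

def add_document(title: str, body: str, user_id: int) -> int:
    """Insert metadata into PostgreSQL, then store the vector in FAISS under the same id."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (title, user_id, created_at) "
            "VALUES (%s, %s, now()) RETURNING id",
            (title, user_id),
        )
        doc_id = cur.fetchone()[0]
    index.add_with_ids(embed(body), np.array([doc_id], dtype=np.int64))
    return doc_id

def query(text: str, k: int = 5) -> list[tuple]:
    """Find the k nearest vectors in FAISS, then fetch their metadata from PostgreSQL."""
    _, ids = index.search(embed(text), k)
    hits = [int(i) for i in ids[0] if i != -1]  # -1 marks unused result slots
    if not hits:
        return []
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, title, user_id FROM documents WHERE id = ANY(%s)",
            (hits,),
        )
        return cur.fetchall()
```

Because the PostgreSQL primary key doubles as the FAISS vector id, a delete or update only has to touch both stores under that one id, which is what keeps metadata and vectors consistent.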
This update to the plan uses PostgreSQL for metadata management while FAISS handles efficient similarity search. Combining the two lets the service answer similarity queries in FAISS and filter or enrich the results with metadata from PostgreSQL, keeping the design scalable and flexible for future features.
Let me know if you'd like more specifics or adjustments!