Newer
Older
cortex-hub / ai-hub / app / db / README.md

Here is the properly formatted and organized version of your AI Model Hub Database Schema document:


Database Schema for the AI Model Hub

This document provides a detailed overview of the PostgreSQL database schema for the AI Model Hub service, based on the SQLAlchemy models defined in app/db/model.py. The schema supports core functionalities like:

  • Conversational history
  • Document storage
  • Tracking RAG-related metadata

Schema Diagram

The following diagram illustrates the relationships between the four main tables: sessions, messages, documents, and vector_metadata.

+-------------+         +-------------+
|  sessions   | <------ |  messages   |
+-------------+         +-------------+
| id          | 1       | id          |
| user_id     |         | session_id  |
| title       |         | sender      |
| model_name  |         | content     |
| created_at  |         | created_at  |
| is_archived |         | model_response_time |
+-------------+         | token_count |
                        | metadata    |
                        +-------------+
        ^
        | 1
+------------------+         +-------------+
| vector_metadata  | ---1:1--|  documents  |
+------------------+         +-------------+
| id               |         | id          |
| document_id      |         | title       |
| faiss_index      |         | text        |
| session_id       |         | source_url  |
| embedding_model  |         | author      |
+------------------+         | status      |
                             | created_at  |
                             | user_id     |
                             +-------------+

Table Descriptions

1. sessions Table

Stores metadata for each conversation session. Each row represents a single chat conversation.

  • id (Integer, Primary Key): Unique session identifier
  • user_id (String): ID of the session's owner. Indexed for quick lookups
  • title (String): Human-readable title for the session
  • model_name (String): LLM used (e.g., 'gemini-1.5-pro', 'deepseek-chat')
  • created_at (DateTime): Timestamp of session creation
  • is_archived (Boolean): Soft delete flag

2. messages Table

Stores individual messages within a session.

  • id (Integer, Primary Key): Unique message identifier
  • session_id (Integer, Foreign Key): Links to sessions.id
  • sender (String): Either 'user' or 'assistant'
  • content (Text): Message text
  • created_at (DateTime): Timestamp of message
  • model_response_time (Integer): Time (in seconds) to generate response
  • token_count (Integer): Tokens used for the message
  • metadata (JSON): Flexible field for model/tool-specific data

3. documents Table

Stores original text and metadata of documents ingested into the system.

  • id (Integer, Primary Key): Unique document identifier
  • title (String): Document title
  • text (Text): Full content of the document
  • source_url (String): URL or file path of origin
  • author (String): Author of the document
  • status (String): 'processing', 'ready', or 'failed'
  • created_at (DateTime): Timestamp of upload
  • user_id (String): ID of the uploading user

4. vector_metadata Table

Links documents to their vector representations and session context for RAG.

  • id (Integer, Primary Key): Unique metadata ID
  • document_id (Integer, Foreign Key): Links to documents.id
  • faiss_index (Integer): Index in the FAISS vector store
  • session_id (Integer, Foreign Key): Session where this vector was used
  • embedding_model (String): Embedding model used (e.g., 'text-embedding-004')

Key Relationships

  • One-to-Many: sessions → messages A session contains multiple messages; each message belongs to one session.

  • One-to-One: documents → vector_metadata Each document has a single vector metadata record.

  • Many-to-One: vector_metadata → sessions Multiple vector metadata entries can reference the same session if used for RAG.


Let me know if you’d like this in Markdown, PDF, or any specific format.