Ingest documents into our ChromaDB, run questions through your endpoint and our baseline, then inspect retrieval overlap (raw and path-aligned), embedding-based answer similarity, grounding, and scores.
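The two overlap numbers above can be sketched as follows. This is an illustrative stand-in, not the bench's internals: it assumes each system returns a list of document paths, and that "path-aligned" means normalizing to bare file names so the same document matches even when the two stores use different directory prefixes.

```python
from pathlib import PurePosixPath

def raw_overlap(ours: list[str], theirs: list[str]) -> float:
    """Jaccard overlap on the exact retrieved identifiers."""
    a, b = set(ours), set(theirs)
    return len(a & b) / len(a | b) if a | b else 0.0

def path_aligned_overlap(ours: list[str], theirs: list[str]) -> float:
    """Overlap after stripping directory prefixes, so 'docs/x.md'
    and 'corpus/x.md' count as the same document."""
    a = {PurePosixPath(p).name for p in ours}
    b = {PurePosixPath(p).name for p in theirs}
    return len(a & b) / len(a | b) if a | b else 0.0
```

A raw score well below the path-aligned score usually means the two stores hold the same corpus under different path conventions, not a real retrieval disagreement.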
Documents → our ChromaDB (directory or zip). SQL → optional test database for replaying endpoint-disclosed queries against row payloads (the fields below are submitted with Run suite).
Or upload a .zip:
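A minimal sketch of gathering text files from either source, a directory tree or a .zip, before ingestion. The extension allow-list and the `{path: text}` return shape are assumptions for illustration, not the bench's exact behavior.

```python
import zipfile
from pathlib import Path

TEXT_EXTS = {".txt", ".md", ".html"}  # illustrative allow-list

def collect_documents(source: str) -> dict[str, str]:
    """Return {relative_path: text} for every text file found in a
    directory tree or inside a zip archive."""
    docs: dict[str, str] = {}
    path = Path(source)
    if path.suffix == ".zip":
        with zipfile.ZipFile(path) as zf:
            for name in zf.namelist():
                if Path(name).suffix in TEXT_EXTS:
                    docs[name] = zf.read(name).decode("utf-8", "replace")
    else:
        for f in path.rglob("*"):
            if f.is_file() and f.suffix in TEXT_EXTS:
                rel = str(f.relative_to(path))
                docs[rel] = f.read_text(encoding="utf-8", errors="replace")
    return docs
```

Keeping the relative path as the document ID is what makes the path-aligned overlap comparison above possible later.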
Test DB URL (SQLAlchemy). When Run suite runs with SQL replay enabled, the bench uses the paths below to read the SELECT/WITH statement and the row list from each JSON response, re-executes the SQL with :binds, and compares the result to the payload. Defaults: meta.sql / data.rows.
Send a probe to your URL or paste one sample JSON response. The bench shows the sample, a heuristic suggestion, and an extraction preview. Pick a preset and apply it to the Run suite form below, or override the paths manually.
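The heuristic suggestion could work along these lines: walk one sample response and guess dotted paths for the answer string and the retrieved-documents list. The candidate key names and the first-match-wins scoring are assumptions, not the bench's actual heuristic.

```python
ANSWER_KEYS = ("answer", "response", "output", "text")
DOCS_KEYS = ("top_k_docs", "sources", "documents", "contexts")

def suggest_paths(sample: dict, prefix: str = "") -> dict[str, str]:
    """Walk a sample JSON response and return guessed dotted paths,
    e.g. {'answer': 'answer', 'docs': 'data.sources'}."""
    found: dict[str, str] = {}
    for key, value in sample.items():
        path = f"{prefix}{key}"
        if isinstance(value, str) and key in ANSWER_KEYS:
            found.setdefault("answer", path)
        elif isinstance(value, list) and key in DOCS_KEYS:
            found.setdefault("docs", path)
        elif isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            for k, v in suggest_paths(value, f"{path}.").items():
                found.setdefault(k, v)
    return found
```

The extraction preview is then just the values those paths resolve to, so you can sanity-check the guess before applying it to the Run suite form.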
Endpoint: POST { "query", "top_k" }. Response mapping uses the response preset below (default rag_legacy: answer/response + top_k_docs/sources). Embedding model for answer similarity:
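The request shape and the rag_legacy mapping can be sketched as below. The preset table is inferred from the key names listed above; the fallback order (try `answer` then `response`, `top_k_docs` then `sources`) is an assumption.

```python
import json

PRESETS = {
    "rag_legacy": {"answer": ("answer", "response"),
                   "docs": ("top_k_docs", "sources")},
}

def build_request(query: str, top_k: int = 5) -> str:
    """JSON body POSTed to the endpoint."""
    return json.dumps({"query": query, "top_k": top_k})

def map_response(payload: dict, preset: str = "rag_legacy") -> tuple[str, list]:
    """Pull (answer, docs) out of a response, trying each candidate
    key in the preset's order and falling back to empty values."""
    spec = PRESETS[preset]
    answer = next((payload[k] for k in spec["answer"] if k in payload), "")
    docs = next((payload[k] for k in spec["docs"] if k in payload), [])
    return answer, docs
```

The extracted answer string is what gets embedded for the answer-similarity comparison; the docs list feeds the overlap metrics.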