Querying & RAG¶
Gaia's core value proposition is Retrieval-Augmented Generation (RAG) — you ask a natural-language question, Gaia retrieves the most relevant chunks from your indexed datasets, and an LLM synthesizes a grounded answer with source citations.
This page explains the pipeline, the API surface, and the patterns you'll use most often.
How the RAG Pipeline Works¶
```mermaid
graph LR
    A[User Query] --> B[Embedding]
    B --> C[Vector Search]
    C --> D[Context Assembly]
    D --> E[LLM Generation]
    E --> F[Response + Sources]
```

- Embedding — Your query string is converted into a vector embedding using the same model that was used during indexing.
- Vector Search — The embedding is compared against indexed chunks using approximate nearest-neighbor search. The top-k most similar chunks are retrieved.
- Context Assembly — Retrieved chunks are ranked, de-duplicated, and assembled into a prompt context window alongside the original query and any conversation history.
- LLM Generation — The assembled prompt is sent to the configured LLM, which generates a natural-language answer grounded in the retrieved context.
- Response — The answer is returned along with source document metadata, relevance scores, and a unique query ID for feedback/refinement.
The /ask Endpoint¶
POST /ask¶
The primary query endpoint. Send a question and one or more dataset names; receive a synthesized answer with source documents.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| datasetNames | string[] | Yes | Datasets to search. |
| queryString | string | Yes | Natural-language question. |
| conversationId | string | No | Continue an existing conversation (see Conversations). |
| llmName | string | No | Override the default LLM (e.g., "gpt-4o", "claude-3-sonnet"). |
| llmId | string | No | LLM identifier (alternative to llmName). |
| history | HistoryEntry[] | No | Explicit conversation history (array of {query, response} pairs). |
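If you call the REST API directly rather than through the SDK, the request maps straight onto this table. Here is a minimal sketch using httpx; the base URL and bearer-token auth scheme are assumptions, so substitute whatever your deployment uses:

```python
import httpx

# Base URL and auth scheme are assumptions; adjust for your deployment.
resp = httpx.post(
    "https://your-gaia-host/ask",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "datasetNames": ["engineering-docs"],
        "queryString": "How do we handle database migrations?",
    },
    timeout=60.0,
)
resp.raise_for_status()
print(resp.json()["responseString"])
```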
Response Body¶
| Field | Type | Description |
|---|---|---|
| responseString | string | The LLM-generated answer. |
| queryUid | string | Unique ID for this query (used for feedback, refinement). |
| conversationId | string | Conversation this query belongs to. |
| conversationName | string | Auto-generated or user-set conversation name. |
| documents | Document[] | Source documents with snippets and relevance scores. |
| finishReason | string | Why generation stopped ("stop", "length", etc.). |
Each Document contains:
| Field | Type | Description |
|---|---|---|
| docId | string | Unique document identifier. |
| filename | string | Original file name. |
| filepath | string | Full path within the data source. |
| snippet | string | The most relevant text excerpt. |
| score | float | Relevance score (higher is better). |
| metadata | object | Additional metadata (size, modified date, etc.). |
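Put together, a response body looks roughly like this. All values are illustrative, not output from a real cluster:

```json
{
  "responseString": "Database migrations are managed with versioned scripts …",
  "queryUid": "q-1234",
  "conversationId": "c-5678",
  "conversationName": "Database migrations",
  "documents": [
    {
      "docId": "d-0001",
      "filename": "migrations.md",
      "filepath": "/engineering-docs/db/migrations.md",
      "snippet": "All schema changes go through versioned migration scripts …",
      "score": 0.91,
      "metadata": { "size": 4096, "modified": "2025-11-02T10:15:00Z" }
    }
  ],
  "finishReason": "stop"
}
```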
Basic Query¶
```python
import asyncio

from gaia_sdk import GaiaClient


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        response = await gaia.ask(
            dataset_names=["engineering-docs"],
            query="How do we handle database migrations?",
        )

        print(response.response_string)

        if response.documents:
            print(f"\nSources ({len(response.documents)}):")
            for doc in response.documents:
                print(f"  • {doc.filename} — score: {doc.score:.3f}")
                print(f"    {doc.snippet[:120]}…")


asyncio.run(main())
```
Querying Multiple Datasets¶
Pass multiple dataset names to search across several sources simultaneously. Gaia merges results from all datasets before sending them to the LLM.
```python
response = await gaia.ask(
    dataset_names=["engineering-docs", "runbooks", "incident-reports"],
    query="What is the RCA process for P1 incidents?",
)
```
Cross-dataset ranking
Documents from all specified datasets are ranked together by relevance score. There is no per-dataset weighting — the most relevant chunks win regardless of which dataset they came from.
Specifying an LLM¶
By default Gaia uses the cluster's configured LLM. You can override it per-query:
```python
response = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="Summarize our caching strategy",
    llm_name="gpt-4o",
)
```
Discover available LLMs
Use the SDK to list registered models. The method name in this sketch is an assumption; check your SDK version for the exact call:
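```python
async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
    # Hypothetical method name (list_llms); verify against your SDK version.
    llms = await gaia.list_llms()
    for llm in llms:
        print(f"{llm.name} ({llm.id})")
```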
Query with Conversation History¶
For multi-turn conversations, pass the conversationId from a previous response. Gaia automatically includes prior context.
```python
# First question — no conversationId
r1 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="What load balancer do we use?",
)

# Follow-up — pass the conversationId to maintain context
r2 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="How is health-checking configured for it?",
    conversation_id=r1.conversation_id,
)

print(r2.response_string)
```
You can also supply an explicit history array in the raw API call if you manage conversation state yourself:
```json
{
  "datasetNames": ["engineering-docs"],
  "queryString": "How is health-checking configured for it?",
  "history": [
    {
      "query": "What load balancer do we use?",
      "response": "We use HAProxy for all external traffic …"
    }
  ]
}
```
See Conversations for a deeper dive into multi-turn patterns.
Understanding Document Scores¶
Every document in the response includes a score field — a floating-point value representing semantic similarity between the query and the document chunk.
| Score Range | Interpretation |
|---|---|
| > 0.85 | Highly relevant — the chunk directly addresses the query. |
| 0.70 – 0.85 | Relevant — useful supporting context. |
| 0.50 – 0.70 | Tangentially related — may or may not be helpful. |
| < 0.50 | Low relevance — unlikely to contain the answer. |
Scores are model-dependent
Exact score ranges vary by embedding model. The table above is a general guideline for the default model. Compare scores relative to each other within a single response rather than relying on absolute thresholds.
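A practical corollary: filter sources relative to the best score in the response rather than against a fixed cutoff. Here is a minimal sketch; the 0.8 ratio is an arbitrary assumption, so tune it for your embedding model:

```python
def strong_sources(documents, ratio=0.8):
    """Keep documents scoring within `ratio` of the best score.

    Relative filtering avoids hard-coding absolute thresholds,
    which vary by embedding model. The 0.8 default is arbitrary.
    """
    if not documents:
        return []
    best = max(doc.score for doc in documents)
    return [doc for doc in documents if doc.score >= best * ratio]


# Usage with an /ask response:
for doc in strong_sources(response.documents):
    print(f"{doc.filename}: {doc.score:.3f}")
```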
Complete Example: Ask with Sources¶
```python
import asyncio

from gaia_sdk import GaiaClient


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        # Query across two datasets
        response = await gaia.ask(
            dataset_names=["engineering-docs", "runbooks"],
            query="What is the procedure for rolling back a failed deployment?",
            llm_name="gpt-4o",
        )

        # Print the answer
        print("=" * 60)
        print(response.response_string)
        print("=" * 60)

        # Print source documents
        if response.documents:
            print(f"\n📎 Sources ({len(response.documents)}):\n")
            for i, doc in enumerate(response.documents, 1):
                print(f"  {i}. {doc.filename}")
                print(f"     Path : {doc.filepath}")
                print(f"     Score: {doc.score:.3f}")
                print(f"     Snippet: {doc.snippet[:200]}…\n")

        # Save the queryUid for feedback or refinement
        print(f"Query UID: {response.query_uid}")
        print(f"Conversation: {response.conversation_id}")


asyncio.run(main())
```
Tips for Writing Effective Queries¶
Be specific
Instead of "Tell me about deployments", ask "What is the step-by-step process for deploying the payment service to production?". Specific queries retrieve more relevant chunks.
Use domain terminology
Gaia's vector search matches on semantic meaning. Using the same terminology that appears in your documents improves recall — e.g., say "RCA" instead of "root cause analysis" if your runbooks use the abbreviation.
One question at a time
Compound questions ("What is X and how does Y relate to Z?") dilute the embedding signal. Break them into separate queries within the same conversation, as in the sketch after these tips.
Leverage conversation context
For follow-up questions, always pass conversationId so the LLM can see the prior exchange. This avoids repeating context and produces more coherent answers.
Avoid overly broad queries
Questions like "Tell me everything about our infrastructure" pull in many low-relevance chunks and may exceed the LLM's context window, resulting in truncated or vague answers.
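As promised under "One question at a time", here is a compound question split into two focused queries that share one conversation, reusing the SDK calls shown earlier:

```python
# Split "What is our caching strategy and how does it affect deployments?"
# into two focused queries within one conversation.
r1 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="What is our caching strategy?",
)
r2 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="How does the caching strategy affect deployments?",
    conversation_id=r1.conversation_id,
)
print(r2.response_string)
```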
Best Practices for Asking Questions¶
The official Cohesity Gaia guidelines recommend these patterns for getting the best answers:
| Guideline | Description | Example |
|---|---|---|
| Be Specific | Detailed questions retrieve more relevant documents | "Tell me about deployments" → "What is the step-by-step deployment process for the payment service?" |
| Provide Context | Include time, location, or situation | "Benefits of exercise for a person in their 50s?" |
| Use Keywords | Include product names, case numbers, specific terms | "Summarize case files for case number 2:11-BK-12345" |
| One Question at a Time | RAG works best with single, focused questions | Split compound questions into separate queries |
| Rephrase for Better Results | Try different wording if unsatisfied | "Benefits of exercise?" → "Positive effects of exercise on the body?" |
| Avoid Ambiguity | Vague questions get vague answers | |
| Use Proper Grammar | Correct spelling and grammar improve retrieval accuracy | AI optimizes for well-formed natural language |
Build These Into Your App
Consider adding placeholder text, example queries, or a "tips" panel in your UI to guide users toward effective question patterns.
Answer Safety¶
Gaia includes built-in content safety safeguards that cannot be disabled. All responses are screened in real-time to filter:
- Sexually explicit content
- Violence or self-harm promotion
- Discriminatory, hateful, or illegal content
Developer Impact
Your application cannot override content safety. If a query or source document triggers safety filters, Gaia will not return harmful content. Design your UX to handle cases where content may be filtered from responses.
Multilingual Support¶
Gaia supports queries and responses in multiple languages:
| Language | Support |
|---|---|
| English | Default |
| Dutch | Supported |
| French | Supported |
| German | Supported |
| Italian | Supported |
| Japanese | Supported |
| Korean | Supported |
| Spanish | Supported |
- You can index data in its original language
- Ask questions in any supported language — Gaia responds in the same language
- UI elements, error messages, and system prompts remain in English
```python
# Query in Spanish — Gaia responds in Spanish
response = await gaia.ask(
    dataset_names=["documentos-legales"],
    query="¿Cuáles son los requisitos de cumplimiento para 2026?",
)
print(response.response_string)  # Answer in Spanish
```
Hallucination Handling¶
Gaia is designed to minimize hallucinations by grounding responses in indexed data:
- When no valid matches are found, Gaia responds: "I could not find enough information to answer the question"
- Users are prompted to either rephrase the question or add more data
- Gaia only generates responses based on actual indexed content, not general knowledge
Handle It In Your App
Check for "could not find" type responses and present users with actionable options: rephrase, select different datasets, or contact an admin to index additional data sources.
What's Next¶
- Conversations — manage multi-turn dialogue, list history, and continue sessions.
- Exhaustive Search — retrieve all matching documents instead of an LLM-generated answer.
- Streaming Responses — stream answers token-by-token for real-time UIs.
- Refine & Feedback — improve answers with targeted document selection.
- Metadata Filtering — combine semantic search with structured metadata filters.