Querying & RAG¶
Gaia's core value proposition is Retrieval-Augmented Generation (RAG) — you ask a natural-language question, Gaia retrieves the most relevant chunks from your indexed datasets, and an LLM synthesizes a grounded answer with source citations.
This page explains the pipeline, the API surface, and the patterns you'll use most often.
How the RAG Pipeline Works¶
```mermaid
graph LR
    A[User Query] --> B[Embedding]
    B --> C[Vector Search]
    C --> D[Context Assembly]
    D --> E[LLM Generation]
    E --> F[Response + Sources]
```

- Embedding — Your query string is converted into a vector embedding using the same model that was used during indexing.
- Vector Search — The embedding is compared against indexed chunks using approximate nearest-neighbor search. The top-k most similar chunks are retrieved.
- Context Assembly — Retrieved chunks are ranked, de-duplicated, and assembled into a prompt context window alongside the original query and any conversation history.
- LLM Generation — The assembled prompt is sent to the configured LLM, which generates a natural-language answer grounded in the retrieved context.
- Response — The answer is returned along with source document metadata, relevance scores, and a unique query ID for feedback/refinement.
The /ask Endpoint¶
POST /ask¶
The primary query endpoint. Send a question and one or more dataset names; receive a synthesized answer with source documents.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| datasetNames | string[] | Yes | Datasets to search. |
| queryString | string | Yes | Natural-language question. |
| conversationId | string | No | Continue an existing conversation (see Conversations). |
| llmName | string | No | Override the default LLM (e.g., "gpt-4o", "claude-3-sonnet"). |
| llmId | string | No | LLM identifier (alternative to llmName). |
| history | HistoryEntry[] | No | Explicit conversation history (array of {query, response} pairs). |
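If you call the REST API directly rather than through the SDK, the request maps straight onto this table. Here is a minimal sketch using httpx; the base URL and bearer-token auth scheme are assumptions, so substitute whatever your deployment uses:

```python
import httpx

# Base URL and auth scheme are assumptions; adjust for your deployment.
resp = httpx.post(
    "https://your-gaia-host/ask",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "datasetNames": ["engineering-docs"],
        "queryString": "How do we handle database migrations?",
    },
    timeout=60.0,
)
resp.raise_for_status()
print(resp.json()["responseString"])
```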
Response Body¶
| Field | Type | Description |
|---|---|---|
| responseString | string | The LLM-generated answer. |
| queryUid | string | Unique ID for this query (used for feedback, refinement). |
| conversationId | string | Conversation this query belongs to. |
| conversationName | string | Auto-generated or user-set conversation name. |
| documents | Document[] | Source documents with snippets and relevance scores. |
| finishReason | string | Why generation stopped ("stop", "length", etc.). |
Each Document contains:
| Field | Type | Description |
|---|---|---|
| docId | string | Unique document identifier. |
| filename | string | Original file name. |
| filepath | string | Full path within the data source. |
| snippet | string | The most relevant text excerpt. |
| score | float | Relevance score (higher is better). |
| metadata | object | Additional metadata (size, modified date, etc.). |
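Put together, a response body looks roughly like this. All values are illustrative, not output from a real cluster:

```json
{
  "responseString": "Database migrations are managed with versioned scripts …",
  "queryUid": "q-1234",
  "conversationId": "c-5678",
  "conversationName": "Database migrations",
  "documents": [
    {
      "docId": "d-0001",
      "filename": "migrations.md",
      "filepath": "/engineering-docs/db/migrations.md",
      "snippet": "All schema changes go through versioned migration scripts …",
      "score": 0.91,
      "metadata": { "size": 4096, "modified": "2025-11-02T10:15:00Z" }
    }
  ],
  "finishReason": "stop"
}
```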
Basic Query¶
```python
import asyncio

from gaia_sdk import GaiaClient


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        response = await gaia.ask(
            dataset_names=["engineering-docs"],
            query="How do we handle database migrations?",
        )

        print(response.response_string)

        if response.documents:
            print(f"\nSources ({len(response.documents)}):")
            for doc in response.documents:
                print(f"  • {doc.filename} — score: {doc.score:.3f}")
                print(f"    {doc.snippet[:120]}…")


asyncio.run(main())
```
Querying Multiple Datasets¶
Pass multiple dataset names to search across several sources simultaneously. Gaia merges results from all datasets before sending them to the LLM.
```python
response = await gaia.ask(
    dataset_names=["engineering-docs", "runbooks", "incident-reports"],
    query="What is the RCA process for P1 incidents?",
)
```
Cross-dataset ranking
Documents from all specified datasets are ranked together by relevance score. There is no per-dataset weighting — the most relevant chunks win regardless of which dataset they came from.
Specifying an LLM¶
By default Gaia uses the cluster's configured LLM. You can override it per-query:
```python
response = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="Summarize our caching strategy",
    llm_name="gpt-4o",
)
```
Discover available LLMs
Use the SDK to list registered models. The method name in this sketch is an assumption; check your SDK version for the exact call:
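```python
async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
    # Hypothetical method name (list_llms); verify against your SDK version.
    llms = await gaia.list_llms()
    for llm in llms:
        print(f"{llm.name} ({llm.id})")
```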
Query with Conversation History¶
For multi-turn conversations, pass the conversationId from a previous response. Gaia automatically includes prior context.
```python
# First question — no conversationId
r1 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="What load balancer do we use?",
)

# Follow-up — pass the conversationId to maintain context
r2 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="How is health-checking configured for it?",
    conversation_id=r1.conversation_id,
)

print(r2.response_string)
```
You can also supply an explicit history array in the raw API call if you manage conversation state yourself:
```json
{
  "datasetNames": ["engineering-docs"],
  "queryString": "How is health-checking configured for it?",
  "history": [
    {
      "query": "What load balancer do we use?",
      "response": "We use HAProxy for all external traffic …"
    }
  ]
}
```
See Conversations for a deeper dive into multi-turn patterns.
Understanding Document Scores¶
Every document in the response includes a score field — a floating-point value representing semantic similarity between the query and the document chunk.
| Score Range | Interpretation |
|---|---|
| > 0.85 | Highly relevant — the chunk directly addresses the query. |
| 0.70 – 0.85 | Relevant — useful supporting context. |
| 0.50 – 0.70 | Tangentially related — may or may not be helpful. |
| < 0.50 | Low relevance — unlikely to contain the answer. |
Scores are model-dependent
Exact score ranges vary by embedding model. The table above is a general guideline for the default model. Compare scores relative to each other within a single response rather than relying on absolute thresholds.
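A practical corollary: filter sources relative to the best score in the response rather than against a fixed cutoff. Here is a minimal sketch; the 0.8 ratio is an arbitrary assumption, so tune it for your embedding model:

```python
def strong_sources(documents, ratio=0.8):
    """Keep documents scoring within `ratio` of the best score.

    Relative filtering avoids hard-coding absolute thresholds,
    which vary by embedding model. The 0.8 default is arbitrary.
    """
    if not documents:
        return []
    best = max(doc.score for doc in documents)
    return [doc for doc in documents if doc.score >= best * ratio]


# Usage with an /ask response:
for doc in strong_sources(response.documents):
    print(f"{doc.filename}: {doc.score:.3f}")
```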
Complete Example: Ask with Sources¶
```python
import asyncio

from gaia_sdk import GaiaClient


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        # Query across two datasets
        response = await gaia.ask(
            dataset_names=["engineering-docs", "runbooks"],
            query="What is the procedure for rolling back a failed deployment?",
            llm_name="gpt-4o",
        )

        # Print the answer
        print("=" * 60)
        print(response.response_string)
        print("=" * 60)

        # Print source documents
        if response.documents:
            print(f"\n📎 Sources ({len(response.documents)}):\n")
            for i, doc in enumerate(response.documents, 1):
                print(f"  {i}. {doc.filename}")
                print(f"     Path : {doc.filepath}")
                print(f"     Score: {doc.score:.3f}")
                print(f"     Snippet: {doc.snippet[:200]}…\n")

        # Save the queryUid for feedback or refinement
        print(f"Query UID: {response.query_uid}")
        print(f"Conversation: {response.conversation_id}")


asyncio.run(main())
```
Tips for Writing Effective Queries¶
Be specific
Instead of "Tell me about deployments", ask "What is the step-by-step process for deploying the payment service to production?". Specific queries retrieve more relevant chunks.
Use domain terminology
Gaia's vector search matches on semantic meaning. Using the same terminology that appears in your documents improves recall — e.g., say "RCA" instead of "root cause analysis" if your runbooks use the abbreviation.
One question at a time
Compound questions ("What is X and how does Y relate to Z?") dilute the embedding signal. Break them into separate queries within the same conversation, as in the sketch after these tips.
Leverage conversation context
For follow-up questions, always pass conversationId so the LLM can see the prior exchange. This avoids repeating context and produces more coherent answers.
Avoid overly broad queries
Questions like "Tell me everything about our infrastructure" pull in many low-relevance chunks and may exceed the LLM's context window, resulting in truncated or vague answers.
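As promised under "One question at a time", here is a compound question split into two focused queries that share one conversation, reusing the SDK calls shown earlier:

```python
# Split "What is our caching strategy and how does it affect deployments?"
# into two focused queries within one conversation.
r1 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="What is our caching strategy?",
)
r2 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="How does the caching strategy affect deployments?",
    conversation_id=r1.conversation_id,
)
print(r2.response_string)
```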
Best Practices for Asking Questions¶
The official Cohesity Gaia guidelines recommend these patterns for getting the best answers:
| Guideline | Description | Example |
|---|---|---|
| Be Specific | Detailed questions retrieve more relevant documents | "Tell me about deployments" → "What is the step-by-step deployment process for the payment service?" |
| Provide Context | Include time, location, or situation | "Benefits of exercise for a person in their 50s?" |
| Use Keywords | Include product names, case numbers, specific terms | "Summarize case files for case number 2:11-BK-12345" |
| One Question at a Time | RAG works best with single, focused questions | Split compound questions into separate queries |
| Rephrase for Better Results | Try different wording if unsatisfied | "Benefits of exercise?" → "Positive effects of exercise on the body?" |
| Avoid Ambiguity | Vague questions get vague answers | |
| Use Proper Grammar | Correct spelling and grammar improve retrieval accuracy | AI optimizes for well-formed natural language |
Build These Into Your App
Consider adding placeholder text, example queries, or a "tips" panel in your UI to guide users toward effective question patterns.
Answer Safety¶
Gaia includes built-in content safety safeguards that cannot be disabled. All responses are screened in real-time to filter:
- Sexually explicit content
- Violence or self-harm promotion
- Discriminatory, hateful, or illegal content
Developer Impact
Your application cannot override content safety. If a query or source document triggers safety filters, Gaia will not return harmful content. Design your UX to handle cases where content may be filtered from responses.
Multilingual Support¶
Gaia supports queries and responses in multiple languages:
| Language | Support |
|---|---|
| English | Default |
| Dutch | Supported |
| French | Supported |
| German | Supported |
| Italian | Supported |
| Japanese | Supported |
| Korean | Supported |
| Spanish | Supported |
- You can index data in its original language
- Ask questions in any supported language — Gaia responds in the same language
- UI elements, error messages, and system prompts remain in English
```python
# Query in Spanish — Gaia responds in Spanish
response = await gaia.ask(
    dataset_names=["documentos-legales"],
    query="¿Cuáles son los requisitos de cumplimiento para 2026?",
)
print(response.response_string)  # Answer in Spanish
```
Hallucination Handling¶
Gaia is designed to minimize hallucinations by grounding responses in indexed data:
- When no valid matches are found, Gaia responds: "I could not find enough information to answer the question"
- Users are prompted to either rephrase the question or add more data
- Gaia only generates responses based on actual indexed content, not general knowledge
Handle It In Your App
Check for "could not find" type responses and present users with actionable options: rephrase, select different datasets, or contact an admin to index additional data sources.
What's Next¶
- Conversations — manage multi-turn dialogue, list history, and continue sessions.
- Exhaustive Search — retrieve all matching documents instead of an LLM-generated answer.
- Streaming Responses — stream answers token-by-token for real-time UIs.
- Refine & Feedback — improve answers with targeted document selection.
- Metadata Filtering — combine semantic search with structured metadata filters.