Querying & RAG

Gaia's core value proposition is Retrieval-Augmented Generation (RAG) — you ask a natural-language question, Gaia retrieves the most relevant chunks from your indexed datasets, and an LLM synthesizes a grounded answer with source citations.

This page explains the pipeline, the API surface, and the patterns you'll use most often.


How the RAG Pipeline Works

User Query → Embedding → Vector Search → Context Assembly → LLM Generation → Response + Sources

  1. Embedding — Your query string is converted into a vector embedding using the same model that was used during indexing.
  2. Vector Search — The embedding is compared against indexed chunks using approximate nearest-neighbor search. The top-k most similar chunks are retrieved.
  3. Context Assembly — Retrieved chunks are ranked, de-duplicated, and assembled into a prompt context window alongside the original query and any conversation history.
  4. LLM Generation — The assembled prompt is sent to the configured LLM, which generates a natural-language answer grounded in the retrieved context.
  5. Response — The answer is returned along with source document metadata, relevance scores, and a unique query ID for feedback/refinement.
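
If it helps to see the flow end to end, here is a minimal, purely illustrative sketch of those five stages. It is not Gaia's internal implementation; every helper below is a toy stand-in.

Python
# Illustrative only: each helper is a toy stand-in, not Gaia's internals.

def embed(text: str) -> list[float]:
    # 1. Embedding: stand-in for the model used at index time.
    return [float(len(word)) for word in text.split()]

def vector_search(query_vector: list[float], top_k: int = 5) -> list[dict]:
    # 2. Vector Search: stand-in for approximate nearest-neighbor retrieval.
    chunks = [{"filename": "migrations.md", "snippet": "Run alembic upgrade head …", "score": 0.91}]
    return chunks[:top_k]

def assemble_context(query: str, chunks: list[dict], history: list[dict]) -> str:
    # 3. Context Assembly: build the prompt from chunks, query, and history.
    context = "\n".join(chunk["snippet"] for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    # 4. LLM Generation: stand-in for the configured LLM.
    return "A grounded answer that cites the retrieved context."

def answer(query: str, history: list[dict] | None = None) -> dict:
    query_vector = embed(query)
    chunks = vector_search(query_vector)
    prompt = assemble_context(query, chunks, history or [])
    text = generate(prompt)
    # 5. Response: answer plus source metadata, as in the real API.
    return {"responseString": text, "documents": chunks}

print(answer("How do we handle database migrations?"))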

The /ask Endpoint

POST /ask

The primary query endpoint. Send a question and one or more dataset names; receive a synthesized answer with source documents.

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| datasetNames | string[] | Yes | Datasets to search. |
| queryString | string | Yes | Natural-language question. |
| conversationId | string | No | Continue an existing conversation (see Conversations). |
| llmName | string | No | Override the default LLM (e.g., "gpt-4o", "claude-3-sonnet"). |
| llmId | string | No | LLM identifier (alternative to llmName). |
| history | HistoryEntry[] | No | Explicit conversation history (array of {query, response} pairs). |
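
As a concrete example, a request that overrides the default model might look like this (the dataset name and model name are illustrative):

JSON
{
  "datasetNames": ["engineering-docs"],
  "queryString": "How do we handle database migrations?",
  "llmName": "gpt-4o"
}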

Response Body

| Field | Type | Description |
|---|---|---|
| responseString | string | The LLM-generated answer. |
| queryUid | string | Unique ID for this query (used for feedback, refinement). |
| conversationId | string | Conversation this query belongs to. |
| conversationName | string | Auto-generated or user-set conversation name. |
| documents | Document[] | Source documents with snippets and relevance scores. |
| finishReason | string | Why generation stopped ("stop", "length", etc.). |

Each Document contains:

| Field | Type | Description |
|---|---|---|
| docId | string | Unique document identifier. |
| filename | string | Original file name. |
| filepath | string | Full path within the data source. |
| snippet | string | The most relevant text excerpt. |
| score | float | Relevance score (higher is better). |
| metadata | object | Additional metadata (size, modified date, etc.). |
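
Put together, a response might look roughly like this (all values are illustrative):

JSON
{
  "responseString": "Database migrations are applied with Alembic as part of the deploy pipeline …",
  "queryUid": "q-1a2b3c",
  "conversationId": "c-9f8e7d",
  "conversationName": "Database migrations",
  "finishReason": "stop",
  "documents": [
    {
      "docId": "d-42",
      "filename": "migrations.md",
      "filepath": "/engineering-docs/runbooks/migrations.md",
      "snippet": "Run alembic upgrade head after the schema review is approved …",
      "score": 0.88,
      "metadata": { "size": 2048, "modified": "2024-11-02" }
    }
  ]
}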

Basic Query

Python
import asyncio
from gaia_sdk import GaiaClient

async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        response = await gaia.ask(
            dataset_names=["engineering-docs"],
            query="How do we handle database migrations?",
        )

        print(response.response_string)

        if response.documents:
            print(f"\nSources ({len(response.documents)}):")
            for doc in response.documents:
                print(f"  • {doc.filename} — score: {doc.score:.3f}")
                print(f"    {doc.snippet[:120]}…")

asyncio.run(main())
Bash
curl -s -X POST \
  -H "apiKey: $GAIA_API_KEY" \
  -H "Content-Type: application/json" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask" \
  -d '{
    "datasetNames": ["engineering-docs"],
    "queryString": "How do we handle database migrations?"
  }' | python -m json.tool

Querying Multiple Datasets

Pass multiple dataset names to search across several sources simultaneously. Gaia merges results from all datasets before sending them to the LLM.

Python
response = await gaia.ask(
    dataset_names=["engineering-docs", "runbooks", "incident-reports"],
    query="What is the RCA process for P1 incidents?",
)

Cross-dataset ranking

Documents from all specified datasets are ranked together by relevance score. There is no per-dataset weighting — the most relevant chunks win regardless of which dataset they came from.


Specifying an LLM

By default Gaia uses the cluster's configured LLM. You can override it per-query:

Python
response = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="Summarize our caching strategy",
    llm_name="gpt-4o",
)

Discover available LLMs

Use the SDK to list registered models:

Python
llms = await gaia.list_llms()
for llm in llms:
    print(f"{llm['name']}  (id: {llm.get('id', 'N/A')})")

Query with Conversation History

For multi-turn conversations, pass the conversationId from a previous response. Gaia automatically includes prior context.

Python
# First question — no conversationId
r1 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="What load balancer do we use?",
)

# Follow-up — pass the conversationId to maintain context
r2 = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="How is health-checking configured for it?",
    conversation_id=r1.conversation_id,
)

print(r2.response_string)

You can also supply an explicit history array in the raw API call if you manage conversation state yourself:

JSON
{
  "datasetNames": ["engineering-docs"],
  "queryString": "How is health-checking configured for it?",
  "history": [
    {
      "query": "What load balancer do we use?",
      "response": "We use HAProxy for all external traffic …"
    }
  ]
}
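
If you manage conversation state yourself and call the REST API directly instead of through the SDK, the same payload works with any HTTP client. A minimal sketch using the requests library, with the endpoint and apiKey header from the curl example above:

Python
import os
import requests

payload = {
    "datasetNames": ["engineering-docs"],
    "queryString": "How is health-checking configured for it?",
    "history": [
        {
            "query": "What load balancer do we use?",
            "response": "We use HAProxy for all external traffic …",
        }
    ],
}

resp = requests.post(
    "https://helios.cohesity.com/v2/mcm/gaia/ask",
    headers={"apiKey": os.environ["GAIA_API_KEY"], "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["responseString"])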

See Conversations for a deeper dive into multi-turn patterns.


Understanding Document Scores

Every document in the response includes a score field — a floating-point value representing semantic similarity between the query and the document chunk.

| Score Range | Interpretation |
|---|---|
| > 0.85 | Highly relevant — the chunk directly addresses the query. |
| 0.70 – 0.85 | Relevant — useful supporting context. |
| 0.50 – 0.70 | Tangentially related — may or may not be helpful. |
| < 0.50 | Low relevance — unlikely to contain the answer. |

Scores are model-dependent

Exact score ranges vary by embedding model. The table above is a general guideline for the default model. Compare scores relative to each other within a single response rather than relying on absolute thresholds.
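
In practice it is often useful to apply a relative cutoff client-side before showing sources to users. A small sketch (the 85% fraction is arbitrary and worth tuning for your embedding model):

Python
response = await gaia.ask(
    dataset_names=["engineering-docs"],
    query="How do we handle database migrations?",
)

if response.documents:
    # Keep only documents scoring within 85% of the best match in this response.
    best = max(doc.score for doc in response.documents)
    strong_sources = [doc for doc in response.documents if doc.score >= 0.85 * best]
    for doc in strong_sources:
        print(f"{doc.filename}: {doc.score:.3f}")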


Complete Example: Ask with Sources

Python
import asyncio
from gaia_sdk import GaiaClient

async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:

        # Query across two datasets
        response = await gaia.ask(
            dataset_names=["engineering-docs", "runbooks"],
            query="What is the procedure for rolling back a failed deployment?",
            llm_name="gpt-4o",
        )

        # Print the answer
        print("=" * 60)
        print(response.response_string)
        print("=" * 60)

        # Print source documents
        if response.documents:
            print(f"\n📎 Sources ({len(response.documents)}):\n")
            for i, doc in enumerate(response.documents, 1):
                print(f"  {i}. {doc.filename}")
                print(f"     Path : {doc.filepath}")
                print(f"     Score: {doc.score:.3f}")
                print(f"     Snippet: {doc.snippet[:200]}\n")

        # Save the queryUid for feedback or refinement
        print(f"Query UID: {response.query_uid}")
        print(f"Conversation: {response.conversation_id}")

asyncio.run(main())

Tips for Writing Effective Queries

Be specific

Instead of "Tell me about deployments", ask "What is the step-by-step process for deploying the payment service to production?". Specific queries retrieve more relevant chunks.

Use domain terminology

Gaia's vector search matches on semantic meaning. Using the same terminology that appears in your documents improves recall — e.g., say "RCA" instead of "root cause analysis" if your runbooks use the abbreviation.

One question at a time

Compound questions ("What is X and how does Y relate to Z?") dilute the embedding signal. Break them into separate queries within the same conversation.

Leverage conversation context

For follow-up questions, always pass conversationId so the LLM can see the prior exchange. This avoids repeating context and produces more coherent answers.

Avoid overly broad queries

Questions like "Tell me everything about our infrastructure" pull in many low-relevance chunks and may exceed the LLM's context window, resulting in truncated or vague answers.


Best Practices for Asking Questions

The official Cohesity Gaia guidelines recommend these patterns for getting the best answers:

| Guideline | Description | Example |
|---|---|---|
| Be Specific | Detailed questions retrieve more relevant documents | ❌ "What is the weather?" → ✅ "What is the current weather in San Francisco?" |
| Provide Context | Include time, location, or situation | "Benefits of exercise for a person in their 50s?" |
| Use Keywords | Include product names, case numbers, specific terms | "Summarize case files for case number 2:11-BK-12345" |
| One Question at a Time | RAG works best with single, focused questions | Split compound questions into separate queries |
| Rephrase for Better Results | Try different wording if unsatisfied | "Benefits of exercise?" → "Positive effects of exercise on the body?" |
| Avoid Ambiguity | Vague questions get vague answers | ❌ "How long is it?" → ✅ "How long is the Golden Gate Bridge?" |
| Use Proper Grammar | Correct spelling and grammar improve retrieval accuracy | AI optimizes for well-formed natural language |

Build These Into Your App

Consider adding placeholder text, example queries, or a "tips" panel in your UI to guide users toward effective question patterns.
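
For example, a query input might ship with a few suggested prompts that model these guidelines (the suggestions below are illustrative):

Python
# Hypothetical starter queries surfaced as placeholder text or a "tips" panel.
EXAMPLE_QUERIES = [
    "What is the step-by-step process for deploying the payment service to production?",
    "What is the RCA process for P1 incidents?",
    "What is the procedure for rolling back a failed deployment?",
]

placeholder = f"Ask one specific question, e.g. {EXAMPLE_QUERIES[0]}"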


Answer Safety

Gaia includes built-in content safety safeguards that cannot be disabled. All responses are screened in real time to filter:

  • Sexually explicit content
  • Violence or self-harm promotion
  • Discriminatory, hateful, or illegal content

Developer Impact

Your application cannot override content safety. If a query or source document triggers safety filters, Gaia will not return harmful content. Design your UX to handle cases where content may be filtered from responses.


Multilingual Support

Gaia supports queries and responses in multiple languages:

English (default), Dutch, French, German, Italian, Japanese, Korean, and Spanish.

  • You can index data in its original language
  • Ask questions in any supported language — Gaia responds in the same language
  • UI elements, error messages, and system prompts remain in English
Python
# Query in Spanish — Gaia responds in Spanish
response = await gaia.ask(
    dataset_names=["documentos-legales"],
    query="¿Cuáles son los requisitos de cumplimiento para 2026?",
)
print(response.response_string)  # Answer in Spanish

Hallucination Handling

Gaia is designed to minimize hallucinations by grounding responses in indexed data:

  • When no valid matches are found, Gaia responds: "I could not find enough information to answer the question"
  • Users are prompted to either rephrase the question or add more data
  • Gaia only generates responses based on actual indexed content, not general knowledge

Handle It In Your App

Check for "could not find" type responses and present users with actionable options: rephrase, select different datasets, or contact an admin to index additional data sources.


What's Next