Your First API Call

Let's make your first call to the Gaia API. By the end of this page you'll have listed your available datasets and asked Gaia a question — all from a simple Python script.

Step 1: Set Up Your Environment

Create a .env file in your project root with your Gaia credentials:

Bash

# .env
GAIA_API_KEY=your-api-key-here
GAIA_BASE_URL=https://helios.cohesity.com/v2/mcm/gaia

Replace your-api-key-here

Use the actual API key you generated in Authentication. The placeholder value will not work.
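A small pre-flight check can catch a leftover placeholder before any request is sent. The sketch below is one possible approach and is not part of the Gaia SDK: the function name `check_gaia_config` is hypothetical, and the `https://` check is an assumption based only on the sample URL above.

```python
import os

PLACEHOLDER = "your-api-key-here"  # the sample value from the .env file above


def check_gaia_config(env):
    """Return a list of problems found in a mapping of environment variables.

    Hypothetical helper for illustration; it is not part of any Gaia library.
    """
    problems = []
    api_key = env.get("GAIA_API_KEY", "").strip()
    base_url = env.get("GAIA_BASE_URL", "").strip()

    if not api_key:
        problems.append("GAIA_API_KEY is not set")
    elif api_key == PLACEHOLDER:
        problems.append("GAIA_API_KEY still holds the placeholder value")

    if not base_url:
        problems.append("GAIA_BASE_URL is not set")
    elif not base_url.startswith("https://"):
        # Assumption: the base URL is always served over HTTPS, as in the sample.
        problems.append("GAIA_BASE_URL should start with https://")

    return problems


if __name__ == "__main__":
    # Report every problem found in the current process environment.
    for problem in check_gaia_config(os.environ):
        print("Config problem:", problem)
```

Calling it right after `load_dotenv()` and exiting when the list is non-empty gives a clearer message than a failed request would.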

Step 2: List Your Datasets

Datasets are the foundation of everything in Gaia — they represent the indexed data you can query. Let's start by listing what's available.

Endpoint: GET /datasets

The same request is shown three ways: cURL, Python (httpx), and Python (Gaia SDK).

Load your .env into the shell first so $GAIA_API_KEY is available:

Bash

export $(grep -v '^#' .env | xargs)

Then list datasets:

Bash

curl -s \
  -H "apiKey: $GAIA_API_KEY" \
  https://helios.cohesity.com/v2/mcm/gaia/datasets | python3 -m json.tool

Python

import asyncio
import os

import httpx
from dotenv import load_dotenv

load_dotenv()

async def main():
    async with httpx.AsyncClient(
        base_url=os.environ["GAIA_BASE_URL"],
        headers={"apiKey": os.environ["GAIA_API_KEY"]},
    ) as client:
        response = await client.get("/datasets")
        response.raise_for_status()
        data = response.json()

        # The /datasets endpoint returns {"datasets": [...]}
        for ds in data.get("datasets", []):
            print(f"  {ds['name']}  (status: {ds.get('status', 'unknown')})")

asyncio.run(main())

Python

import asyncio
from gaia_sdk import GaiaClient

async def main():
    async with GaiaClient.from_env() as gaia:
        datasets = await gaia.list_datasets()
        for ds in datasets:
            print(f"  {ds.name}  (status: {ds.status})")

asyncio.run(main())

Expected output:

Text Only

  financial-reports  (status: ready)
  it-runbooks  (status: ready)
  hr-policies  (status: indexing)

No datasets?

If you see an empty list, it means no datasets have been created and indexed on your Gaia instance yet. You can create one through the Cohesity UI or via the API — see Understanding Datasets.

Step 3: Ask a Question

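Now let's ask Gaia a question against one of your datasets. This uses the retrieval-augmented generation (RAG) query endpoint, which answers from documents retrieved out of the datasets you name.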

Endpoint: POST /ask

Request body:

JSON

{
  "datasetNames": ["financial-reports"],
  "queryString": "What were the total expenses in Q4?"
}
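If you build this body in code, a small helper keeps the two field names in one place. This is a minimal sketch: `build_ask_payload` is a hypothetical name, and the empty-value checks are this example's own choice rather than documented API behavior.

```python
import json


def build_ask_payload(dataset_names, query):
    """Build the JSON body for POST /ask from the two fields shown above.

    Hypothetical helper; the validation rules are assumptions of this sketch.
    """
    # Drop blank names and surrounding whitespace before sending.
    names = [n.strip() for n in dataset_names if n and n.strip()]
    if not names:
        raise ValueError("at least one dataset name is required")
    if not query or not query.strip():
        raise ValueError("query must not be empty")
    return {"datasetNames": names, "queryString": query.strip()}


if __name__ == "__main__":
    payload = build_ask_payload(
        ["financial-reports"], "What were the total expenses in Q4?"
    )
    print(json.dumps(payload, indent=2))
```

The returned dictionary can be passed straight to `client.post("/ask", json=payload)` in the httpx examples.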

The same request is shown three ways: cURL, Python (httpx), and Python (Gaia SDK).

If you haven't already loaded your .env file into the shell, run: export $(grep -v '^#' .env | xargs)

Bash

curl -s -X POST \
  -H "apiKey: $GAIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetNames": ["financial-reports"],
    "queryString": "What were the total expenses in Q4?"
  }' \
  https://helios.cohesity.com/v2/mcm/gaia/ask | python3 -m json.tool

Python

import asyncio
import os

import httpx
from dotenv import load_dotenv

load_dotenv()

async def main():
    async with httpx.AsyncClient(
        base_url=os.environ["GAIA_BASE_URL"],
        headers={
            "apiKey": os.environ["GAIA_API_KEY"],
            "Content-Type": "application/json",
        },
    ) as client:
        response = await client.post("/ask", json={
            "datasetNames": ["financial-reports"],
            "queryString": "What were the total expenses in Q4?",
        })
        response.raise_for_status()
        data = response.json()

        print("Answer:", data.get("responseString", ""))
        print("Query UID:", data.get("queryUid", ""))
        print("Conversation:", data.get("conversationId", ""))
        print(f"Sources: {len(data.get('documents', []))} documents")

asyncio.run(main())

Python

import asyncio
from gaia_sdk import GaiaClient

async def main():
    async with GaiaClient.from_env() as gaia:
        response = await gaia.ask(
            dataset_names=["financial-reports"],
            query="What were the total expenses in Q4?",
        )

        print("Answer:", response.response_string)
        print("Query UID:", response.query_uid)
        print("Conversation:", response.conversation_id)
        if response.documents:
            print(f"Sources: {len(response.documents)} documents")
            for doc in response.documents[:3]:
                print(f"  - {doc.filename} (score: {doc.score})")

asyncio.run(main())

Step 4: Understand the Response

The /ask endpoint returns a JSON object with the following key fields:

JSON

{
  "responseString": "Based on the financial reports, total Q4 expenses were $2.4M...",
  "queryUid": "q-abc123-def456",
  "conversationId": "conv-789xyz",
  "conversationName": "Q4 Expenses",
  "documents": [
    {
      "docId": "doc-001",
      "filename": "Q4-2025-expenses.xlsx",
      "filepath": "/finance/reports/Q4-2025-expenses.xlsx",
      "snippet": "...total operating expenses of $2,400,000...",
      "score": 0.94
    },
    {
      "docId": "doc-002",
      "filename": "annual-summary-2025.pdf",
      "filepath": "/finance/reports/annual-summary-2025.pdf",
      "snippet": "...fourth quarter expenses increased by 12%...",
      "score": 0.87
    }
  ],
  "finishReason": "stop"
}

| Field | Description |
|---|---|
| `responseString` | The LLM-generated answer, grounded in the retrieved documents |
| `queryUid` | Unique identifier for this query, used for feedback and refine operations |
| `conversationId` | Conversation identifier; pass it in subsequent queries to maintain context |
| `conversationName` | Auto-generated name for the conversation |
| `documents` | Source documents the answer was derived from, with relevance scores |
| `finishReason` | Why the LLM stopped generating (`stop` = normal completion) |
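When using raw httpx, it can help to map these fields onto a small typed object and reuse `conversationId` for a follow-up question. The sketch below is illustrative only. `AskResult`, `parse_ask_response`, and `build_follow_up` are hypothetical names, and the follow-up body assumes the request accepts the identifier under the same `conversationId` key the response uses; check the API reference before relying on that.

```python
from dataclasses import dataclass, field


@dataclass
class AskResult:
    """Typed view of the /ask response fields listed above (hypothetical)."""

    answer: str
    query_uid: str
    conversation_id: str
    finish_reason: str
    documents: list = field(default_factory=list)

    @property
    def completed_normally(self):
        # Per the field list, "stop" indicates normal completion.
        return self.finish_reason == "stop"


def parse_ask_response(data):
    """Map the camelCase JSON keys onto an AskResult, tolerating missing keys."""
    return AskResult(
        answer=data.get("responseString", ""),
        query_uid=data.get("queryUid", ""),
        conversation_id=data.get("conversationId", ""),
        finish_reason=data.get("finishReason", ""),
        documents=list(data.get("documents") or []),
    )


def build_follow_up(result, dataset_names, query):
    """Build a request body that continues the same conversation.

    ASSUMPTION: the request field is named "conversationId", mirroring the
    response. This page does not confirm the request-side field name.
    """
    return {
        "datasetNames": list(dataset_names),
        "queryString": query,
        "conversationId": result.conversation_id,
    }


if __name__ == "__main__":
    sample = {
        "responseString": "Based on the financial reports...",
        "queryUid": "q-abc123-def456",
        "conversationId": "conv-789xyz",
        "documents": [],
        "finishReason": "stop",
    }
    result = parse_ask_response(sample)
    print(result.completed_normally)  # prints True
    print(build_follow_up(result, ["financial-reports"], "And in Q3?"))
```

Checking `completed_normally` before displaying an answer is a cheap way to notice a response that ended for some reason other than `stop`.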

Documents are your citations

The documents array contains the evidence that the LLM used to generate its answer. This is what makes RAG trustworthy — you can verify the answer against the source material.
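To put those citations in front of a reader, you might sort the documents by score and print the snippet next to each file name. The following is a minimal sketch that uses only the document fields from the sample response; `format_citations`, `min_score`, and `limit` are hypothetical names chosen for this example.

```python
def format_citations(documents, min_score=0.0, limit=5):
    """Return one display line per source document, highest score first.

    Documents without a numeric score are skipped. Hypothetical helper.
    """
    scored = [
        doc for doc in documents
        if isinstance(doc.get("score"), (int, float))
        and doc["score"] >= min_score
    ]
    scored.sort(key=lambda doc: doc["score"], reverse=True)

    lines = []
    for doc in scored[:limit]:
        # Prefer the file name, then the path, as the sample script does.
        name = doc.get("filename") or doc.get("filepath") or "unknown"
        snippet = doc.get("snippet", "")
        lines.append(f"[{doc['score']:.2f}] {name}: {snippet}")
    return lines


if __name__ == "__main__":
    documents = [
        {
            "filename": "annual-summary-2025.pdf",
            "snippet": "...fourth quarter expenses increased by 12%...",
            "score": 0.87,
        },
        {
            "filename": "Q4-2025-expenses.xlsx",
            "snippet": "...total operating expenses of $2,400,000...",
            "score": 0.94,
        },
    ]
    for line in format_citations(documents):
        print(line)
```

Raising `min_score` hides weaker matches; the right threshold depends on your data, so treat any cutoff as something to tune.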

Complete Runnable Script

Here's a self-contained script that brings it all together:

Python

#!/usr/bin/env python3
"""Your first Gaia API call — list datasets and ask a question."""

import asyncio
import os
import sys

from dotenv import load_dotenv

load_dotenv()


async def main():
    # Check for the SDK, fall back to raw httpx
    try:
        from gaia_sdk import GaiaClient
        use_sdk = True
    except ImportError:
        import httpx
        use_sdk = False
        print("gaia_sdk not found — using httpx directly\n")

    if use_sdk:
        await run_with_sdk()
    else:
        await run_with_httpx()


async def run_with_sdk():
    from gaia_sdk import GaiaClient

    async with GaiaClient.from_env() as gaia:
        # List datasets
        print("=== Available Datasets ===\n")
        datasets = await gaia.list_datasets()

        if not datasets:
            print("No datasets found. Create one in the Cohesity UI first.")
            sys.exit(1)

        for ds in datasets:
            print(f"  • {ds.name}  (status: {ds.status})")

        # Use the first ready dataset
        ready = [ds for ds in datasets if ds.status == "ready"]
        if not ready:
            print("\nNo datasets are ready for querying yet.")
            sys.exit(1)

        target = ready[0].name
        query = "Summarize the key information in this dataset."

        print(f"\n=== Asking: '{query}' ===")
        print(f"    Dataset: {target}\n")

        response = await gaia.ask(
            dataset_names=[target],
            query=query,
        )

        print(f"Answer:\n{response.response_string}\n")
        print(f"Query UID:       {response.query_uid}")
        print(f"Conversation ID: {response.conversation_id}")

        if response.documents:
            print(f"\nSources ({len(response.documents)} documents):")
            for doc in response.documents[:5]:
                # Compare against None so a legitimate score of 0 still prints
                score = f"{doc.score:.2f}" if doc.score is not None else "N/A"
                print(f"  [{score}] {doc.filename or doc.filepath}")


async def run_with_httpx():
    import httpx

    base_url = os.environ.get(
        "GAIA_BASE_URL", "https://helios.cohesity.com/v2/mcm/gaia"
    )
    api_key = os.environ.get("GAIA_API_KEY")

    if not api_key:
        print("Error: GAIA_API_KEY environment variable is not set.")
        sys.exit(1)

    async with httpx.AsyncClient(
        base_url=base_url,
        headers={"apiKey": api_key, "Content-Type": "application/json"},
        timeout=60,
    ) as client:
        # List datasets
        print("=== Available Datasets ===\n")
        resp = await client.get("/datasets")
        resp.raise_for_status()
        data = resp.json()
        dataset_list = data.get("datasets", [])

        if not dataset_list:
            print("No datasets found.")
            sys.exit(1)

        for ds in dataset_list:
            print(f"  • {ds['name']}  (status: {ds.get('status', 'unknown')})")

        ready = [ds for ds in dataset_list if ds.get("status") == "ready"]
        if not ready:
            print("\nNo datasets are ready for querying yet.")
            sys.exit(1)

        target = ready[0]["name"]
        query = "Summarize the key information in this dataset."

        print(f"\n=== Asking: '{query}' ===")
        print(f"    Dataset: {target}\n")

        resp = await client.post("/ask", json={
            "datasetNames": [target],
            "queryString": query,
        })
        resp.raise_for_status()
        result = resp.json()

        print(f"Answer:\n{result.get('responseString', 'No response')}\n")
        print(f"Query UID:       {result.get('queryUid', 'N/A')}")
        print(f"Conversation ID: {result.get('conversationId', 'N/A')}")

        docs = result.get("documents", [])
        if docs:
            print(f"\nSources ({len(docs)} documents):")
            for doc in docs[:5]:
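                # A missing or null score prints as N/A instead of raising
                raw_score = doc.get("score")
                score = f"{raw_score:.2f}" if raw_score is not None else "N/A"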
                name = doc.get("filename") or doc.get("filepath", "unknown")
                print(f"  [{score}] {name}")


if __name__ == "__main__":
    asyncio.run(main())

Save it as first_call.py and run:

Bash

python first_call.py

Expected output:

Text Only

=== Available Datasets ===

  • financial-reports  (status: ready)
  • it-runbooks  (status: ready)

=== Asking: 'Summarize the key information in this dataset.' ===
    Dataset: financial-reports

Answer:
The financial-reports dataset contains quarterly expense reports, annual
summaries, and budget forecasts for fiscal years 2024-2025. Key highlights
include total annual revenue of $12.3M and operating expenses of $8.7M...

Query UID:       q-abc123-def456
Conversation ID: conv-789xyz

Sources (3 documents):
  [0.94] annual-summary-2025.pdf
  [0.89] Q4-2025-expenses.xlsx
  [0.82] budget-forecast-2026.pdf

What's Next?

You've successfully connected to Gaia and made your first RAG query. Here's where to go from here:

Understanding Datasets — Learn how datasets are created, indexed, and managed.
Architecture Overview — See how this fits into a full application.
Backend with FastAPI — Build a proper backend around these API calls.