Your First API Call
Let's make your first call to the Gaia API. By the end of this page you'll have listed your available datasets and asked Gaia a question — all from a simple Python script.
Step 1: Set Up Your Environment
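Create a .env file in your project root with your Gaia credentials. The scripts on this page read two variables from it:

GAIA_BASE_URL=https://helios.cohesity.com/v2/mcm/gaia
GAIA_API_KEY=your-api-key-here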
Replace your-api-key-here
Use the actual API key you generated in Authentication. The placeholder value will not work.
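Before making any request, it can help to confirm that the credentials were actually loaded. The following is a minimal sketch rather than part of any Gaia tooling: the helper name check_gaia_env is hypothetical, and it assumes the two variable names used by the scripts on this page (GAIA_BASE_URL and GAIA_API_KEY) together with the your-api-key-here placeholder.

```python
import os

# Variable names taken from the scripts on this page.
REQUIRED_VARS = ("GAIA_BASE_URL", "GAIA_API_KEY")
PLACEHOLDER = "your-api-key-here"


def check_gaia_env(env=None):
    """Return a list of problems; an empty list means the settings look usable.

    Hypothetical helper: it only inspects the values, it never contacts Gaia.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is not set")
    if env.get("GAIA_API_KEY", "").strip() == PLACEHOLDER:
        problems.append("GAIA_API_KEY still contains the placeholder value")
    return problems
```

In a script, call load_dotenv() first, then print whatever check_gaia_env() returns and stop if the list is not empty.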
Step 2: List Your Datasets
Datasets are the foundation of everything in Gaia — they represent the indexed data you can query. Let's start by listing what's available.
Endpoint: GET /datasets
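If you plan to call the endpoint from a shell, load your .env first so $GAIA_API_KEY is available: export $(grep -v '^#' .env | xargs)
The Python script below reads .env itself through load_dotenv() and lists your datasets: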
import asyncio
import os

import httpx
from dotenv import load_dotenv

load_dotenv()


async def main():
    async with httpx.AsyncClient(
        base_url=os.environ["GAIA_BASE_URL"],
        headers={"apiKey": os.environ["GAIA_API_KEY"]},
    ) as client:
        response = await client.get("/datasets")
        response.raise_for_status()
        data = response.json()

        # The /datasets endpoint returns {"datasets": [...]}
        for ds in data.get("datasets", []):
            print(f" {ds['name']} (status: {ds.get('status', 'unknown')})")


asyncio.run(main())
Expected output:
financial-reports (status: ready)
it-runbooks (status: ready)
hr-policies (status: indexing)
No datasets?
If you see an empty list, it means no datasets have been created and indexed on your Gaia instance yet. You can create one through the Cohesity UI or via the API — see Understanding Datasets.
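Only datasets that have finished indexing can be queried, so it is useful to filter the listing before asking a question. Here is a small sketch of that filter. It assumes the {"datasets": [...]} payload shape and the "ready" and "indexing" status values shown in the example output above; other status values may exist, and ready_dataset_names is a hypothetical helper name.

```python
def ready_dataset_names(payload):
    """Return the names of datasets whose status is "ready", in listing order."""
    return [
        ds["name"]
        for ds in payload.get("datasets", [])
        if ds.get("status") == "ready" and "name" in ds
    ]


# Sample payload mirroring the expected output above (illustrative data only).
sample = {
    "datasets": [
        {"name": "financial-reports", "status": "ready"},
        {"name": "it-runbooks", "status": "ready"},
        {"name": "hr-policies", "status": "indexing"},
    ]
}

print(ready_dataset_names(sample))  # ['financial-reports', 'it-runbooks']
```

An empty result covers both cases described above: no datasets exist yet, or none has finished indexing.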
Step 3: Ask a Question
Now let's ask Gaia a question against one of your datasets. This uses the retrieval-augmented generation (RAG) query endpoint.
Endpoint: POST /ask
Request body: {"datasetNames": ["financial-reports"], "queryString": "What were the total expenses in Q4?"}
If you haven't already loaded your .env file into the shell, run: export $(grep -v '^#' .env | xargs)
import asyncio
import os

import httpx
from dotenv import load_dotenv

load_dotenv()


async def main():
    async with httpx.AsyncClient(
        base_url=os.environ["GAIA_BASE_URL"],
        headers={
            "apiKey": os.environ["GAIA_API_KEY"],
            "Content-Type": "application/json",
        },
    ) as client:
        response = await client.post("/ask", json={
            "datasetNames": ["financial-reports"],
            "queryString": "What were the total expenses in Q4?",
        })
        response.raise_for_status()
        data = response.json()

        print("Answer:", data.get("responseString", ""))
        print("Query UID:", data.get("queryUid", ""))
        print("Conversation:", data.get("conversationId", ""))
        print(f"Sources: {len(data.get('documents', []))} documents")


asyncio.run(main())
The same request using the Gaia SDK (gaia_sdk) instead of raw httpx:

import asyncio

from gaia_sdk import GaiaClient


async def main():
    async with GaiaClient.from_env() as gaia:
        response = await gaia.ask(
            dataset_names=["financial-reports"],
            query="What were the total expenses in Q4?",
        )

        print("Answer:", response.response_string)
        print("Query UID:", response.query_uid)
        print("Conversation:", response.conversation_id)

        if response.documents:
            print(f"Sources: {len(response.documents)} documents")
            for doc in response.documents[:3]:
                print(f" - {doc.filename} (score: {doc.score})")


asyncio.run(main())
Step 4: Understand the Response
The /ask endpoint returns a JSON object with the following key fields:
{
  "responseString": "Based on the financial reports, total Q4 expenses were $2.4M...",
  "queryUid": "q-abc123-def456",
  "conversationId": "conv-789xyz",
  "conversationName": "Q4 Expenses",
  "documents": [
    {
      "docId": "doc-001",
      "filename": "Q4-2025-expenses.xlsx",
      "filepath": "/finance/reports/Q4-2025-expenses.xlsx",
      "snippet": "...total operating expenses of $2,400,000...",
      "score": 0.94
    },
    {
      "docId": "doc-002",
      "filename": "annual-summary-2025.pdf",
      "filepath": "/finance/reports/annual-summary-2025.pdf",
      "snippet": "...fourth quarter expenses increased by 12%...",
      "score": 0.87
    }
  ],
  "finishReason": "stop"
}
| Field | Description |
|---|---|
| responseString | The LLM-generated answer, grounded in the retrieved documents |
| queryUid | Unique identifier for this query, used for feedback and refine operations |
| conversationId | Conversation identifier; pass this in subsequent queries to maintain context |
| conversationName | Auto-generated name for the conversation |
| documents | Source documents the answer was derived from, with relevance scores |
| finishReason | Why the LLM stopped generating (stop = normal completion) |
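The table notes that conversationId can be passed in later queries to keep context. One way to sketch that is a small payload builder for /ask. Note the assumption: this page does not show the request-side field name, so the sketch guesses that it mirrors the response's conversationId; check the API reference before relying on it. The helper name build_ask_payload is hypothetical.

```python
def build_ask_payload(dataset_names, query, conversation_id=None):
    """Build a JSON body for POST /ask, optionally continuing a conversation."""
    # datasetNames and queryString are the request fields used on this page.
    payload = {"datasetNames": list(dataset_names), "queryString": query}
    if conversation_id:
        # ASSUMPTION: the request field is named like the response field.
        payload["conversationId"] = conversation_id
    return payload


first = build_ask_payload(["financial-reports"], "What were the total expenses in Q4?")
follow_up = build_ask_payload(
    ["financial-reports"],
    "How does that compare to Q3?",
    conversation_id="conv-789xyz",  # value taken from the sample response above
)
```

Either dictionary can be passed as the json= argument of client.post("/ask", ...).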
Documents are your citations
The documents array contains the evidence that the LLM used to generate its answer. This is what makes RAG trustworthy — you can verify the answer against the source material.
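To surface those citations to a user, the documents array can be turned into short, ranked lines. The sketch below assumes the document fields shown in the sample response (filename, filepath, score) and treats a missing score as unknown; format_citations is a hypothetical helper, not an SDK function.

```python
def format_citations(documents, limit=3):
    """Return up to `limit` citation lines, highest relevance score first."""
    # Documents without a score sort last.
    ranked = sorted(documents, key=lambda d: d.get("score") or 0.0, reverse=True)
    lines = []
    for doc in ranked[:limit]:
        score = doc.get("score")
        label = f"{score:.2f}" if score is not None else "N/A"
        name = doc.get("filename") or doc.get("filepath") or "unknown"
        lines.append(f"[{label}] {name}")
    return lines


# Documents copied from the sample response above.
docs = [
    {"filename": "annual-summary-2025.pdf", "score": 0.87},
    {"filename": "Q4-2025-expenses.xlsx", "score": 0.94},
]

for line in format_citations(docs):
    print(line)
# [0.94] Q4-2025-expenses.xlsx
# [0.87] annual-summary-2025.pdf
```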
Complete Runnable Script
Here's a self-contained script that brings it all together:
#!/usr/bin/env python3
"""Your first Gaia API call: list datasets and ask a question."""

import asyncio
import os
import sys

from dotenv import load_dotenv

load_dotenv()


async def main():
    # Prefer the SDK when it is installed; otherwise fall back to raw httpx.
    try:
        from gaia_sdk import GaiaClient  # noqa: F401  (availability check only)
        use_sdk = True
    except ImportError:
        use_sdk = False
        print("gaia_sdk not found; using httpx directly\n")

    if use_sdk:
        await run_with_sdk()
    else:
        await run_with_httpx()


async def run_with_sdk():
    from gaia_sdk import GaiaClient

    async with GaiaClient.from_env() as gaia:
        # List datasets
        print("=== Available Datasets ===\n")
        datasets = await gaia.list_datasets()
        if not datasets:
            print("No datasets found. Create one in the Cohesity UI first.")
            sys.exit(1)
        for ds in datasets:
            print(f" • {ds.name} (status: {ds.status})")

        # Use the first ready dataset
        ready = [ds for ds in datasets if ds.status == "ready"]
        if not ready:
            print("\nNo datasets are ready for querying yet.")
            sys.exit(1)
        target = ready[0].name

        query = "Summarize the key information in this dataset."
        print(f"\n=== Asking: '{query}' ===")
        print(f" Dataset: {target}\n")
        response = await gaia.ask(
            dataset_names=[target],
            query=query,
        )

        print(f"Answer:\n{response.response_string}\n")
        print(f"Query UID: {response.query_uid}")
        print(f"Conversation ID: {response.conversation_id}")
        if response.documents:
            print(f"\nSources ({len(response.documents)} documents):")
            for doc in response.documents[:5]:
                # Compare against None so a legitimate score of 0.0 still prints.
                score = f"{doc.score:.2f}" if doc.score is not None else "N/A"
                print(f" [{score}] {doc.filename or doc.filepath}")


async def run_with_httpx():
    import httpx

    base_url = os.environ.get(
        "GAIA_BASE_URL", "https://helios.cohesity.com/v2/mcm/gaia"
    )
    api_key = os.environ.get("GAIA_API_KEY")
    if not api_key:
        print("Error: GAIA_API_KEY environment variable is not set.")
        sys.exit(1)

    async with httpx.AsyncClient(
        base_url=base_url,
        headers={"apiKey": api_key, "Content-Type": "application/json"},
        timeout=60,
    ) as client:
        # List datasets
        print("=== Available Datasets ===\n")
        resp = await client.get("/datasets")
        resp.raise_for_status()
        data = resp.json()
        dataset_list = data.get("datasets", [])
        if not dataset_list:
            print("No datasets found.")
            sys.exit(1)
        for ds in dataset_list:
            print(f" • {ds['name']} (status: {ds.get('status', 'unknown')})")

        # Use the first ready dataset
        ready = [ds for ds in dataset_list if ds.get("status") == "ready"]
        if not ready:
            print("\nNo datasets are ready for querying yet.")
            sys.exit(1)
        target = ready[0]["name"]

        query = "Summarize the key information in this dataset."
        print(f"\n=== Asking: '{query}' ===")
        print(f" Dataset: {target}\n")
        resp = await client.post("/ask", json={
            "datasetNames": [target],
            "queryString": query,
        })
        resp.raise_for_status()
        result = resp.json()

        print(f"Answer:\n{result.get('responseString', 'No response')}\n")
        print(f"Query UID: {result.get('queryUid', 'N/A')}")
        print(f"Conversation ID: {result.get('conversationId', 'N/A')}")
        docs = result.get("documents", [])
        if docs:
            print(f"\nSources ({len(docs)} documents):")
            for doc in docs[:5]:
                # A missing or null score is shown as N/A instead of failing.
                raw_score = doc.get("score")
                score = f"{raw_score:.2f}" if raw_score is not None else "N/A"
                name = doc.get("filename") or doc.get("filepath", "unknown")
                print(f" [{score}] {name}")


if __name__ == "__main__":
    asyncio.run(main())
Save it as first_call.py and run it with python first_call.py.
Expected output:
=== Available Datasets ===

 • financial-reports (status: ready)
 • it-runbooks (status: ready)

=== Asking: 'Summarize the key information in this dataset.' ===
 Dataset: financial-reports

Answer:
The financial-reports dataset contains quarterly expense reports, annual
summaries, and budget forecasts for fiscal years 2024-2025. Key highlights
include total annual revenue of $12.3M and operating expenses of $8.7M...

Query UID: q-abc123-def456
Conversation ID: conv-789xyz

Sources (3 documents):
 [0.94] annual-summary-2025.pdf
 [0.89] Q4-2025-expenses.xlsx
 [0.82] budget-forecast-2026.pdf
What's Next?
You've successfully connected to Gaia and made your first RAG query. Here's where to go from here:
- Understanding Datasets — Learn how datasets are created, indexed, and managed.
- Architecture Overview — See how this fits into a full application.
- Backend with FastAPI — Build a proper backend around these API calls.