The Gaia Client Library¶
The gaia_sdk package is a lightweight, async Python client for the Cohesity Gaia REST API. It wraps httpx under the hood and provides typed Pydantic models, an exception hierarchy, and SSE streaming helpers — everything you need to talk to Gaia from your FastAPI backend.
Installation¶
The SDK is included in the sdk/python/ directory of this project. Install it as an editable package so you can import it anywhere in your backend:
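A typical editable install from the repo root (assuming the `sdk/python/` path noted above):

```shell
pip install -e sdk/python/
```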
This installs the gaia_sdk package along with its dependencies (httpx, pydantic).
Why editable install?
An editable install (-e) creates a symlink so that changes to the SDK source are immediately available without re-installing. This is ideal during development. For production, you can install normally with pip install sdk/python/.
Creating a Client¶
With an explicit API key¶
```python
from gaia_sdk import GaiaClient

client = GaiaClient(
    api_key="your-api-key",
    base_url="https://helios.cohesity.com/v2/mcm/gaia",  # optional, this is the default
    timeout=60,        # request timeout in seconds
    verify_ssl=True,   # disable for self-signed certs in dev
)
```
From environment variables¶
```python
from gaia_sdk import GaiaClient

# Reads GAIA_API_KEY, GAIA_BASE_URL, GAIA_VERIFY_SSL, GAIA_SECURITY_CTX
client = GaiaClient.from_env()
```
| Environment Variable | Required | Default |
|---|---|---|
| `GAIA_API_KEY` | Yes | — |
| `GAIA_BASE_URL` | No | `https://helios.cohesity.com/v2/mcm/gaia` |
| `GAIA_VERIFY_SSL` | No | `true` |
| `GAIA_SECURITY_CTX` | No | — |
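Resolving these variables with the defaults from the table can be sketched as follows (the helper name `read_gaia_env` is illustrative, not part of the SDK):

```python
import os

DEFAULT_BASE_URL = "https://helios.cohesity.com/v2/mcm/gaia"

def read_gaia_env(env=os.environ):
    """Resolve Gaia client settings from environment variables,
    applying the documented defaults. Raises if the API key is missing."""
    api_key = env.get("GAIA_API_KEY")
    if not api_key:
        raise ValueError("GAIA_API_KEY is required")
    return {
        "api_key": api_key,
        "base_url": env.get("GAIA_BASE_URL", DEFAULT_BASE_URL),
        "verify_ssl": env.get("GAIA_VERIFY_SSL", "true").lower() == "true",
        "security_ctx": env.get("GAIA_SECURITY_CTX"),  # optional, may be None
    }
```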
Async Context Manager Pattern¶
GaiaClient must be used as an async context manager. The __aenter__ method creates the underlying httpx.AsyncClient with preconfigured headers and timeouts. The __aexit__ method closes the connection pool.
```python
async with GaiaClient(api_key="your-key") as gaia:
    datasets = await gaia.list_datasets()
    answer = await gaia.ask(["my-dataset"], "What happened yesterday?")
```
Don't forget async with
Calling methods without entering the context manager raises RuntimeError: GaiaClient must be used as an async context manager. The client enforces this to prevent resource leaks.
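The enforcement described above can be sketched as a guard flag toggled by the context-manager hooks (an illustrative class, not the SDK's actual implementation):

```python
class ContextGuardedClient:
    """Minimal sketch of the async-context-manager guard GaiaClient enforces."""

    def __init__(self):
        self._entered = False

    async def __aenter__(self):
        self._entered = True  # the real client also creates httpx.AsyncClient here
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self._entered = False  # the real client closes the connection pool here

    async def list_datasets(self):
        if not self._entered:
            raise RuntimeError(
                "GaiaClient must be used as an async context manager"
            )
        return []  # placeholder for the real HTTP call
```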
Integrating with FastAPI Dependencies¶
The recommended pattern is a FastAPI dependency that yields a client within its context manager:
```python
from fastapi import Depends

from gaia_sdk import GaiaClient

async def get_gaia_client(api_key: str = Depends(get_session_api_key)):
    client = GaiaClient(api_key=api_key)
    async with client as c:
        yield c
```
FastAPI manages the generator lifecycle — entering the context on request start and exiting it when the response is sent.
Available Methods¶
Dataset Operations¶
list_datasets(prefix=None)¶
List all datasets accessible to the authenticated user, optionally filtered by name prefix.
```python
async with GaiaClient.from_env() as gaia:
    # All datasets
    datasets = await gaia.list_datasets()

    # Filtered by prefix
    sales_datasets = await gaia.list_datasets(prefix="sales")

    for ds in datasets:
        print(f"{ds.name} — {ds.status} — {ds.object_count} objects")
```
Returns a list of Dataset objects with fields: name, status, description, created_at, updated_at, object_count.
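For reference, the documented fields correspond roughly to this shape (a dataclass sketch; the real SDK uses Pydantic models, and the types here are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatasetShape:
    """Illustrative shape of a Dataset record; field names follow the docs."""
    name: str
    status: str
    description: Optional[str] = None
    created_at: Optional[str] = None  # likely a datetime in the real model
    updated_at: Optional[str] = None
    object_count: int = 0
```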
get_dataset(name)¶
Get detailed information about a specific dataset.
create_dataset(name, **kwargs) / delete_dataset(name)¶
```python
await gaia.create_dataset("new-dataset", description="Quarterly reports")
await gaia.delete_dataset("old-dataset")
```
trigger_indexing(name)¶
Manually trigger indexing for a dataset.
RAG Queries¶
ask(dataset_names, query, conversation_id=None, llm_name=None)¶
Send a synchronous RAG query. Blocks until the full answer is generated.
```python
response = await gaia.ask(
    dataset_names=["quarterly-reports"],
    query="What was our revenue in Q4 2024?",
    conversation_id="conv-123",  # optional, for multi-turn
)

print(response.response_string)   # The generated answer
print(response.query_uid)         # Unique ID for this query
print(response.conversation_id)   # Conversation ID (created or continued)

for doc in response.documents or []:
    print(f"  {doc.filename} (score: {doc.score})")
```
ask_stream(dataset_names, query, ...)¶
Send a streaming query and get the accumulated result after the stream completes. Useful when you want streaming on the Gaia side for performance but don't need to display tokens incrementally.
```python
result = await gaia.ask_stream(
    dataset_names=["my-dataset"],
    query="Summarize the incident report",
)

print(result.full_text)       # Complete answer text
print(result.documents)       # List of source documents
print(result.query_uid)       # Query UID for feedback/refine
print(result.finish_reason)   # "stop", "length", etc.
```
Returns a StreamResult with fields: full_text, documents, query_uid, conversation_id, finish_reason.
ask_stream_iter(dataset_names, query, ...)¶
Send a streaming query and yield individual SSE chunks as they arrive. This is what you use to proxy streaming responses to your frontend.
```python
async for chunk in gaia.ask_stream_iter(
    dataset_names=["my-dataset"],
    query="Explain our security posture",
):
    print(f"[{chunk.event}] {chunk.data}")
    if chunk.parsed:
        # chunk.parsed is the JSON payload (dict)
        if "responseString" in chunk.parsed:
            print("Token:", chunk.parsed["responseString"])
```
Each StreamChunk has: event (str), data (raw string), parsed (dict or None).
Exhaustive Search¶
exhaustive_search(dataset_name, query, page_size=20, pagination_token=None)¶
Retrieve all documents matching a query with pagination. Unlike ask, this does not generate an LLM answer — it returns raw document matches.
```python
result = await gaia.exhaustive_search(
    dataset_name="legal-docs",
    query="data breach notification",
    page_size=50,
)

print(f"Found {result.total_count} documents")
for doc in result.documents or []:
    print(f"  {doc.filename}: {doc.snippet[:100]}")

# Paginate
if result.pagination_token:
    next_page = await gaia.exhaustive_search(
        dataset_name="legal-docs",
        query="data breach notification",
        page_size=50,
        pagination_token=result.pagination_token,
    )
```
Refine¶
refine(query_uid, dataset_names, query, doc_ids)¶
Re-generate an answer using only specific documents. Use this after a user reviews the source documents from an initial query and selects the most relevant ones.
```python
refined = await gaia.refine(
    query_uid="original-query-uid",
    dataset_names=["my-dataset"],
    query="What was our Q4 revenue?",
    doc_ids=["doc-1", "doc-2"],
)

print(refined.response_string)
```
Feedback¶
send_feedback(query_uid, is_good, feedback_text=None)¶
Send thumbs-up/down feedback on a query response.
```python
await gaia.send_feedback(
    query_uid="abc-123",
    is_good=True,
    feedback_text="Accurate and well-sourced answer.",
)
```
Conversations¶
```python
# List all conversations
convos = await gaia.list_conversations()

# Get history for a conversation
messages = await gaia.get_chat_history("conv-123")

# Delete a conversation
await gaia.delete_conversation("conv-123")
```
Document Upload¶
```python
# Create an upload session
session = await gaia.create_upload_session()
print(session.upload_session_id)

# Upload a file
result = await gaia.upload_file(
    session_id=session.upload_session_id,
    file_path="./reports/q4-summary.pdf",
)
```
Dataset Discovery¶
```python
discovery = await gaia.get_discovery("my-dataset")
# Returns a dict with categories, topics, and suggested questions
```
Sensitive Data Policies¶
LLMs¶
Error Handling¶
The SDK raises typed exceptions that map directly to HTTP status codes from the Gaia API:
| Exception | HTTP Status | When |
|---|---|---|
| `GaiaAuthError` | 401, 403 | Invalid or expired API key |
| `GaiaNotFoundError` | 404 | Dataset or resource doesn't exist |
| `GaiaRateLimitError` | 429 | Too many requests |
| `GaiaServerError` | 5xx | Gaia internal error |
| `GaiaTimeoutError` | — | Request timed out |
| `GaiaError` | 4xx (other) | Any other client error |
All exceptions inherit from GaiaError and carry status_code and response_body attributes:
```python
from gaia_sdk import GaiaClient
from gaia_sdk.exceptions import GaiaAuthError, GaiaNotFoundError, GaiaError

async with GaiaClient(api_key="bad-key") as gaia:
    try:
        await gaia.list_datasets()
    except GaiaAuthError as e:
        print(f"Auth failed ({e.status_code}): {e}")
    except GaiaNotFoundError as e:
        print(f"Not found ({e.status_code}): {e}")
    except GaiaError as e:
        print(f"Gaia error ({e.status_code}): {e}")
        print(f"Response body: {e.response_body}")
```
Catch specific exceptions first
Since all exceptions inherit from GaiaError, always catch specific types (GaiaAuthError, GaiaNotFoundError) before the base GaiaError. See Error Handling for patterns on mapping these to HTTP responses in your backend.
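The status-to-exception mapping in the table can be sketched as a small dispatch helper. The class names mirror the SDK's hierarchy, but the function and the class bodies here are illustrative (timeouts carry no status and are raised separately):

```python
class GaiaError(Exception):
    """Base error; carries the status code and raw response body."""
    def __init__(self, message, status_code=None, response_body=None):
        super().__init__(message)
        self.status_code = status_code
        self.response_body = response_body

class GaiaAuthError(GaiaError): ...
class GaiaNotFoundError(GaiaError): ...
class GaiaRateLimitError(GaiaError): ...
class GaiaServerError(GaiaError): ...

def exception_for_status(status_code):
    """Pick the exception class for an HTTP status, per the table above."""
    if status_code in (401, 403):
        return GaiaAuthError
    if status_code == 404:
        return GaiaNotFoundError
    if status_code == 429:
        return GaiaRateLimitError
    if status_code >= 500:
        return GaiaServerError
    return GaiaError  # any other 4xx
```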
Streaming Internals¶
The SDK's streaming support is built on two layers:
- `parse_sse_line(line)` — Parses a single SSE text line into `(field, value)` tuples.
- `async_accumulate_stream(response)` — Reads an entire `httpx` streaming response and returns a `StreamResult`.
The ask_stream() method uses async_accumulate_stream internally. The ask_stream_iter() method yields raw StreamChunk objects for fine-grained control.
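The line-parsing layer follows the standard SSE wire format (`field: value`, with one optional space after the colon). A sketch of that behavior, not the SDK's actual source:

```python
def parse_sse_line_sketch(line):
    """Parse one SSE text line into a (field, value) tuple,
    or None for blank lines and comments (lines starting with ':')."""
    line = line.rstrip("\r\n")
    if not line or line.startswith(":"):
        return None
    field, _, value = line.partition(":")
    if value.startswith(" "):
        value = value[1:]  # the SSE format allows one space after the colon
    return (field, value)
```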
```python
from gaia_sdk.streaming import StreamChunk, StreamResult

# StreamChunk fields
chunk = StreamChunk(
    event="message",                                  # SSE event type
    data='{"responseString": "The revenue was..."}',  # raw data
    parsed={"responseString": "The revenue was..."},  # parsed JSON (or None)
)

# StreamResult fields (accumulated from all chunks)
result = StreamResult(
    full_text="The revenue was $4.2B in Q4...",
    documents=[{"docId": "doc-1", "filename": "q4-report.pdf"}],
    query_uid="abc-123",
    conversation_id="conv-456",
    finish_reason="stop",
)
```
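Accumulation itself can be sketched as a fold over the parsed payloads, concatenating `responseString` tokens into the final text. The camelCase metadata keys here (`queryUid`, `conversationId`, `finishReason`) are assumptions inferred from the field names above, and the function is illustrative:

```python
def accumulate_chunks_sketch(parsed_chunks):
    """Fold a sequence of parsed SSE payload dicts into one result dict,
    concatenating responseString tokens and keeping the latest metadata."""
    result = {"full_text": "", "documents": None, "query_uid": None,
              "conversation_id": None, "finish_reason": None}
    for payload in parsed_chunks:
        if "responseString" in payload:
            result["full_text"] += payload["responseString"]
        # Metadata keys, when present, overwrite earlier values.
        for src, dst in [("documents", "documents"),
                         ("queryUid", "query_uid"),
                         ("conversationId", "conversation_id"),
                         ("finishReason", "finish_reason")]:
            if src in payload:
                result[dst] = payload[src]
    return result
```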
For more on streaming patterns, see Streaming Responses.
Next Steps¶
- Frontend with React — Build the frontend that calls your backend.
- Session Management — Implement the session store used by dependencies.
- Streaming Responses — Advanced streaming patterns.