SDK Reference

The gaia_sdk Python package provides a lightweight async client for the Cohesity Gaia RAG API. It handles authentication, request serialization, error mapping, and SSE streaming, so you can focus on your application logic.


Installation

Install the SDK from the local package:

Bash
pip install -e sdk/python/

Or add it to your requirements.txt:

Text Only
-e ./sdk/python

Dependencies

The SDK depends on httpx for async HTTP and pydantic for data models. Both are installed automatically.


Quick Start

Python
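import asyncio
from gaia_sdk import GaiaClient

async def main():
    async with GaiaClient(api_key="your-api-key") as gaia:
        # List the datasets visible to this API key.
        datasets = await gaia.list_datasets()
        if not datasets:
            raise SystemExit("No datasets available for this API key")
        # Ask a question against the first dataset.
        response = await gaia.ask(
            dataset_names=[datasets[0].name],
            query="What are the key findings?",
        )
        print(response.response_string)

asyncio.run(main())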

Or initialize from environment variables:

Python
# Run inside an async function; settings come from the GAIA_* environment variables.
async with GaiaClient.from_env() as gaia:
    ...

GaiaClient Methods

Constructor

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str \| None | None | Gaia API key. Falls back to the GAIA_API_KEY env var. |
| base_url | str \| None | None | API base URL. Falls back to the GAIA_BASE_URL env var. |
| timeout | int | 60 | Request timeout in seconds. |
| verify_ssl | bool | True | Verify SSL certificates. |
| security_context | str \| None | None | Multi-tenant security context. |

Datasets

| Method | Signature | Description |
| --- | --- | --- |
| list_datasets | (prefix: str \| None = None) → list[Dataset] | List available datasets, optionally filtered by name prefix. |
| get_dataset | (name: str) → DatasetDetails | Get detailed information about a specific dataset. |
| create_dataset | (name: str, **kwargs) → dict | Create a new dataset. |
| delete_dataset | (name: str) → dict | Delete a dataset. |
| trigger_indexing | (name: str) → dict | Trigger indexing for a dataset. |

RAG Queries

| Method | Signature | Description |
| --- | --- | --- |
| ask | (dataset_names: list[str], query: str, conversation_id: str \| None, llm_name: str \| None) → AskResponse | Non-streaming RAG query. Waits for and returns the full response. |
| ask_stream | (dataset_names: list[str], query: str, ...) → StreamResult | Streaming RAG query. Accumulates the stream and returns the complete result. |
| ask_stream_iter | (dataset_names: list[str], query: str, ...) → AsyncIterator[StreamChunk] | Streaming RAG query. Yields individual SSE chunks for real-time display. |
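
To illustrate the difference between accumulating a stream and displaying it chunk by chunk, here is a minimal sketch of a consumer for an async iterator of chunks. The fields of StreamChunk are not documented in this section, so the text extractor is passed in as `get_text`. The names `render_stream` and `fake_stream`, and the dict-shaped chunks, are hypothetical and exist only so the sketch runs without the SDK.

```python
import asyncio
from typing import Any, AsyncIterator, Callable


async def render_stream(
    chunks: AsyncIterator[Any],
    get_text: Callable[[Any], str],
    write: Callable[[str], None] = lambda s: print(s, end="", flush=True),
) -> str:
    """Write each chunk's text as it arrives and return the full text."""
    parts = []
    async for chunk in chunks:
        text = get_text(chunk)
        if text:  # skip chunks that carry no text
            write(text)
            parts.append(text)
    return "".join(parts)


async def fake_stream() -> AsyncIterator[dict]:
    """Stand-in for gaia.ask_stream_iter(...); the dict shape is made up."""
    for piece in ["Key ", "findings: ", "none."]:
        await asyncio.sleep(0)  # hand control back to the event loop, as a network read would
        yield {"text": piece}
```

With the real client, the first argument would presumably be `gaia.ask_stream_iter(dataset_names=[...], query=...)`, and `get_text` would read whichever StreamChunk field carries the text.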

Search & Refine

| Method | Signature | Description |
| --- | --- | --- |
| exhaustive_search | (dataset_name: str, query: str, page_size: int, pagination_token: str \| None, conversation_id: str \| None) → ExhaustiveSearchResponse | Paginated document search across a dataset. |
| refine | (query_uid: str, dataset_names: list[str], query: str, doc_ids: list[str]) → RefineResponse | Refine a previous answer using specific documents. |
| search_similar_parts | (dataset_name: str, query: str, **kwargs) → dict | Semantic chunk retrieval for similar document parts. |
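
The page_size and pagination_token parameters of exhaustive_search suggest a token-driven loop. The sketch below shows one way to walk every page. It assumes that each response exposes a `results` list and a `pagination_token` that is empty or None on the last page. Neither field name is confirmed by this reference, so check ExhaustiveSearchResponse before relying on them. `iter_search_results`, `FakeClient`, and `collect` are hypothetical helpers.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Optional


async def iter_search_results(
    client: Any,
    dataset_name: str,
    query: str,
    page_size: int = 50,
    conversation_id: Optional[str] = None,
) -> AsyncIterator[Any]:
    """Yield every result across all pages of client.exhaustive_search.

    Assumption: responses have `results` and `pagination_token` attributes.
    """
    token = None
    while True:
        page = await client.exhaustive_search(
            dataset_name=dataset_name,
            query=query,
            page_size=page_size,
            pagination_token=token,
            conversation_id=conversation_id,
        )
        for item in page.results:
            yield item
        token = page.pagination_token
        if not token:  # no token means there are no further pages
            break


class FakeClient:
    """In-memory stand-in for GaiaClient, used only to exercise the loop."""

    def __init__(self, items):
        self.items = list(items)

    async def exhaustive_search(
        self, dataset_name, query, page_size, pagination_token=None, conversation_id=None
    ):
        start = int(pagination_token or 0)
        end = start + page_size
        next_token = str(end) if end < len(self.items) else None
        return SimpleNamespace(results=self.items[start:end], pagination_token=next_token)


async def collect(client, page_size):
    """Gather all results into a list."""
    return [r async for r in iter_search_results(client, "docs", "q", page_size)]
```

An async generator keeps memory use flat for large result sets, since callers can stop iterating as soon as they have what they need.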

Feedback

| Method | Signature | Description |
| --- | --- | --- |
| send_feedback | (query_uid: str, is_good: bool, feedback_text: str \| None) → dict | Submit thumbs-up/down feedback on a query response. |

Document Upload

| Method | Signature | Description |
| --- | --- | --- |
| create_upload_session | () → UploadSession | Create a new upload session for grouping file uploads. |
| upload_file | (session_id: str, file_path: str \| Path, file_name: str \| None) → dict | Upload a file to an existing upload session. |
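
The two upload methods are meant to be used together: create one session, then upload each file into it. A minimal sketch of that flow follows. It assumes the UploadSession object exposes its identifier as `session_id`, which this reference does not confirm. `upload_files` and `FakeUploadClient` are hypothetical names.

```python
import asyncio
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Iterable, List, Union


async def upload_files(client: Any, paths: Iterable[Union[str, Path]]) -> List[Any]:
    """Create one upload session and upload each path into it, in order.

    Assumption: the session object has a `session_id` attribute.
    """
    session = await client.create_upload_session()
    results = []
    for raw in paths:
        path = Path(raw)
        results.append(
            await client.upload_file(
                session_id=session.session_id,
                file_path=path,
                file_name=path.name,  # send the bare file name, not the full path
            )
        )
    return results


class FakeUploadClient:
    """In-memory stand-in for GaiaClient; records calls instead of sending files."""

    def __init__(self):
        self.uploaded = []

    async def create_upload_session(self):
        return SimpleNamespace(session_id="session-1")

    async def upload_file(self, session_id, file_path, file_name=None):
        self.uploaded.append((session_id, file_name))
        return {"file_name": file_name}
```

Uploads run sequentially here, which keeps the sketch simple and avoids assuming anything about concurrent uploads to one session.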

Discovery & Conversations

| Method | Signature | Description |
| --- | --- | --- |
| get_discovery | (dataset_id: str) → dict | Get discovery results (document hierarchy) for a dataset. |
| list_conversations | () → list[dict] | List all conversations. |
| get_chat_history | (conversation_id: str) → list[dict] | Get the message history for a conversation. |
| delete_conversation | (conversation_id: str) → dict | Delete a conversation and its messages. |

LLMs & Policies

| Method | Signature | Description |
| --- | --- | --- |
| list_llms | () → list[dict] | List registered LLMs available for queries. |
| list_sensitive_data_policies | () → list[dict] | List sensitive data handling policies. |

Data Models

The SDK uses Pydantic models for type-safe request and response handling:

| Model | Purpose |
| --- | --- |
| AskRequest | Request body for RAG queries |
| AskResponse | Response from non-streaming RAG queries (ask) |
| Dataset | Dataset summary (name, status) |
| DatasetDetails | Full dataset information (documents, indexing status) |
| Document | Document metadata (ID, name, source) |
| ExhaustiveSearchRequest | Request body for exhaustive search |
| ExhaustiveSearchResponse | Paginated search results |
| RefineRequest | Request body for answer refinement |
| RefineResponse | Refined answer response |
| UploadSession | Upload session metadata |

Exception Hierarchy

All SDK exceptions inherit from GaiaError:

Text Only
GaiaError
├── GaiaAuthError          # 401 — Invalid or missing API key
├── GaiaNotFoundError      # 404 — Dataset or resource not found
├── GaiaRateLimitError     # 429 — Rate limit exceeded
├── GaiaServerError        # 500+ — Gaia server error
└── GaiaTimeoutError       # Request timed out

Python
import asyncio
from gaia_sdk import GaiaClient, GaiaAuthError, GaiaRateLimitError

async def main():
    async with GaiaClient.from_env() as gaia:
        try:
            response = await gaia.ask(["my-dataset"], "What happened?")
            print(response.response_string)
        except GaiaAuthError:
            print("Check your GAIA_API_KEY")
        except GaiaRateLimitError:
            print("Too many requests; back off and retry")

asyncio.run(main())
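
The rate-limit branch in the example above only prints advice. One common way to act on it is exponential backoff, sketched below. `retry_async` is a hypothetical helper, not part of gaia_sdk, and `RateLimited` is a stand-in for GaiaRateLimitError so that the block runs without the SDK installed. The delay schedule is an arbitrary choice.

```python
import asyncio
from typing import Awaitable, Callable, Type, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for GaiaRateLimitError so the sketch runs without the SDK."""


async def retry_async(
    call: Callable[[], Awaitable[T]],
    retry_on: Type[BaseException] = RateLimited,
    attempts: int = 4,
    base_delay: float = 0.5,
) -> T:
    """Await call(), retrying on retry_on with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the real client, a call might look like `await retry_async(lambda: gaia.ask(["my-dataset"], "What happened?"), retry_on=GaiaRateLimitError)`. Passing a zero-argument callable matters because a coroutine object can only be awaited once, so each retry needs a fresh call.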

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| GAIA_API_KEY | Yes | (none) | Your Cohesity Gaia API key |
| GAIA_BASE_URL | No | https://helios.cohesity.com/v2/mcm/gaia | Gaia API base URL |
| GAIA_VERIFY_SSL | No | true | Enable SSL certificate verification |
| GAIA_SECURITY_CTX | No | (none) | Security context for multi-tenant operations |
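
To make the mapping between these variables and the constructor parameters concrete, here is a minimal sketch of a configuration loader. The helper name `client_kwargs_from_env` and the set of strings treated as false are assumptions made for illustration. The SDK's own `GaiaClient.from_env()` may parse values differently.

```python
import os
from typing import Mapping, Optional

# Default taken from the GAIA_BASE_URL row of the environment variable table.
DEFAULT_BASE_URL = "https://helios.cohesity.com/v2/mcm/gaia"


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build GaiaClient-style keyword arguments from environment variables.

    Hypothetical helper, not part of gaia_sdk.
    """
    env = os.environ if env is None else env
    api_key = env.get("GAIA_API_KEY")
    if not api_key:
        raise ValueError("GAIA_API_KEY is required")
    # Assumption: "false", "0", and "no" (case-insensitive) disable verification.
    verify_raw = env.get("GAIA_VERIFY_SSL", "true").strip().lower()
    return {
        "api_key": api_key,
        "base_url": env.get("GAIA_BASE_URL", DEFAULT_BASE_URL),
        "verify_ssl": verify_raw not in {"false", "0", "no"},
        "security_context": env.get("GAIA_SECURITY_CTX"),
    }
```

The resulting dictionary uses the constructor's documented parameter names, so it could be passed as `GaiaClient(**client_kwargs_from_env())`.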

Detailed Documentation

For the full SDK README with additional examples and contribution guidelines:

SDK README


Next Steps