Exhaustive Search¶
While the standard /ask endpoint retrieves a handful of the most relevant chunks and generates an LLM answer, exhaustive search scans the entire dataset and returns every document that matches your query — paginated, ranked by relevance, and without LLM synthesis.
Use exhaustive search when you need completeness over conciseness: compliance audits, e-discovery, data inventory, or any workflow where missing a single matching document is unacceptable.
Ask vs. Exhaustive Search¶
| | POST /ask | PUT /ask/exhaustive |
|---|---|---|
| Goal | Get an LLM-generated answer | Find all matching documents |
| Result | responseString + top-k documents | Paginated document list (no LLM answer) |
| Speed | Fast (returns top-k only) | Slower (scans full index) |
| Pagination | No | Yes (paginationToken) |
| Best for | Q&A, summarization, chat | Compliance, audits, bulk export |
The Exhaustive Search Endpoint¶
PUT /ask/exhaustive¶
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| datasetName | string | Yes | The dataset to search (single dataset per request). |
| queryString | string | Yes | Natural-language query or keyword string. |
| pageSize | int | No | Results per page (default: 20, max: 100). |
| paginationToken | string | No | Token from a previous response to fetch the next page. |
| conversationId | string | No | Associate this search with an existing conversation. |
Response Body¶
| Field | Type | Description |
|---|---|---|
| queryUid | string | Unique query identifier (used for download, feedback). |
| documents | Document[] | Array of matching documents with snippets and scores. |
| totalCount | int | Total number of matching documents across all pages. |
| paginationToken | string | Token to fetch the next page (null when no more pages). |
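As a rough client-side sketch, the response fields above can be modeled with dataclasses. This is illustrative only — the SDK ships its own result types, and the snake_case names here are assumed to mirror the attributes used in the examples below:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    doc_id: str
    filename: str
    filepath: str
    snippet: str
    score: float
    metadata: dict = field(default_factory=dict)


@dataclass
class ExhaustiveSearchResult:
    query_uid: str
    total_count: int
    documents: list[Document]
    pagination_token: Optional[str] = None

    @property
    def has_more_pages(self) -> bool:
        # A null/absent token means the final page has been reached.
        return self.pagination_token is not None
```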
Basic Exhaustive Search¶
```python
import asyncio

from gaia_sdk import GaiaClient


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        result = await gaia.exhaustive_search(
            dataset_name="legal-contracts",
            query="indemnification clause",
            page_size=10,
        )
        print(f"Total matches: {result.total_count}")
        for doc in result.documents:
            print(f"  {doc.filename}  score={doc.score:.3f}")
            print(f"    {doc.snippet[:150]}…\n")

asyncio.run(main())
```
Paginated Search¶
When totalCount exceeds pageSize, use paginationToken to iterate through all results.
```python
import asyncio

from gaia_sdk import GaiaClient


async def exhaustive_search_all(
    gaia: GaiaClient,
    dataset: str,
    query: str,
    page_size: int = 50,
):
    """Yield every matching document across all pages."""
    token = None
    while True:
        result = await gaia.exhaustive_search(
            dataset_name=dataset,
            query=query,
            page_size=page_size,
            pagination_token=token,
        )
        for doc in result.documents:
            yield doc
        token = result.pagination_token
        if not token:
            break


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        count = 0
        async for doc in exhaustive_search_all(
            gaia, "legal-contracts", "indemnification clause"
        ):
            count += 1
            print(f"  [{count}] {doc.filename} (score: {doc.score:.3f})")
        print(f"\nTotal documents retrieved: {count}")

asyncio.run(main())
```
Async generator
The exhaustive_search_all helper above is an async generator — it lazily fetches pages only as you iterate, keeping memory usage constant regardless of total result count.
Bulk Download Workflow¶
For very large result sets, Gaia can package all matching documents into a downloadable ZIP archive. This is a three-step process:
Step 1: Initiate Download¶
```bash
curl -s -X POST \
  -H "apiKey: $GAIA_API_KEY" \
  -H "Content-Type: application/json" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID" \
  -d '{}'
```
$QUERY_UID is the queryUid from a previous exhaustive search response.
Step 2: Poll for Completion¶
The download preparation is asynchronous. Poll until the status is "ready":
```bash
curl -s \
  -H "apiKey: $GAIA_API_KEY" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID/status"
```
Response:
```json
{
  "status": "ready",
  "downloadUrl": "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/abc-123/file",
  "fileCount": 247,
  "totalSizeBytes": 104857600
}
```
Step 3: Download the ZIP¶
```bash
curl -s -o results.zip \
  -H "apiKey: $GAIA_API_KEY" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID/file"
```
Complete Download Example (SDK)¶
```python
import asyncio

import httpx
from gaia_sdk import GaiaClient


async def download_exhaustive_results(
    gaia: GaiaClient,
    query_uid: str,
    output_path: str = "results.zip",
    poll_seconds: int = 5,
):
    """Initiate bulk download, poll for readiness, and save the ZIP."""
    # Step 1: Initiate
    await gaia._request(
        "POST", f"/ask/exhaustive/download/{query_uid}", json={}
    )
    print("Download initiated. Waiting for preparation …")

    # Step 2: Poll
    while True:
        status = await gaia._request(
            "GET", f"/ask/exhaustive/download/{query_uid}/status"
        )
        print(f"  Status: {status.get('status')}")
        if status.get("status") == "ready":
            break
        await asyncio.sleep(poll_seconds)

    # Step 3: Download
    download_url = f"/ask/exhaustive/download/{query_uid}/file"
    response = await gaia.client.get(download_url)
    with open(output_path, "wb") as f:
        f.write(response.content)
    print(f"Downloaded {status.get('fileCount', '?')} files to {output_path}")


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        # Run the exhaustive search
        result = await gaia.exhaustive_search(
            dataset_name="legal-contracts",
            query="indemnification clause",
        )
        print(f"Found {result.total_count} documents.")

        # Download all results as a ZIP
        if result.query_uid:
            await download_exhaustive_results(gaia, result.query_uid)

asyncio.run(main())
```
Use Cases¶
Compliance Search¶
Find every document mentioning a specific regulation or policy:
```python
result = await gaia.exhaustive_search(
    dataset_name="policy-docs",
    query="GDPR data retention requirements",
    page_size=100,
)
print(f"Found {result.total_count} documents referencing GDPR retention.")
```
E-Discovery¶
Identify all communications containing specific keywords for legal hold:
```python
result = await gaia.exhaustive_search(
    dataset_name="email-archive",
    query="Project Titan acquisition timeline",
    page_size=50,
)
```
Audit Trail¶
Verify that a specific procedure is documented across all runbooks:
```python
async for doc in exhaustive_search_all(
    gaia, "runbooks", "disaster recovery failover"
):
    print(f"  Found in: {doc.filepath}")
```
Response Structure Deep Dive¶
A typical exhaustive search response:
```json
{
  "queryUid": "q-abc-123",
  "totalCount": 142,
  "paginationToken": "eyJwYWdlIjogMn0=",
  "documents": [
    {
      "docId": "doc-001",
      "filename": "master-services-agreement.pdf",
      "filepath": "/contracts/2024/master-services-agreement.pdf",
      "snippet": "… The Vendor shall indemnify and hold harmless the Client …",
      "score": 0.92,
      "metadata": {
        "sizeBytes": 245760,
        "lastModified": "2024-08-15T10:30:00Z"
      }
    },
    {
      "docId": "doc-002",
      "filename": "vendor-agreement-acme.docx",
      "filepath": "/contracts/2024/vendor-agreement-acme.docx",
      "snippet": "… indemnification obligations shall survive termination …",
      "score": 0.88,
      "metadata": {
        "sizeBytes": 184320,
        "lastModified": "2024-09-01T14:00:00Z"
      }
    }
  ]
}
```
paginationToken encoding
The token is an opaque, base64-encoded string. Do not decode or modify it — pass it directly to the next request.
Best Practices¶
Choose the right page size
Use pageSize=50 or pageSize=100 for batch processing. Smaller pages (10–20) are better for interactive UIs where you want fast initial results.
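The trade-off is easy to quantify: draining a result set takes ceil(totalCount / pageSize) round trips. A quick sketch:

```python
import math


def page_count(total_count: int, page_size: int) -> int:
    """Number of requests needed to page through an exhaustive result set."""
    return math.ceil(total_count / page_size)


# For the 142-match example above: 2 requests at pageSize=100,
# but 8 requests at pageSize=20.
print(page_count(142, 100), page_count(142, 20))
```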
Stream results to storage
For very large result sets, write documents to disk or a database as you iterate rather than accumulating them all in memory.
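The `exhaustive_search_all` helper above pairs naturally with incremental writes. A minimal sketch (documents treated as plain dicts for illustration; real SDK objects would need serializing first):

```python
import json


async def stream_to_jsonl(doc_iter, output_path: str) -> int:
    """Write each document to a JSON Lines file as it arrives.

    Memory stays constant: nothing is kept beyond the current document.
    """
    count = 0
    with open(output_path, "w") as f:
        async for doc in doc_iter:
            f.write(json.dumps(doc) + "\n")
            count += 1
    return count
```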
Combine with refinement
After an exhaustive search, use refine() with selected docIds to get an LLM-generated summary of specific documents. This pairs the completeness of exhaustive search with the synthesis of RAG.
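The refine() call itself is documented separately; as a sketch, the selection step might look like this (documents treated as plain dicts, with the `score` and `docId` fields from the response above):

```python
def select_doc_ids(documents, top_n: int = 5, min_score: float = 0.0) -> list[str]:
    """Pick the highest-scoring docIds for a follow-up refine() call."""
    # Filter out weak matches first, then keep the top N by score.
    eligible = [d for d in documents if d["score"] >= min_score]
    eligible.sort(key=lambda d: d["score"], reverse=True)
    return [d["docId"] for d in eligible[:top_n]]
```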
Single dataset per request
Unlike /ask, exhaustive search operates on one dataset at a time. To search across multiple datasets, issue parallel requests and merge results client-side.
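One way to sketch that fan-out, assuming the same `exhaustive_search` signature used throughout this page (this only merges first pages; combine with the pagination helper for full coverage):

```python
import asyncio


async def search_many(gaia, datasets, query, page_size=50):
    """Run one exhaustive search per dataset in parallel, merge by score."""
    results = await asyncio.gather(*[
        gaia.exhaustive_search(dataset_name=ds, query=query, page_size=page_size)
        for ds in datasets
    ])
    # Flatten, then re-rank client-side since scores come from separate indexes.
    merged = [doc for r in results for doc in r.documents]
    merged.sort(key=lambda d: d.score, reverse=True)
    return merged
```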
What's Next¶
- Document Upload — add custom documents to your datasets.
- Refine & Feedback — generate focused answers from selected exhaustive search results.
- Querying & RAG — when you need an LLM answer instead of a document list.