
Exhaustive Search

While the standard /ask endpoint retrieves a handful of the most relevant chunks and generates an LLM answer, exhaustive search scans the entire dataset and returns every document that matches your query — paginated, ranked by relevance, and without LLM synthesis.

Use exhaustive search when you need completeness over conciseness: compliance audits, e-discovery, data inventory, or any workflow where missing a single matching document is unacceptable.


|            | POST /ask                        | PUT /ask/exhaustive                      |
|------------|----------------------------------|------------------------------------------|
| Goal       | Get an LLM-generated answer      | Find all matching documents              |
| Result     | responseString + top-k documents | Paginated document list (no LLM answer)  |
| Speed      | Fast (returns top-k only)        | Slower (scans full index)                |
| Pagination | No                               | Yes (paginationToken)                    |
| Best for   | Q&A, summarization, chat         | Compliance, audits, bulk export          |

The Exhaustive Search Endpoint

PUT /ask/exhaustive

Request Body

| Field           | Type   | Required | Description                                             |
|-----------------|--------|----------|---------------------------------------------------------|
| datasetName     | string | Yes      | The dataset to search (single dataset per request).     |
| queryString     | string | Yes      | Natural-language query or keyword string.               |
| pageSize        | int    | No       | Results per page (default: 20, max: 100).               |
| paginationToken | string | No       | Token from a previous response to fetch the next page.  |
| conversationId  | string | No       | Associate this search with an existing conversation.    |

Response Body

| Field           | Type       | Description                                              |
|-----------------|------------|----------------------------------------------------------|
| queryUid        | string     | Unique query identifier (used for download, feedback).   |
| documents       | Document[] | Array of matching documents with snippets and scores.    |
| totalCount      | int        | Total number of matching documents across all pages.     |
| paginationToken | string     | Token to fetch the next page (null when no more pages).  |

Python
import asyncio
from gaia_sdk import GaiaClient

async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        result = await gaia.exhaustive_search(
            dataset_name="legal-contracts",
            query="indemnification clause",
            page_size=10,
        )

        print(f"Total matches: {result.total_count}")
        for doc in result.documents:
            print(f"  {doc.filename}  score={doc.score:.3f}")
            print(f"    {doc.snippet[:150]}\n")

asyncio.run(main())
Bash
curl -s -X PUT \
  -H "apiKey: $GAIA_API_KEY" \
  -H "Content-Type: application/json" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive" \
  -d '{
    "datasetName": "legal-contracts",
    "queryString": "indemnification clause",
    "pageSize": 10
  }' | python -m json.tool

When totalCount exceeds pageSize, use paginationToken to iterate through all results.

Python
import asyncio
from gaia_sdk import GaiaClient

async def exhaustive_search_all(
    gaia: GaiaClient,
    dataset: str,
    query: str,
    page_size: int = 50,
):
    """Yield every matching document across all pages."""
    token = None

    while True:
        result = await gaia.exhaustive_search(
            dataset_name=dataset,
            query=query,
            page_size=page_size,
            pagination_token=token,
        )

        for doc in result.documents:
            yield doc

        token = result.pagination_token
        if not token:
            break

async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        count = 0
        async for doc in exhaustive_search_all(
            gaia, "legal-contracts", "indemnification clause"
        ):
            count += 1
            print(f"  [{count}] {doc.filename} (score: {doc.score:.3f})")

        print(f"\nTotal documents retrieved: {count}")

asyncio.run(main())

Async generator

The exhaustive_search_all helper above is an async generator — it lazily fetches pages only as you iterate, keeping memory usage constant regardless of total result count.


Bulk Download Workflow

For very large result sets, Gaia can package all matching documents into a downloadable ZIP archive. This is a three-step process:

Step 1: Initiate Download

Bash
curl -s -X POST \
  -H "apiKey: $GAIA_API_KEY" \
  -H "Content-Type: application/json" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID" \
  -d '{}'

$QUERY_UID is the queryUid from a previous exhaustive search response.
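
If you are scripting the whole flow in bash, one way to capture the ID from the search response (shape as documented above) is:

Bash
QUERY_UID=$(curl -s -X PUT \
  -H "apiKey: $GAIA_API_KEY" \
  -H "Content-Type: application/json" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive" \
  -d '{"datasetName": "legal-contracts", "queryString": "indemnification clause"}' \
  | python -c "import sys, json; print(json.load(sys.stdin)['queryUid'])")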

Step 2: Poll for Completion

The download preparation is asynchronous. Poll until the status is "ready":

Bash
curl -s \
  -H "apiKey: $GAIA_API_KEY" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID/status"

Response:

JSON
{
  "status": "ready",
  "downloadUrl": "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/abc-123/file",
  "fileCount": 247,
  "totalSizeBytes": 104857600
}

Step 3: Download the ZIP

Bash
curl -s -o results.zip \
  -H "apiKey: $GAIA_API_KEY" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID/file"

Complete Download Example (SDK)

Python
import asyncio
from gaia_sdk import GaiaClient

async def download_exhaustive_results(
    gaia: GaiaClient,
    query_uid: str,
    output_path: str = "results.zip",
    poll_seconds: int = 5,
):
    """Initiate bulk download, poll for readiness, and save the ZIP."""

    # Step 1: Initiate
    await gaia._request(
        "POST", f"/ask/exhaustive/download/{query_uid}", json={}
    )
    print("Download initiated. Waiting for preparation …")

    # Step 2: Poll
    while True:
        status = await gaia._request(
            "GET", f"/ask/exhaustive/download/{query_uid}/status"
        )
        print(f"  Status: {status.get('status')}")

        if status.get("status") == "ready":
            break
        await asyncio.sleep(poll_seconds)

    # Step 3: Download
    download_url = f"/ask/exhaustive/download/{query_uid}/file"
    response = await gaia.client.get(download_url)
    with open(output_path, "wb") as f:
        f.write(response.content)

    print(f"Downloaded {status.get('fileCount', '?')} files to {output_path}")

async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:

        # Run the exhaustive search
        result = await gaia.exhaustive_search(
            dataset_name="legal-contracts",
            query="indemnification clause",
        )
        print(f"Found {result.total_count} documents.")

        # Download all results as a ZIP
        if result.query_uid:
            await download_exhaustive_results(gaia, result.query_uid)

asyncio.run(main())

Use Cases

Compliance Audit

Find every document mentioning a specific regulation or policy:

Python
result = await gaia.exhaustive_search(
    dataset_name="policy-docs",
    query="GDPR data retention requirements",
    page_size=100,
)
print(f"Found {result.total_count} documents referencing GDPR retention.")

E-Discovery

Identify all communications containing specific keywords for legal hold:

Python
result = await gaia.exhaustive_search(
    dataset_name="email-archive",
    query="Project Titan acquisition timeline",
    page_size=50,
)

Audit Trail

Verify that a specific procedure is documented across all runbooks:

Python
async for doc in exhaustive_search_all(
    gaia, "runbooks", "disaster recovery failover"
):
    print(f"  Found in: {doc.filepath}")

Response Structure Deep Dive

A typical exhaustive search response:

JSON
{
  "queryUid": "q-abc-123",
  "totalCount": 142,
  "paginationToken": "eyJwYWdlIjogMn0=",
  "documents": [
    {
      "docId": "doc-001",
      "filename": "master-services-agreement.pdf",
      "filepath": "/contracts/2024/master-services-agreement.pdf",
      "snippet": "… The Vendor shall indemnify and hold harmless the Client …",
      "score": 0.92,
      "metadata": {
        "sizeBytes": 245760,
        "lastModified": "2024-08-15T10:30:00Z"
      }
    },
    {
      "docId": "doc-002",
      "filename": "vendor-agreement-acme.docx",
      "filepath": "/contracts/2024/vendor-agreement-acme.docx",
      "snippet": "… indemnification obligations shall survive termination …",
      "score": 0.88,
      "metadata": {
        "sizeBytes": 184320,
        "lastModified": "2024-09-01T14:00:00Z"
      }
    }
  ]
}

paginationToken encoding

The token is an opaque, base64-encoded string. Do not decode or modify it — pass it directly to the next request.
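
For example, a follow-up page request in bash simply echoes the token from the previous response back in the request body (field names as documented above; the token value is illustrative):

Bash
curl -s -X PUT \
  -H "apiKey: $GAIA_API_KEY" \
  -H "Content-Type: application/json" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive" \
  -d '{
    "datasetName": "legal-contracts",
    "queryString": "indemnification clause",
    "pageSize": 10,
    "paginationToken": "eyJwYWdlIjogMn0="
  }'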


Best Practices

Choose the right page size

Use pageSize=50 or pageSize=100 for batch processing. Smaller pages (10–20) are better for interactive UIs where you want fast initial results.

Stream results to storage

For very large result sets, write documents to disk or a database as you iterate rather than accumulating them all in memory.
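
A minimal sketch of that pattern, reusing the exhaustive_search_all helper defined earlier; the output filename is illustrative, and the document attributes mirror the response example below:

Python
import asyncio
import json
from gaia_sdk import GaiaClient

async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        # Append each document to a JSONL file as it arrives so memory
        # stays flat no matter how many pages the search returns.
        with open("matches.jsonl", "w") as f:
            async for doc in exhaustive_search_all(
                gaia, "legal-contracts", "indemnification clause"
            ):
                f.write(json.dumps({
                    "filename": doc.filename,
                    "filepath": doc.filepath,
                    "score": doc.score,
                    "snippet": doc.snippet,
                }) + "\n")

asyncio.run(main())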

Combine with refinement

After an exhaustive search, use refine() with selected docIds to get an LLM-generated summary of specific documents. This pairs the completeness of exhaustive search with the synthesis of RAG.
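
A rough sketch of that flow; refine() is not documented on this page, so the parameter names below (query_uid, doc_ids, query) and the doc_id attribute are assumptions to check against the refinement docs:

Python
import asyncio
from gaia_sdk import GaiaClient

async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        # 1. Exhaustive search: find every matching document.
        result = await gaia.exhaustive_search(
            dataset_name="legal-contracts",
            query="indemnification clause",
        )

        # 2. Select a subset worth summarizing (here: the five highest-scoring).
        top_ids = [doc.doc_id for doc in result.documents[:5]]  # doc_id attribute is assumed

        # 3. Refine: ask the LLM to synthesize just those documents.
        #    Parameter names are illustrative; see the refinement docs.
        summary = await gaia.refine(
            query_uid=result.query_uid,
            doc_ids=top_ids,
            query="Summarize the indemnification obligations in these contracts.",
        )
        print(summary)

asyncio.run(main())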

Single dataset per request

Unlike /ask, exhaustive search operates on one dataset at a time. To search across multiple datasets, issue parallel requests and merge results client-side.
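
One way to do that with the SDK is to run the per-dataset searches concurrently and merge by score; the dataset names here are purely illustrative:

Python
import asyncio
from gaia_sdk import GaiaClient

async def search_many(gaia: GaiaClient, datasets: list[str], query: str):
    """Run one exhaustive search per dataset concurrently and merge by score."""
    results = await asyncio.gather(*[
        gaia.exhaustive_search(dataset_name=ds, query=query)
        for ds in datasets
    ])
    # First page of each dataset only; combine with the pagination helper
    # above to walk every page before merging.
    docs = [doc for result in results for doc in result.documents]
    return sorted(docs, key=lambda d: d.score, reverse=True)

async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        merged = await search_many(
            gaia, ["legal-contracts", "policy-docs"], "indemnification clause"
        )
        print(f"Merged matches across datasets: {len(merged)}")

asyncio.run(main())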


What's Next