Exhaustive Search¶
While the standard /ask endpoint retrieves a handful of the most relevant chunks and generates an LLM answer, exhaustive search scans the entire dataset and returns every document that matches your query — paginated, ranked by relevance, and without LLM synthesis.
Use exhaustive search when you need completeness over conciseness: compliance audits, e-discovery, data inventory, or any workflow where missing a single matching document is unacceptable.
Ask vs. Exhaustive Search¶
| | POST /ask | PUT /ask/exhaustive |
|---|---|---|
| Goal | Get an LLM-generated answer | Find all matching documents |
| Result | responseString + top-k documents | Paginated document list (no LLM answer) |
| Speed | Fast (returns top-k only) | Slower (scans full index) |
| Pagination | No | Yes (paginationToken) |
| Best for | Q&A, summarization, chat | Compliance, audits, bulk export |
The Exhaustive Search Endpoint¶
PUT /ask/exhaustive¶
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
| datasetName | string | Yes | The dataset to search (single dataset per request). |
| queryString | string | Yes | Natural-language query or keyword string. |
| pageSize | int | No | Results per page (default: 20, max: 100). |
| paginationToken | string | No | Token from a previous response to fetch the next page. |
| conversationId | string | No | Associate this search with an existing conversation. |
Response Body¶
| Field | Type | Description |
|---|---|---|
| queryUid | string | Unique query identifier (used for download, feedback). |
| documents | Document[] | Array of matching documents with snippets and scores. |
| totalCount | int | Total number of matching documents across all pages. |
| paginationToken | string | Token to fetch the next page (null when no more pages). |
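As a rough client-side sketch, the response fields above can be modeled with dataclasses. This is illustrative only — the SDK ships its own result types, and the snake_case names here are assumed to mirror the attributes used in the examples below:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    doc_id: str
    filename: str
    filepath: str
    snippet: str
    score: float
    metadata: dict = field(default_factory=dict)


@dataclass
class ExhaustiveSearchResult:
    query_uid: str
    total_count: int
    documents: list[Document]
    pagination_token: Optional[str] = None

    @property
    def has_more_pages(self) -> bool:
        # A null/absent token means the final page has been reached.
        return self.pagination_token is not None
```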
Basic Exhaustive Search¶
```python
import asyncio

from gaia_sdk import GaiaClient


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        result = await gaia.exhaustive_search(
            dataset_name="legal-contracts",
            query="indemnification clause",
            page_size=10,
        )
        print(f"Total matches: {result.total_count}")
        for doc in result.documents:
            print(f"  {doc.filename}  score={doc.score:.3f}")
            print(f"    {doc.snippet[:150]}…\n")

asyncio.run(main())
```
Paginated Search¶
When totalCount exceeds pageSize, use paginationToken to iterate through all results.
```python
import asyncio

from gaia_sdk import GaiaClient


async def exhaustive_search_all(
    gaia: GaiaClient,
    dataset: str,
    query: str,
    page_size: int = 50,
):
    """Yield every matching document across all pages."""
    token = None
    while True:
        result = await gaia.exhaustive_search(
            dataset_name=dataset,
            query=query,
            page_size=page_size,
            pagination_token=token,
        )
        for doc in result.documents:
            yield doc
        token = result.pagination_token
        if not token:
            break


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        count = 0
        async for doc in exhaustive_search_all(
            gaia, "legal-contracts", "indemnification clause"
        ):
            count += 1
            print(f"  [{count}] {doc.filename} (score: {doc.score:.3f})")
        print(f"\nTotal documents retrieved: {count}")

asyncio.run(main())
```
Async generator
The exhaustive_search_all helper above is an async generator — it lazily fetches pages only as you iterate, keeping memory usage constant regardless of total result count.
Bulk Download Workflow¶
For very large result sets, Gaia can package all matching documents into a downloadable ZIP archive. This is a three-step process:
Step 1: Initiate Download¶
```bash
curl -s -X POST \
  -H "apiKey: $GAIA_API_KEY" \
  -H "Content-Type: application/json" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID" \
  -d '{}'
```
$QUERY_UID is the queryUid from a previous exhaustive search response.
Step 2: Poll for Completion¶
The download preparation is asynchronous. Poll until the status is "ready":
```bash
curl -s \
  -H "apiKey: $GAIA_API_KEY" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID/status"
```
Response:
```json
{
  "status": "ready",
  "downloadUrl": "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/abc-123/file",
  "fileCount": 247,
  "totalSizeBytes": 104857600
}
```
Step 3: Download the ZIP¶
```bash
curl -s -o results.zip \
  -H "apiKey: $GAIA_API_KEY" \
  "https://helios.cohesity.com/v2/mcm/gaia/ask/exhaustive/download/$QUERY_UID/file"
```
Complete Download Example (SDK)¶
```python
import asyncio

import httpx
from gaia_sdk import GaiaClient


async def download_exhaustive_results(
    gaia: GaiaClient,
    query_uid: str,
    output_path: str = "results.zip",
    poll_seconds: int = 5,
):
    """Initiate bulk download, poll for readiness, and save the ZIP."""
    # Step 1: Initiate
    await gaia._request(
        "POST", f"/ask/exhaustive/download/{query_uid}", json={}
    )
    print("Download initiated. Waiting for preparation …")

    # Step 2: Poll
    while True:
        status = await gaia._request(
            "GET", f"/ask/exhaustive/download/{query_uid}/status"
        )
        print(f"  Status: {status.get('status')}")
        if status.get("status") == "ready":
            break
        await asyncio.sleep(poll_seconds)

    # Step 3: Download
    download_url = f"/ask/exhaustive/download/{query_uid}/file"
    response = await gaia.client.get(download_url)
    with open(output_path, "wb") as f:
        f.write(response.content)
    print(f"Downloaded {status.get('fileCount', '?')} files to {output_path}")


async def main():
    async with GaiaClient(api_key="YOUR_API_KEY") as gaia:
        # Run the exhaustive search
        result = await gaia.exhaustive_search(
            dataset_name="legal-contracts",
            query="indemnification clause",
        )
        print(f"Found {result.total_count} documents.")

        # Download all results as a ZIP
        if result.query_uid:
            await download_exhaustive_results(gaia, result.query_uid)

asyncio.run(main())
```
Use Cases¶
Compliance Search¶
Find every document mentioning a specific regulation or policy:
```python
result = await gaia.exhaustive_search(
    dataset_name="policy-docs",
    query="GDPR data retention requirements",
    page_size=100,
)
print(f"Found {result.total_count} documents referencing GDPR retention.")
```
E-Discovery¶
Identify all communications containing specific keywords for legal hold:
```python
result = await gaia.exhaustive_search(
    dataset_name="email-archive",
    query="Project Titan acquisition timeline",
    page_size=50,
)
```
Audit Trail¶
Verify that a specific procedure is documented across all runbooks:
```python
async for doc in exhaustive_search_all(
    gaia, "runbooks", "disaster recovery failover"
):
    print(f"  Found in: {doc.filepath}")
```
Response Structure Deep Dive¶
A typical exhaustive search response:
```json
{
  "queryUid": "q-abc-123",
  "totalCount": 142,
  "paginationToken": "eyJwYWdlIjogMn0=",
  "documents": [
    {
      "docId": "doc-001",
      "filename": "master-services-agreement.pdf",
      "filepath": "/contracts/2024/master-services-agreement.pdf",
      "snippet": "… The Vendor shall indemnify and hold harmless the Client …",
      "score": 0.92,
      "metadata": {
        "sizeBytes": 245760,
        "lastModified": "2024-08-15T10:30:00Z"
      }
    },
    {
      "docId": "doc-002",
      "filename": "vendor-agreement-acme.docx",
      "filepath": "/contracts/2024/vendor-agreement-acme.docx",
      "snippet": "… indemnification obligations shall survive termination …",
      "score": 0.88,
      "metadata": {
        "sizeBytes": 184320,
        "lastModified": "2024-09-01T14:00:00Z"
      }
    }
  ]
}
```
paginationToken encoding
The token is an opaque, base64-encoded string. Do not decode or modify it — pass it directly to the next request.
Best Practices¶
Choose the right page size
Use pageSize=50 or pageSize=100 for batch processing. Smaller pages (10–20) are better for interactive UIs where you want fast initial results.
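The trade-off is easy to quantify: draining a result set takes ceil(totalCount / pageSize) round trips. A quick sketch:

```python
import math


def page_count(total_count: int, page_size: int) -> int:
    """Number of requests needed to page through an exhaustive result set."""
    return math.ceil(total_count / page_size)


# For the 142-match example above: 2 requests at pageSize=100,
# but 8 requests at pageSize=20.
print(page_count(142, 100), page_count(142, 20))
```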
Stream results to storage
For very large result sets, write documents to disk or a database as you iterate rather than accumulating them all in memory.
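The `exhaustive_search_all` helper above pairs naturally with incremental writes. A minimal sketch (documents treated as plain dicts for illustration; real SDK objects would need serializing first):

```python
import json


async def stream_to_jsonl(doc_iter, output_path: str) -> int:
    """Write each document to a JSON Lines file as it arrives.

    Memory stays constant: nothing is kept beyond the current document.
    """
    count = 0
    with open(output_path, "w") as f:
        async for doc in doc_iter:
            f.write(json.dumps(doc) + "\n")
            count += 1
    return count
```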
Combine with refinement
After an exhaustive search, use refine() with selected docIds to get an LLM-generated summary of specific documents. This pairs the completeness of exhaustive search with the synthesis of RAG.
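The refine() call itself is documented separately; as a sketch, the selection step might look like this (documents treated as plain dicts, with the `score` and `docId` fields from the response above):

```python
def select_doc_ids(documents, top_n: int = 5, min_score: float = 0.0) -> list[str]:
    """Pick the highest-scoring docIds for a follow-up refine() call."""
    # Filter out weak matches first, then keep the top N by score.
    eligible = [d for d in documents if d["score"] >= min_score]
    eligible.sort(key=lambda d: d["score"], reverse=True)
    return [d["docId"] for d in eligible[:top_n]]
```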
Single dataset per request
Unlike /ask, exhaustive search operates on one dataset at a time. To search across multiple datasets, issue parallel requests and merge results client-side.
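One way to sketch that fan-out, assuming the same `exhaustive_search` signature used throughout this page (this only merges first pages; combine with the pagination helper for full coverage):

```python
import asyncio


async def search_many(gaia, datasets, query, page_size=50):
    """Run one exhaustive search per dataset in parallel, merge by score."""
    results = await asyncio.gather(*[
        gaia.exhaustive_search(dataset_name=ds, query=query, page_size=page_size)
        for ds in datasets
    ])
    # Flatten, then re-rank client-side since scores come from separate indexes.
    merged = [doc for r in results for doc in r.documents]
    merged.sort(key=lambda d: d.score, reverse=True)
    return merged
```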
What's Next¶
- Document Upload — add custom documents to your datasets.
- Refine & Feedback — generate focused answers from selected exhaustive search results.
- Querying & RAG — when you need an LLM answer instead of a document list.