Helper LLM Integration¶
Gaia handles RAG — retrieval and answer generation — out of the box. But sometimes you need a secondary LLM for tasks that fall outside the RAG pipeline: evaluating answer quality, classifying questions, summarizing long results, extracting structured data, or routing queries to the right dataset. This chapter covers how to register external LLMs with Gaia and how to build a processing pipeline that combines Gaia's RAG with a helper LLM.
Architecture Pattern¶
┌─────────────────┐     ┌─────────────────┐     ┌──────────────────┐
│  User Question  │────▶│    Gaia RAG     │────▶│    Helper LLM    │
│                 │     │  (retrieval +   │     │  (post-process)  │
│                 │     │   generation)   │     │                  │
│                 │     │                 │     │  - Evaluate      │
│                 │     │    answer +     │     │  - Classify      │
│                 │     │    citations    │     │  - Summarize     │
│                 │     │                 │     │  - Extract       │
│                 │◀────│                 │◀────│                  │
│ Final Response  │     │                 │     │                  │
└─────────────────┘     └─────────────────┘     └──────────────────┘
Gaia provides the data retrieval and initial answer generation. The helper LLM adds a second layer of intelligence — evaluating, transforming, or enriching the Gaia output before it reaches the user.
Registering External LLMs with Gaia¶
Gaia supports registering external LLMs that can be used as alternative generation models for the /ask endpoint or as helper processors.
API: POST /llms¶
Register a new LLM with Gaia:
Request body:
{
"type": "AzureOpenAI",
"name": "azure-gpt4o",
"url": "https://my-instance.openai.azure.com",
"apiKey": "your-azure-api-key",
"deploymentName": "gpt-4o",
"apiVersion": "2024-08-01-preview"
}
Supported LLM Types¶
| Type | Description | Required Fields |
|---|---|---|
| AzureOpenAI | Azure-hosted OpenAI models (GPT-4o, GPT-4, etc.) | url, apiKey, deploymentName, apiVersion |
| GoogleAI | Google AI models (Gemini, etc.) | url, apiKey |
| LocalGaiaAI | Models running on the Cohesity cluster | — |
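For example, a GoogleAI registration carries only the connection fields from the table (a sketch; the name is a placeholder and the URL should point at your Google AI endpoint):
{
  "type": "GoogleAI",
  "name": "gemini-helper",
  "url": "https://generativelanguage.googleapis.com",
  "apiKey": "your-google-api-key"
}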
Listing Registered LLMs¶
Listing registered LLMs (the SDK's list_llms call, shown later, wraps this) returns the cluster-local default plus any external registrations:
{
"llms": [
{
"name": "DefaultLLM",
"type": "LocalGaiaAI"
},
{
"name": "azure-gpt4o",
"type": "AzureOpenAI",
"url": "https://my-instance.openai.azure.com"
}
]
}
Using a Registered LLM in Queries¶
Pass the llmName parameter to /ask or /ask/stream to use a specific registered LLM:
response = await gaia.ask(
dataset_names=["my-dataset"],
query="What was our Q4 revenue?",
llm_name="azure-gpt4o", # Use the registered Azure model
)
Using Azure OpenAI as a Helper LLM¶
The most common pattern is using Azure OpenAI as a helper LLM for post-processing. This runs alongside Gaia's RAG pipeline, not instead of it.
Configuration¶
Add Azure OpenAI credentials to your .env:
# Helper LLM (Azure OpenAI)
HELPER_LLM_ENABLED=true
HELPER_LLM_ENDPOINT=https://my-instance.openai.azure.com
HELPER_LLM_API_KEY=your-azure-api-key
HELPER_LLM_API_VERSION=2024-08-01-preview
HELPER_LLM_DEPLOYMENT=gpt-4o
HELPER_LLM_TIMEOUT_SECONDS=20
Settings¶
# In settings.py
from pydantic import Field
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
# ... existing settings ...
helper_llm_enabled: bool = Field(
default=False,
validation_alias="HELPER_LLM_ENABLED",
)
helper_llm_endpoint: str = Field(
default="",
validation_alias="HELPER_LLM_ENDPOINT",
)
helper_llm_api_key: str = Field(
default="",
validation_alias="HELPER_LLM_API_KEY",
)
helper_llm_api_version: str = Field(
default="2024-08-01-preview",
validation_alias="HELPER_LLM_API_VERSION",
)
helper_llm_deployment: str = Field(
default="",
validation_alias="HELPER_LLM_DEPLOYMENT",
)
helper_llm_timeout_seconds: float = Field(
default=20.0,
validation_alias="HELPER_LLM_TIMEOUT_SECONDS",
)
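The service below imports get_settings from backend.settings. If your project doesn't define it yet, a cached accessor is all that's needed (a minimal sketch; the lru_cache pattern is a common convention, not something Gaia requires):
# In settings.py
from functools import lru_cache

@lru_cache
def get_settings() -> Settings:
    """Return a single cached Settings instance loaded from the environment."""
    return Settings()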
Helper LLM Service¶
A service module that wraps Azure OpenAI chat completions and provides structured JSON responses.
"""Helper LLM service for post-processing Gaia answers."""
import json
from typing import Any
import httpx
from backend.settings import get_settings
async def chat_completion(messages: list[dict[str, str]]) -> dict[str, Any]:
"""Call Azure OpenAI chat completion and return parsed JSON."""
settings = get_settings()
if not settings.helper_llm_endpoint or not settings.helper_llm_api_key:
raise RuntimeError("Helper LLM is not configured.")
    # Azure endpoints may be configured with or without the trailing
    # /openai path segment; normalize before building the request URL.
    endpoint = settings.helper_llm_endpoint.rstrip("/")
    if not endpoint.endswith("/openai"):
        endpoint = f"{endpoint}/openai"
    url = f"{endpoint}/deployments/{settings.helper_llm_deployment}/chat/completions"
async with httpx.AsyncClient(
timeout=settings.helper_llm_timeout_seconds,
) as client:
response = await client.post(
url,
headers={
"api-key": settings.helper_llm_api_key,
"Content-Type": "application/json",
},
params={"api-version": settings.helper_llm_api_version},
            json={
                "messages": messages,
                # Force a JSON object response; note that this mode requires
                # the word "JSON" to appear somewhere in the prompt.
                "response_format": {"type": "json_object"},
            },
)
response.raise_for_status()
data = response.json()
content = data["choices"][0]["message"]["content"]
return json.loads(content)
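A quick smoke test once the .env values are in place (a sketch; the module path backend.helper_llm is an assumption):
import asyncio

from backend.helper_llm import chat_completion  # assumed module path

async def main() -> None:
    # The prompt must mention JSON because the service sets
    # response_format={"type": "json_object"}.
    result = await chat_completion([
        {"role": "system", "content": "You are a health check. Respond as JSON."},
        {"role": "user", "content": "Reply with a JSON object {\"ok\": true}."},
    ])
    print(result)  # e.g. {'ok': True}

asyncio.run(main())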
Use Case: Answer Quality Evaluation¶
Evaluate whether a Gaia answer adequately addresses the user's question, and trigger a deeper search if needed.
async def evaluate_answer(
question: str,
answer: str,
citations: list[dict],
) -> dict[str, Any]:
"""Score how well a Gaia answer addresses the question.
Returns:
{
"confidence": "low" | "medium" | "high",
"score": 0.0-1.0,
"needs_exhaustive_search": bool,
"reason": "..."
}
"""
settings = get_settings()
if not settings.helper_llm_enabled:
return {
"confidence": "medium",
"score": 0.5,
"needs_exhaustive_search": False,
"reason": "Helper LLM disabled; default heuristic applied.",
}
    # Guard against None titles/snippets coming back from the pipeline.
    evidence = "\n".join(
        f"{i}. {c.get('title') or ''} — {(c.get('snippet') or '')[:200]}"
        for i, c in enumerate(citations, 1)
    ) or "No citations provided."
messages = [
{
"role": "system",
"content": (
"You evaluate retrieval-based Q&A responses. "
"Score 0-1 how well the response answers the question. "
"Respond as JSON: {confidence, score, needs_exhaustive_search, reason}."
),
},
{
"role": "user",
"content": (
f"Question: {question}\n\n"
f"Answer: {answer}\n\n"
f"Evidence:\n{evidence}\n\n"
"Score this response."
),
},
]
try:
result = await chat_completion(messages)
except Exception:
return {
"confidence": "medium",
"score": 0.5,
"needs_exhaustive_search": False,
"reason": "Helper LLM call failed; fallback applied.",
}
return {
"confidence": str(result.get("confidence", "medium")),
"score": float(result.get("score", 0.5)),
"needs_exhaustive_search": bool(result.get("needs_exhaustive_search", False)),
"reason": str(result.get("reason", "")),
}
Use Case: Dataset Routing¶
When your app has multiple datasets, the helper LLM can match a user's question to the best dataset(s) using discovery metadata.
async def match_datasets(
question: str,
dataset_metadata: list[dict],
) -> list[str]:
"""Use the helper LLM to pick the best dataset(s) for a question.
dataset_metadata: list of dicts with keys: name, description, suggested_questions
Returns: list of dataset names, best match first.
"""
catalog = "\n".join(
f"- {d['name']}: {d.get('description', 'N/A')} "
f"(questions: {', '.join(d.get('suggested_questions', [])[:3])})"
for d in dataset_metadata
)
messages = [
{
"role": "system",
"content": (
"Match the user's question to the best dataset(s). "
"Respond as JSON: {\"dataset_names\": [\"name1\"]}"
),
},
{
"role": "user",
"content": f"Question: {question}\n\nDatasets:\n{catalog}",
},
]
try:
result = await chat_completion(messages)
names = result.get("dataset_names", [])
return [str(n) for n in names if n]
except Exception:
return []
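Wired into a request flow, the router falls back to a caller-supplied default when it returns nothing (a sketch; assumes an open GaiaClient named gaia, and the dataset names and descriptions are hypothetical):
metadata = [
    {
        "name": "finance-docs",
        "description": "Quarterly reports and earnings filings",
        "suggested_questions": ["What was our Q4 revenue?"],
    },
    {
        "name": "hr-policies",
        "description": "Employee handbook and HR policy documents",
        "suggested_questions": ["How much PTO do new hires get?"],
    },
]

names = await match_datasets("What was our Q4 revenue?", metadata)
response = await gaia.ask(
    dataset_names=names or ["finance-docs"],  # fall back if routing returned []
    query="What was our Q4 revenue?",
)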
Use Case: Structured Data Extraction¶
Extract structured information from Gaia's free-text answers.
async def extract_entities(
answer: str,
entity_types: list[str],
) -> dict[str, list[str]]:
"""Extract named entities from a Gaia answer.
entity_types: e.g. ["person", "organization", "date", "amount"]
Returns: {"person": ["John Doe"], "amount": ["$4.2B"]}
"""
messages = [
{
"role": "system",
"content": (
f"Extract entities of types {entity_types} from the text. "
"Respond as JSON where keys are entity types and values are arrays of strings."
),
},
{
"role": "user",
"content": answer,
},
]
try:
return await chat_completion(messages)
except Exception:
return {t: [] for t in entity_types}
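For example (the output shape is illustrative):
entities = await extract_entities(
    answer="Q4 revenue was $4.2B, as announced by Jane Smith on Feb 2.",
    entity_types=["person", "date", "amount"],
)
# e.g. {"person": ["Jane Smith"], "date": ["Feb 2"], "amount": ["$4.2B"]}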
Building a Processing Pipeline¶
Combine Gaia RAG with helper LLM steps into a multi-stage pipeline:
async def answer_with_evaluation(
question: str,
dataset_names: list[str],
gaia_client: GaiaClient,
) -> dict:
"""Full answer pipeline: Gaia RAG → Helper LLM evaluation → optional exhaustive search."""
# Stage 1: Get initial answer from Gaia
result = await gaia_client.ask(
dataset_names=dataset_names,
query=question,
)
citations = [
{
"title": doc.filename,
"snippet": doc.snippet,
"score": doc.score,
}
for doc in (result.documents or [])
]
# Stage 2: Evaluate with helper LLM
evaluation = await evaluate_answer(
question=question,
answer=result.response_string or "",
citations=citations,
)
# Stage 3: If evaluation is low-confidence, try exhaustive search
if evaluation.get("needs_exhaustive_search") and dataset_names:
exhaustive = await gaia_client.exhaustive_search(
dataset_name=dataset_names[0],
query=question,
page_size=20,
)
# Re-ask with the exhaustive results as context
if exhaustive.documents:
doc_ids = [d.doc_id for d in exhaustive.documents if d.doc_id][:5]
if doc_ids and result.query_uid:
refined = await gaia_client.refine(
query_uid=result.query_uid,
dataset_names=dataset_names,
query=question,
doc_ids=doc_ids,
)
return {
"answer": refined.response_string,
"queryUid": refined.query_uid,
"documents": refined.documents,
"evaluation": evaluation,
"refinedFromExhaustive": True,
}
return {
"answer": result.response_string,
"queryUid": result.query_uid,
"conversationId": result.conversation_id,
"documents": result.documents,
"evaluation": evaluation,
"refinedFromExhaustive": False,
}
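Calling the pipeline end to end (the dataset name is a placeholder):
async with GaiaClient.from_env() as gaia:
    result = await answer_with_evaluation(
        question="What was our Q4 revenue?",
        dataset_names=["finance-docs"],
        gaia_client=gaia,
    )
    print(result["answer"])
    print(result["evaluation"], result["refinedFromExhaustive"])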
Pipeline Flow Diagram¶
              ┌──────────────┐
              │  User asks   │
              │   question   │
              └──────┬───────┘
                     │
              ┌──────▼───────┐
              │  Gaia /ask   │
              │    (RAG)     │
              └──────┬───────┘
                     │
              ┌──────▼───────┐
              │  Helper LLM  │
      ┌───────│   Evaluate   │───────┐
      │       └──────────────┘       │
needs_exhaustive_search   needs_exhaustive_search
   = false                        = true
      │                              │
┌─────▼──────────┐         ┌─────────▼──────────┐
│ Return answer  │         │ Exhaustive search  │
│ as-is          │         │ + Refine with top  │
│                │         │ documents          │
└────────────────┘         └─────────┬──────────┘
                                     │
                          ┌──────────▼────────────┐
                          │ Return refined answer │
                          └───────────────────────┘
Registering LLMs via the SDK¶
async with GaiaClient.from_env() as gaia:
# List registered LLMs
llms = await gaia.list_llms()
for llm in llms:
print(f"{llm['name']} ({llm['type']})")
To register a new LLM, use raw httpx (the SDK doesn't currently wrap this endpoint):
import httpx
async def register_llm(
api_key: str,
llm_type: str,
name: str,
url: str,
llm_api_key: str,
deployment_name: str | None = None,
api_version: str | None = None,
) -> dict:
"""Register an external LLM with Gaia."""
payload = {
"type": llm_type,
"name": name,
"url": url,
"apiKey": llm_api_key,
}
if deployment_name:
payload["deploymentName"] = deployment_name
if api_version:
payload["apiVersion"] = api_version
async with httpx.AsyncClient() as client:
response = await client.post(
"https://helios.cohesity.com/v2/mcm/gaia/llms",
headers={
"apiKey": api_key,
"Content-Type": "application/json",
},
json=payload,
)
response.raise_for_status()
return response.json()
# Example: Register Azure OpenAI GPT-4o
result = await register_llm(
api_key="your-gaia-api-key",
llm_type="AzureOpenAI",
name="azure-gpt4o",
url="https://my-instance.openai.azure.com",
llm_api_key="your-azure-key",
deployment_name="gpt-4o",
api_version="2024-08-01-preview",
)
Error Handling¶
Helper LLM calls should never block the main answer pipeline. If the helper LLM is unavailable, fall back gracefully:
import logging

logger = logging.getLogger(__name__)

try:
evaluation = await evaluate_answer(question, answer, citations)
except Exception as exc:
logger.warning("Helper LLM failed: %s", exc)
evaluation = {
"confidence": "medium",
"score": 0.5,
"needs_exhaustive_search": False,
"reason": "Helper LLM unavailable; default applied.",
}
Set timeouts independently
Set a shorter timeout for helper LLM calls (e.g., 20 seconds) than for Gaia calls (e.g., 60 seconds). A slow helper shouldn't hold up the entire response.
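To enforce that budget regardless of the HTTP-level settings, wrap the helper call (a sketch using asyncio.wait_for; the fallback mirrors the one above):
import asyncio

try:
    evaluation = await asyncio.wait_for(
        evaluate_answer(question, answer, citations),
        timeout=20.0,  # helper budget, independent of the Gaia call's timeout
    )
except asyncio.TimeoutError:
    evaluation = {
        "confidence": "medium",
        "score": 0.5,
        "needs_exhaustive_search": False,
        "reason": "Helper LLM timed out; default applied.",
    }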
Next Steps¶
- Environment Configuration — Configure environment variables for deployment.
- Dataset Discovery — Use discovery metadata as input to dataset routing.
- Streaming Responses — Stream answers while evaluation happens in the background.