Helper LLM Integration

Gaia handles RAG — retrieval and answer generation — out of the box. But sometimes you need a secondary LLM for tasks that fall outside the RAG pipeline: evaluating answer quality, classifying questions, summarizing long results, extracting structured data, or routing queries to the right dataset. This chapter covers how to register external LLMs with Gaia and how to build a processing pipeline that combines Gaia's RAG with a helper LLM.


Architecture Pattern

Text Only
┌──────────────────┐     ┌─────────────────┐     ┌──────────────────┐
│  User Question   │────▶│  Gaia RAG       │────▶│  Helper LLM      │
│                  │     │  (retrieval +   │     │  (post-process)  │
│                  │     │   generation)   │     │                  │
│                  │     │                 │     │  - Evaluate      │
│                  │     │  answer +       │     │  - Classify      │
│                  │     │  citations      │     │  - Summarize     │
│                  │     │                 │     │  - Extract       │
│                  │◀────│                 │◀────│                  │
│  Final Response  │     │                 │     │                  │
└──────────────────┘     └─────────────────┘     └──────────────────┘

Gaia provides the data retrieval and initial answer generation. The helper LLM adds a second layer of intelligence — evaluating, transforming, or enriching the Gaia output before it reaches the user.
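
In code, the pattern is two sequential awaits. A minimal sketch, assuming a GaiaClient instance and the evaluate_answer helper defined later in this chapter:

Python
# Stage 1: Gaia handles retrieval and answer generation.
result = await gaia_client.ask(
    dataset_names=["my-dataset"],
    query=question,
)

# Stage 2: the helper LLM post-processes the Gaia output.
evaluation = await evaluate_answer(
    question=question,
    answer=result.response_string or "",
    citations=[],
)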


Registering External LLMs with Gaia

Gaia supports registering external LLMs that can be used as alternative generation models for the /ask endpoint or as helper processors.

API: POST /llms

Register a new LLM with Gaia:

Text Only
POST /v2/mcm/gaia/llms

Request body:

JSON
{
  "type": "AzureOpenAI",
  "name": "azure-gpt4o",
  "url": "https://my-instance.openai.azure.com",
  "apiKey": "your-azure-api-key",
  "deploymentName": "gpt-4o",
  "apiVersion": "2024-08-01-preview"
}

Supported LLM Types

| Type        | Description                                      | Required Fields                         |
|-------------|--------------------------------------------------|-----------------------------------------|
| AzureOpenAI | Azure-hosted OpenAI models (GPT-4o, GPT-4, etc.) | url, apiKey, deploymentName, apiVersion |
| GoogleAI    | Google AI models (Gemini, etc.)                  | url, apiKey                             |
| LocalGaiaAI | Models running on the Cohesity cluster           |                                         |
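
For example, a GoogleAI registration carries only the base fields plus url and apiKey. The endpoint URL below is illustrative; substitute the one for your Google AI project:

JSON
{
  "type": "GoogleAI",
  "name": "gemini-helper",
  "url": "https://generativelanguage.googleapis.com",
  "apiKey": "your-google-api-key"
}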

Listing Registered LLMs

Text Only
GET /v2/mcm/gaia/llms

Response:

JSON
{
  "llms": [
    {
      "name": "DefaultLLM",
      "type": "LocalGaiaAI"
    },
    {
      "name": "azure-gpt4o",
      "type": "AzureOpenAI",
      "url": "https://my-instance.openai.azure.com"
    }
  ]
}

Using a Registered LLM in Queries

Pass the llmName parameter to /ask or /ask/stream to use a specific registered LLM:

Python
response = await gaia.ask(
    dataset_names=["my-dataset"],
    query="What was our Q4 revenue?",
    llm_name="azure-gpt4o",  # Use the registered Azure model
)

Using Azure OpenAI as a Helper LLM

The most common pattern is using Azure OpenAI as a helper LLM for post-processing. This runs alongside Gaia's RAG pipeline, not instead of it.

Configuration

Add Azure OpenAI credentials to your .env:

Bash
# Helper LLM (Azure OpenAI)
HELPER_LLM_ENABLED=true
HELPER_LLM_ENDPOINT=https://my-instance.openai.azure.com
HELPER_LLM_API_KEY=your-azure-api-key
HELPER_LLM_API_VERSION=2024-08-01-preview
HELPER_LLM_DEPLOYMENT=gpt-4o
HELPER_LLM_TIMEOUT_SECONDS=20

Settings

Python
# In settings.py

class Settings(BaseSettings):
    # ... existing settings ...

    helper_llm_enabled: bool = Field(
        default=False,
        validation_alias="HELPER_LLM_ENABLED",
    )
    helper_llm_endpoint: str = Field(
        default="",
        validation_alias="HELPER_LLM_ENDPOINT",
    )
    helper_llm_api_key: str = Field(
        default="",
        validation_alias="HELPER_LLM_API_KEY",
    )
    helper_llm_api_version: str = Field(
        default="2024-08-01-preview",
        validation_alias="HELPER_LLM_API_VERSION",
    )
    helper_llm_deployment: str = Field(
        default="",
        validation_alias="HELPER_LLM_DEPLOYMENT",
    )
    helper_llm_timeout_seconds: float = Field(
        default=20.0,
        validation_alias="HELPER_LLM_TIMEOUT_SECONDS",
    )
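
The service module in the next section imports a get_settings() accessor. If your project doesn't already define one, a minimal cached version looks like this (assuming pydantic-settings loads the .env file):

Python
from functools import lru_cache


@lru_cache
def get_settings() -> Settings:
    """Return a cached Settings instance so the .env file is parsed once."""
    return Settings()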

Helper LLM Service

A service module that wraps Azure OpenAI chat completions and provides structured JSON responses.

Python
"""Helper LLM service for post-processing Gaia answers."""

import json
from typing import Any

import httpx

from backend.settings import get_settings


async def chat_completion(messages: list[dict[str, str]]) -> dict[str, Any]:
    """Call Azure OpenAI chat completion and return parsed JSON."""
    settings = get_settings()
    if not settings.helper_llm_endpoint or not settings.helper_llm_api_key:
        raise RuntimeError("Helper LLM is not configured.")

    endpoint = settings.helper_llm_endpoint.rstrip("/")
    if not endpoint.endswith("/openai"):
        endpoint = f"{endpoint}/openai"

    url = f"{endpoint}/deployments/{settings.helper_llm_deployment}/chat/completions"

    async with httpx.AsyncClient(
        timeout=settings.helper_llm_timeout_seconds,
    ) as client:
        response = await client.post(
            url,
            headers={
                "api-key": settings.helper_llm_api_key,
                "Content-Type": "application/json",
            },
            params={"api-version": settings.helper_llm_api_version},
            json={
                "messages": messages,
                "response_format": {"type": "json_object"},
            },
        )
        response.raise_for_status()

    data = response.json()
    content = data["choices"][0]["message"]["content"]
    return json.loads(content)
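
A quick smoke test, assuming the helper LLM environment variables above are set. Note that Azure's json_object response format expects the word "JSON" to appear in the prompt, which the system message here satisfies:

Python
messages = [
    {"role": "system", "content": 'You are a health check. Respond as JSON: {"ok": true}.'},
    {"role": "user", "content": "Ping."},
]
result = await chat_completion(messages)
print(result)  # e.g. {'ok': True}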

Use Case: Answer Quality Evaluation

Evaluate whether a Gaia answer adequately addresses the user's question, and trigger deeper search if needed.

Python
async def evaluate_answer(
    question: str,
    answer: str,
    citations: list[dict],
) -> dict[str, Any]:
    """Score how well a Gaia answer addresses the question.

    Returns:
        {
            "confidence": "low" | "medium" | "high",
            "score": 0.0-1.0,
            "needs_exhaustive_search": bool,
            "reason": "..."
        }
    """
    settings = get_settings()
    if not settings.helper_llm_enabled:
        return {
            "confidence": "medium",
            "score": 0.5,
            "needs_exhaustive_search": False,
            "reason": "Helper LLM disabled; default heuristic applied.",
        }

    evidence = "\n".join(
        f"{i}. {c.get('title', '')}{c.get('snippet', '')[:200]}"
        for i, c in enumerate(citations, 1)
    ) or "No citations provided."

    messages = [
        {
            "role": "system",
            "content": (
                "You evaluate retrieval-based Q&A responses. "
                "Score 0-1 how well the response answers the question. "
                "Respond as JSON: {confidence, score, needs_exhaustive_search, reason}."
            ),
        },
        {
            "role": "user",
            "content": (
                f"Question: {question}\n\n"
                f"Answer: {answer}\n\n"
                f"Evidence:\n{evidence}\n\n"
                "Score this response."
            ),
        },
    ]

    try:
        result = await chat_completion(messages)
    except Exception:
        return {
            "confidence": "medium",
            "score": 0.5,
            "needs_exhaustive_search": False,
            "reason": "Helper LLM call failed; fallback applied.",
        }

    return {
        "confidence": str(result.get("confidence", "medium")),
        "score": float(result.get("score", 0.5)),
        "needs_exhaustive_search": bool(result.get("needs_exhaustive_search", False)),
        "reason": str(result.get("reason", "")),
    }

Use Case: Dataset Routing

When your app has multiple datasets, the helper LLM can match a user's question to the best dataset(s) using discovery metadata.

Python
async def match_datasets(
    question: str,
    dataset_metadata: list[dict],
) -> list[str]:
    """Use the helper LLM to pick the best dataset(s) for a question.

    dataset_metadata: list of dicts with keys: name, description, suggested_questions
    Returns: list of dataset names, best match first.
    """
    catalog = "\n".join(
        f"- {d['name']}: {d.get('description', 'N/A')} "
        f"(questions: {', '.join(d.get('suggested_questions', [])[:3])})"
        for d in dataset_metadata
    )

    messages = [
        {
            "role": "system",
            "content": (
                "Match the user's question to the best dataset(s). "
                "Respond as JSON: {\"dataset_names\": [\"name1\"]}"
            ),
        },
        {
            "role": "user",
            "content": f"Question: {question}\n\nDatasets:\n{catalog}",
        },
    ]

    try:
        result = await chat_completion(messages)
        names = result.get("dataset_names", [])
        return [str(n) for n in names if n]
    except Exception:
        return []
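
Wired into your app, routing becomes a pre-step before /ask. A sketch, assuming the dataset metadata comes from your own dataset listing or discovery call:

Python
async def routed_ask(
    question: str,
    dataset_metadata: list[dict],
    gaia_client: GaiaClient,
) -> dict:
    """Route the question to the best dataset(s), then ask Gaia."""
    names = await match_datasets(question, dataset_metadata)
    if not names:
        # Routing failed or the helper is unavailable: fall back to all datasets.
        names = [d["name"] for d in dataset_metadata]

    result = await gaia_client.ask(dataset_names=names, query=question)
    return {"datasets": names, "answer": result.response_string}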

Use Case: Structured Data Extraction

Extract structured information from Gaia's free-text answers.

Python
async def extract_entities(
    answer: str,
    entity_types: list[str],
) -> dict[str, list[str]]:
    """Extract named entities from a Gaia answer.

    entity_types: e.g. ["person", "organization", "date", "amount"]
    Returns: {"person": ["John Doe"], "amount": ["$4.2B"]}
    """
    messages = [
        {
            "role": "system",
            "content": (
                f"Extract entities of types {entity_types} from the text. "
                "Respond as JSON where keys are entity types and values are arrays of strings."
            ),
        },
        {
            "role": "user",
            "content": answer,
        },
    ]

    try:
        return await chat_completion(messages)
    except Exception:
        return {t: [] for t in entity_types}
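
For example, pulling names and amounts out of a financial answer (values are illustrative):

Python
answer = "Q4 revenue was $4.2B, as reported by Jane Doe on Jan 15."
entities = await extract_entities(answer, ["person", "date", "amount"])
# e.g. {"person": ["Jane Doe"], "date": ["Jan 15"], "amount": ["$4.2B"]}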

Building a Processing Pipeline

Combine Gaia RAG with helper LLM steps into a multi-stage pipeline:

Python
async def answer_with_evaluation(
    question: str,
    dataset_names: list[str],
    gaia_client: GaiaClient,
) -> dict:
    """Full answer pipeline: Gaia RAG → Helper LLM evaluation → optional exhaustive search."""

    # Stage 1: Get initial answer from Gaia
    result = await gaia_client.ask(
        dataset_names=dataset_names,
        query=question,
    )

    citations = [
        {
            "title": doc.filename,
            "snippet": doc.snippet,
            "score": doc.score,
        }
        for doc in (result.documents or [])
    ]

    # Stage 2: Evaluate with helper LLM
    evaluation = await evaluate_answer(
        question=question,
        answer=result.response_string or "",
        citations=citations,
    )

    # Stage 3: If evaluation is low-confidence, try exhaustive search
    if evaluation.get("needs_exhaustive_search") and dataset_names:
        exhaustive = await gaia_client.exhaustive_search(
            dataset_name=dataset_names[0],
            query=question,
            page_size=20,
        )
        # Re-ask with the exhaustive results as context
        if exhaustive.documents:
            doc_ids = [d.doc_id for d in exhaustive.documents if d.doc_id][:5]
            if doc_ids and result.query_uid:
                refined = await gaia_client.refine(
                    query_uid=result.query_uid,
                    dataset_names=dataset_names,
                    query=question,
                    doc_ids=doc_ids,
                )
                return {
                    "answer": refined.response_string,
                    "queryUid": refined.query_uid,
                    "documents": refined.documents,
                    "evaluation": evaluation,
                    "refinedFromExhaustive": True,
                }

    return {
        "answer": result.response_string,
        "queryUid": result.query_uid,
        "conversationId": result.conversation_id,
        "documents": result.documents,
        "evaluation": evaluation,
        "refinedFromExhaustive": False,
    }
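
Calling the pipeline is then a single await (the dataset name below is a placeholder):

Python
async with GaiaClient.from_env() as gaia:
    result = await answer_with_evaluation(
        question="What was our Q4 revenue?",
        dataset_names=["finance-reports"],
        gaia_client=gaia,
    )
    print(result["answer"])
    print(result["evaluation"])  # confidence, score, reason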

Pipeline Flow Diagram

Text Only
                    ┌──────────────┐
                    │ User asks    │
                    │ question     │
                    └──────┬───────┘
                    ┌──────▼───────┐
                    │  Gaia /ask   │
                    │  (RAG)       │
                    └──────┬───────┘
                    ┌──────▼───────┐
                    │ Helper LLM   │
              ┌─────│ Evaluate     │─────┐
              │     └──────────────┘     │
   needs_exhaustive_search    needs_exhaustive_search
           = false                   = true
              │                          │
     ┌────────▼───────┐      ┌───────────▼──────────┐
     │ Return answer  │      │ Exhaustive search    │
     │ as-is          │      │ + Refine with top    │
     │                │      │   documents          │
     └────────────────┘      └───────────┬──────────┘
                             ┌───────────▼──────────┐
                             │ Return refined answer│
                             └──────────────────────┘

Registering LLMs via the SDK

Python
async with GaiaClient.from_env() as gaia:
    # List registered LLMs
    llms = await gaia.list_llms()
    for llm in llms:
        print(f"{llm['name']} ({llm['type']})")

To register a new LLM, use raw httpx (the SDK doesn't currently wrap this endpoint):

Python
import httpx


async def register_llm(
    api_key: str,
    llm_type: str,
    name: str,
    url: str,
    llm_api_key: str,
    deployment_name: str | None = None,
    api_version: str | None = None,
) -> dict:
    """Register an external LLM with Gaia."""
    payload = {
        "type": llm_type,
        "name": name,
        "url": url,
        "apiKey": llm_api_key,
    }
    if deployment_name:
        payload["deploymentName"] = deployment_name
    if api_version:
        payload["apiVersion"] = api_version

    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://helios.cohesity.com/v2/mcm/gaia/llms",
            headers={
                "apiKey": api_key,
                "Content-Type": "application/json",
            },
            json=payload,
        )
        response.raise_for_status()
        return response.json()


# Example: Register Azure OpenAI GPT-4o
result = await register_llm(
    api_key="your-gaia-api-key",
    llm_type="AzureOpenAI",
    name="azure-gpt4o",
    url="https://my-instance.openai.azure.com",
    llm_api_key="your-azure-key",
    deployment_name="gpt-4o",
    api_version="2024-08-01-preview",
)

Error Handling

Helper LLM calls should never block the main answer pipeline. If the helper LLM is unavailable, fall back gracefully:

Python
try:
    evaluation = await evaluate_answer(question, answer, citations)
except Exception as exc:
    logger.warning("Helper LLM failed: %s", exc)
    evaluation = {
        "confidence": "medium",
        "score": 0.5,
        "needs_exhaustive_search": False,
        "reason": "Helper LLM unavailable; default applied.",
    }

Time Out Helper Calls Independently

Set a shorter timeout for helper LLM calls (e.g., 20 seconds) than for Gaia calls (e.g., 60 seconds). A slow helper shouldn't hold up the entire response.
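
One way to enforce an independent cap at the call site, on top of the HTTP client timeout, is asyncio.wait_for. A sketch:

Python
import asyncio

try:
    evaluation = await asyncio.wait_for(
        evaluate_answer(question, answer, citations),
        timeout=20.0,  # hard cap for the helper, separate from Gaia's 60s budget
    )
except asyncio.TimeoutError:
    evaluation = {
        "confidence": "medium",
        "score": 0.5,
        "needs_exhaustive_search": False,
        "reason": "Helper LLM timed out; default applied.",
    }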


Next Steps