Helper LLM Integration¶
Gaia handles RAG — retrieval and answer generation — out of the box. But sometimes you need a secondary LLM for tasks that fall outside the RAG pipeline: evaluating answer quality, classifying questions, summarizing long results, extracting structured data, or routing queries to the right dataset. This chapter covers how to register external LLMs with Gaia and how to build a processing pipeline that combines Gaia's RAG with a helper LLM.
Architecture Pattern¶
┌─────────────────┐     ┌─────────────────┐     ┌──────────────────┐
│  User Question  │────▶│    Gaia RAG     │────▶│    Helper LLM    │
│                 │     │  (retrieval +   │     │  (post-process)  │
│                 │     │   generation)   │     │                  │
│                 │     │                 │     │  - Evaluate      │
│                 │     │    answer +     │     │  - Classify      │
│                 │     │    citations    │     │  - Summarize     │
│                 │     │                 │     │  - Extract       │
│                 │◀────│                 │◀────│                  │
│ Final Response  │     │                 │     │                  │
└─────────────────┘     └─────────────────┘     └──────────────────┘
Gaia provides the data retrieval and initial answer generation. The helper LLM adds a second layer of intelligence — evaluating, transforming, or enriching the Gaia output before it reaches the user.
Registering External LLMs with Gaia¶
Gaia supports registering external LLMs that can be used as alternative generation models for the /ask endpoint or as helper processors.
API: POST /llms¶
Register a new LLM with Gaia:
Request body:
{
"type": "AzureOpenAI",
"name": "azure-gpt4o",
"url": "https://my-instance.openai.azure.com",
"apiKey": "your-azure-api-key",
"deploymentName": "gpt-4o",
"apiVersion": "2024-08-01-preview"
}
Supported LLM Types¶
| Type | Description | Required Fields |
|---|---|---|
| AzureOpenAI | Azure-hosted OpenAI models (GPT-4o, GPT-4, etc.) | url, apiKey, deploymentName, apiVersion |
| GoogleAI | Google AI models (Gemini, etc.) | url, apiKey |
| LocalGaiaAI | Models running on the Cohesity cluster | — |
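For example, a GoogleAI registration carries only the connection fields from the table (a sketch; the name is a placeholder and the URL should point at your Google AI endpoint):
{
  "type": "GoogleAI",
  "name": "gemini-helper",
  "url": "https://generativelanguage.googleapis.com",
  "apiKey": "your-google-api-key"
}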
Listing Registered LLMs¶
Listing registered LLMs (the SDK's list_llms call, shown later, wraps this) returns the cluster-local default plus any external registrations:
{
"llms": [
{
"name": "DefaultLLM",
"type": "LocalGaiaAI"
},
{
"name": "azure-gpt4o",
"type": "AzureOpenAI",
"url": "https://my-instance.openai.azure.com"
}
]
}
Using a Registered LLM in Queries¶
Pass the llmName parameter to /ask or /ask/stream to use a specific registered LLM:
response = await gaia.ask(
dataset_names=["my-dataset"],
query="What was our Q4 revenue?",
llm_name="azure-gpt4o", # Use the registered Azure model
)
Using Azure OpenAI as a Helper LLM¶
The most common pattern is using Azure OpenAI as a helper LLM for post-processing. This runs alongside Gaia's RAG pipeline, not instead of it.
Configuration¶
Add Azure OpenAI credentials to your .env:
# Helper LLM (Azure OpenAI)
HELPER_LLM_ENABLED=true
HELPER_LLM_ENDPOINT=https://my-instance.openai.azure.com
HELPER_LLM_API_KEY=your-azure-api-key
HELPER_LLM_API_VERSION=2024-08-01-preview
HELPER_LLM_DEPLOYMENT=gpt-4o
HELPER_LLM_TIMEOUT_SECONDS=20
Settings¶
# In settings.py
from pydantic import Field
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
# ... existing settings ...
helper_llm_enabled: bool = Field(
default=False,
validation_alias="HELPER_LLM_ENABLED",
)
helper_llm_endpoint: str = Field(
default="",
validation_alias="HELPER_LLM_ENDPOINT",
)
helper_llm_api_key: str = Field(
default="",
validation_alias="HELPER_LLM_API_KEY",
)
helper_llm_api_version: str = Field(
default="2024-08-01-preview",
validation_alias="HELPER_LLM_API_VERSION",
)
helper_llm_deployment: str = Field(
default="",
validation_alias="HELPER_LLM_DEPLOYMENT",
)
helper_llm_timeout_seconds: float = Field(
default=20.0,
validation_alias="HELPER_LLM_TIMEOUT_SECONDS",
)
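The service below imports get_settings from backend.settings. If your project doesn't define it yet, a cached accessor is all that's needed (a minimal sketch; the lru_cache pattern is a common convention, not something Gaia requires):
# In settings.py
from functools import lru_cache

@lru_cache
def get_settings() -> Settings:
    """Return a single cached Settings instance loaded from the environment."""
    return Settings()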
Helper LLM Service¶
A service module that wraps Azure OpenAI chat completions and provides structured JSON responses.
"""Helper LLM service for post-processing Gaia answers."""
import json
from typing import Any
import httpx
from backend.settings import get_settings
async def chat_completion(messages: list[dict[str, str]]) -> dict[str, Any]:
"""Call Azure OpenAI chat completion and return parsed JSON."""
settings = get_settings()
if not settings.helper_llm_endpoint or not settings.helper_llm_api_key:
raise RuntimeError("Helper LLM is not configured.")
    # Azure endpoints may be configured with or without the trailing
    # /openai path segment; normalize before building the request URL.
    endpoint = settings.helper_llm_endpoint.rstrip("/")
    if not endpoint.endswith("/openai"):
        endpoint = f"{endpoint}/openai"
    url = f"{endpoint}/deployments/{settings.helper_llm_deployment}/chat/completions"
async with httpx.AsyncClient(
timeout=settings.helper_llm_timeout_seconds,
) as client:
response = await client.post(
url,
headers={
"api-key": settings.helper_llm_api_key,
"Content-Type": "application/json",
},
params={"api-version": settings.helper_llm_api_version},
            json={
                "messages": messages,
                # Force a JSON object response; note that this mode requires
                # the word "JSON" to appear somewhere in the prompt.
                "response_format": {"type": "json_object"},
            },
)
response.raise_for_status()
data = response.json()
content = data["choices"][0]["message"]["content"]
return json.loads(content)
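A quick smoke test once the .env values are in place (a sketch; the module path backend.helper_llm is an assumption):
import asyncio

from backend.helper_llm import chat_completion  # assumed module path

async def main() -> None:
    # The prompt must mention JSON because the service sets
    # response_format={"type": "json_object"}.
    result = await chat_completion([
        {"role": "system", "content": "You are a health check. Respond as JSON."},
        {"role": "user", "content": "Reply with a JSON object {\"ok\": true}."},
    ])
    print(result)  # e.g. {'ok': True}

asyncio.run(main())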
Use Case: Answer Quality Evaluation¶
Evaluate whether a Gaia answer adequately addresses the user's question, and trigger a deeper search if needed.
async def evaluate_answer(
question: str,
answer: str,
citations: list[dict],
) -> dict[str, Any]:
"""Score how well a Gaia answer addresses the question.
Returns:
{
"confidence": "low" | "medium" | "high",
"score": 0.0-1.0,
"needs_exhaustive_search": bool,
"reason": "..."
}
"""
settings = get_settings()
if not settings.helper_llm_enabled:
return {
"confidence": "medium",
"score": 0.5,
"needs_exhaustive_search": False,
"reason": "Helper LLM disabled; default heuristic applied.",
}
    # Guard against None titles/snippets coming back from the pipeline.
    evidence = "\n".join(
        f"{i}. {c.get('title') or ''} — {(c.get('snippet') or '')[:200]}"
        for i, c in enumerate(citations, 1)
    ) or "No citations provided."
messages = [
{
"role": "system",
"content": (
"You evaluate retrieval-based Q&A responses. "
"Score 0-1 how well the response answers the question. "
"Respond as JSON: {confidence, score, needs_exhaustive_search, reason}."
),
},
{
"role": "user",
"content": (
f"Question: {question}\n\n"
f"Answer: {answer}\n\n"
f"Evidence:\n{evidence}\n\n"
"Score this response."
),
},
]
try:
result = await chat_completion(messages)
except Exception:
return {
"confidence": "medium",
"score": 0.5,
"needs_exhaustive_search": False,
"reason": "Helper LLM call failed; fallback applied.",
}
return {
"confidence": str(result.get("confidence", "medium")),
"score": float(result.get("score", 0.5)),
"needs_exhaustive_search": bool(result.get("needs_exhaustive_search", False)),
"reason": str(result.get("reason", "")),
}
Use Case: Dataset Routing¶
When your app has multiple datasets, the helper LLM can match a user's question to the best dataset(s) using discovery metadata.
async def match_datasets(
question: str,
dataset_metadata: list[dict],
) -> list[str]:
"""Use the helper LLM to pick the best dataset(s) for a question.
dataset_metadata: list of dicts with keys: name, description, suggested_questions
Returns: list of dataset names, best match first.
"""
catalog = "\n".join(
f"- {d['name']}: {d.get('description', 'N/A')} "
f"(questions: {', '.join(d.get('suggested_questions', [])[:3])})"
for d in dataset_metadata
)
messages = [
{
"role": "system",
"content": (
"Match the user's question to the best dataset(s). "
"Respond as JSON: {\"dataset_names\": [\"name1\"]}"
),
},
{
"role": "user",
"content": f"Question: {question}\n\nDatasets:\n{catalog}",
},
]
try:
result = await chat_completion(messages)
names = result.get("dataset_names", [])
return [str(n) for n in names if n]
except Exception:
return []
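Wired into a request flow, the router falls back to a caller-supplied default when it returns nothing (a sketch; assumes an open GaiaClient named gaia, and the dataset names and descriptions are hypothetical):
metadata = [
    {
        "name": "finance-docs",
        "description": "Quarterly reports and earnings filings",
        "suggested_questions": ["What was our Q4 revenue?"],
    },
    {
        "name": "hr-policies",
        "description": "Employee handbook and HR policy documents",
        "suggested_questions": ["How much PTO do new hires get?"],
    },
]

names = await match_datasets("What was our Q4 revenue?", metadata)
response = await gaia.ask(
    dataset_names=names or ["finance-docs"],  # fall back if routing returned []
    query="What was our Q4 revenue?",
)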
Use Case: Structured Data Extraction¶
Extract structured information from Gaia's free-text answers.
async def extract_entities(
answer: str,
entity_types: list[str],
) -> dict[str, list[str]]:
"""Extract named entities from a Gaia answer.
entity_types: e.g. ["person", "organization", "date", "amount"]
Returns: {"person": ["John Doe"], "amount": ["$4.2B"]}
"""
messages = [
{
"role": "system",
"content": (
f"Extract entities of types {entity_types} from the text. "
"Respond as JSON where keys are entity types and values are arrays of strings."
),
},
{
"role": "user",
"content": answer,
},
]
try:
return await chat_completion(messages)
except Exception:
return {t: [] for t in entity_types}
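For example (the output shape is illustrative):
entities = await extract_entities(
    answer="Q4 revenue was $4.2B, as announced by Jane Smith on Feb 2.",
    entity_types=["person", "date", "amount"],
)
# e.g. {"person": ["Jane Smith"], "date": ["Feb 2"], "amount": ["$4.2B"]}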
Building a Processing Pipeline¶
Combine Gaia RAG with helper LLM steps into a multi-stage pipeline:
async def answer_with_evaluation(
question: str,
dataset_names: list[str],
gaia_client: GaiaClient,
) -> dict:
"""Full answer pipeline: Gaia RAG → Helper LLM evaluation → optional exhaustive search."""
# Stage 1: Get initial answer from Gaia
result = await gaia_client.ask(
dataset_names=dataset_names,
query=question,
)
citations = [
{
"title": doc.filename,
"snippet": doc.snippet,
"score": doc.score,
}
for doc in (result.documents or [])
]
# Stage 2: Evaluate with helper LLM
evaluation = await evaluate_answer(
question=question,
answer=result.response_string or "",
citations=citations,
)
# Stage 3: If evaluation is low-confidence, try exhaustive search
if evaluation.get("needs_exhaustive_search") and dataset_names:
exhaustive = await gaia_client.exhaustive_search(
dataset_name=dataset_names[0],
query=question,
page_size=20,
)
# Re-ask with the exhaustive results as context
if exhaustive.documents:
doc_ids = [d.doc_id for d in exhaustive.documents if d.doc_id][:5]
if doc_ids and result.query_uid:
refined = await gaia_client.refine(
query_uid=result.query_uid,
dataset_names=dataset_names,
query=question,
doc_ids=doc_ids,
)
return {
"answer": refined.response_string,
"queryUid": refined.query_uid,
"documents": refined.documents,
"evaluation": evaluation,
"refinedFromExhaustive": True,
}
return {
"answer": result.response_string,
"queryUid": result.query_uid,
"conversationId": result.conversation_id,
"documents": result.documents,
"evaluation": evaluation,
"refinedFromExhaustive": False,
}
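Calling the pipeline end to end (the dataset name is a placeholder):
async with GaiaClient.from_env() as gaia:
    result = await answer_with_evaluation(
        question="What was our Q4 revenue?",
        dataset_names=["finance-docs"],
        gaia_client=gaia,
    )
    print(result["answer"])
    print(result["evaluation"], result["refinedFromExhaustive"])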
Pipeline Flow Diagram¶
              ┌──────────────┐
              │  User asks   │
              │   question   │
              └──────┬───────┘
                     │
              ┌──────▼───────┐
              │  Gaia /ask   │
              │    (RAG)     │
              └──────┬───────┘
                     │
              ┌──────▼───────┐
              │  Helper LLM  │
      ┌───────│   Evaluate   │───────┐
      │       └──────────────┘       │
needs_exhaustive_search   needs_exhaustive_search
   = false                        = true
      │                              │
┌─────▼──────────┐         ┌─────────▼──────────┐
│ Return answer  │         │ Exhaustive search  │
│ as-is          │         │ + Refine with top  │
│                │         │ documents          │
└────────────────┘         └─────────┬──────────┘
                                     │
                          ┌──────────▼────────────┐
                          │ Return refined answer │
                          └───────────────────────┘
Registering LLMs via the SDK¶
async with GaiaClient.from_env() as gaia:
# List registered LLMs
llms = await gaia.list_llms()
for llm in llms:
print(f"{llm['name']} ({llm['type']})")
To register a new LLM, use raw httpx (the SDK doesn't currently wrap this endpoint):
import httpx
async def register_llm(
api_key: str,
llm_type: str,
name: str,
url: str,
llm_api_key: str,
deployment_name: str | None = None,
api_version: str | None = None,
) -> dict:
"""Register an external LLM with Gaia."""
payload = {
"type": llm_type,
"name": name,
"url": url,
"apiKey": llm_api_key,
}
if deployment_name:
payload["deploymentName"] = deployment_name
if api_version:
payload["apiVersion"] = api_version
async with httpx.AsyncClient() as client:
response = await client.post(
"https://helios.cohesity.com/v2/mcm/gaia/llms",
headers={
"apiKey": api_key,
"Content-Type": "application/json",
},
json=payload,
)
response.raise_for_status()
return response.json()
# Example: Register Azure OpenAI GPT-4o
result = await register_llm(
api_key="your-gaia-api-key",
llm_type="AzureOpenAI",
name="azure-gpt4o",
url="https://my-instance.openai.azure.com",
llm_api_key="your-azure-key",
deployment_name="gpt-4o",
api_version="2024-08-01-preview",
)
Error Handling¶
Helper LLM calls should never block the main answer pipeline. If the helper LLM is unavailable, fall back gracefully:
import logging

logger = logging.getLogger(__name__)

try:
evaluation = await evaluate_answer(question, answer, citations)
except Exception as exc:
logger.warning("Helper LLM failed: %s", exc)
evaluation = {
"confidence": "medium",
"score": 0.5,
"needs_exhaustive_search": False,
"reason": "Helper LLM unavailable; default applied.",
}
Set timeouts independently
Set a shorter timeout for helper LLM calls (e.g., 20 seconds) than for Gaia calls (e.g., 60 seconds). A slow helper shouldn't hold up the entire response.
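To enforce that budget regardless of the HTTP-level settings, wrap the helper call (a sketch using asyncio.wait_for; the fallback mirrors the one above):
import asyncio

try:
    evaluation = await asyncio.wait_for(
        evaluate_answer(question, answer, citations),
        timeout=20.0,  # helper budget, independent of the Gaia call's timeout
    )
except asyncio.TimeoutError:
    evaluation = {
        "confidence": "medium",
        "score": 0.5,
        "needs_exhaustive_search": False,
        "reason": "Helper LLM timed out; default applied.",
    }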
Next Steps¶
- Environment Configuration — Configure environment variables for deployment.
- Dataset Discovery — Use discovery metadata as input to dataset routing.
- Streaming Responses — Stream answers while evaluation happens in the background.