What is Gaia?

Cohesity Gaia is an enterprise Retrieval-Augmented Generation (RAG) platform built directly into the Cohesity Data Cloud. It transforms the vast stores of enterprise data already managed by Cohesity — backups, file shares, databases, Microsoft 365, and more — into a queryable knowledge base powered by large language models (LLMs).

Instead of building bespoke data pipelines to ingest and prepare your organization's data for AI, Gaia uses the data that Cohesity already protects. Once that data is indexed, your enterprise knowledge is available for intelligent search and question answering without duplication, without new infrastructure, and with the same security controls your data already has.


How Gaia Works

At a high level, Gaia performs three jobs:

  1. Indexes enterprise data sources into a vector store optimized for semantic search.
  2. Retrieves the most relevant documents for a given query using vector similarity and keyword matching.
  3. Generates a natural-language answer by feeding the retrieved context to an LLM.

This is the RAG pattern — and Gaia handles every piece of it as a managed platform service.

┌──────────────┐     ┌──────────────────┐     ┌──────────────┐     ┌─────────┐     ┌─────────────┐
│  Cohesity    │────▶│  Gaia Indexing   │────▶│  RAG Engine  │────▶│   LLM   │────▶│  Your App   │
│  Data Cloud  │     │  (Vector Store)  │     │  (Retrieval) │     │         │     │  (API Call) │
└──────────────┘     └──────────────────┘     └──────────────┘     └─────────┘     └─────────────┘
     Backups              Chunks &                Semantic            Answer          Frontend /
     Files                Embeddings              Search +            Generation      Backend
     M365                                         Ranking
     Databases
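
To make the three jobs concrete, here is a toy, self-contained sketch of the RAG pattern in Python. It is not Gaia's implementation: the bag-of-words "embedding", the equal-weight blend of cosine similarity and keyword overlap, and the stubbed generate step are illustrative assumptions standing in for a real embedding model, ranking engine, and LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Toy 'embedding': a term-count vector (assumption, not a real model)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(query: str, doc: str) -> float:
    """Fraction of distinct query terms that also appear in the document."""
    q_terms = set(tokenize(query))
    return len(q_terms & set(tokenize(doc))) / len(q_terms) if q_terms else 0.0


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Job 1: index each document as a vector."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def retrieve(query: str, docs: dict[str, str], index: dict[str, Counter], top_k: int = 2) -> list[str]:
    """Job 2: rank documents by blended vector similarity and keyword match."""
    q_vec = embed(query)
    scored = [
        (0.5 * cosine(q_vec, index[doc_id]) + 0.5 * keyword_overlap(query, docs[doc_id]), doc_id)
        for doc_id in docs
    ]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def generate(query: str, context: list[str]) -> str:
    """Job 3 (stub): build the grounded prompt an LLM would receive."""
    lines = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{lines}\nQuestion: {query}"
```

In a real deployment each of these steps runs inside the platform; the sketch only shows how the pieces relate to one another.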

Key Capabilities

Capability                           Description
-----------------------------------  -----------
Dataset Management                   Create, configure, and monitor named collections of indexed enterprise data.
RAG Queries (/ask)                   Ask natural-language questions and get LLM-generated answers grounded in your data.
Exhaustive Search (/ask/exhaustive)  Retrieve all matching documents for a query with pagination; ideal for e-discovery.
Document Upload                      Upload ad-hoc files into Gaia for indexing alongside your existing datasets.
Conversation Management              Multi-turn conversations with history so the LLM maintains context across questions.
Streaming Responses (/ask/stream)    Server-Sent Events (SSE) streaming for real-time token-by-token answer delivery.
Refine (/ask/refine)                 Narrow an answer to specific documents for more focused, precise responses.
Feedback                             Provide thumbs-up/down feedback on answers to improve result quality over time.
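
As a rough illustration of what a call to the /ask endpoint might look like from Python, the sketch below builds (but does not send) an HTTP request using only the standard library. Everything except the /ask path is an assumption made for the example: the base URL, the X-API-Key header name, and the datasetNames and queryString field names are hypothetical placeholders, so check the API reference for the actual request shape.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the values for your own environment.
GAIA_BASE_URL = "https://gaia.example.com/api"  # assumption, not a documented URL
API_KEY_HEADER = "X-API-Key"                    # assumption: the real header name may differ


def build_ask_request(
    api_key: str,
    dataset_names: list[str],
    question: str,
    base_url: str = GAIA_BASE_URL,
) -> urllib.request.Request:
    """Build a POST request for the /ask endpoint without sending it."""
    # Assumed payload field names, for illustration only.
    payload = {"datasetNames": dataset_names, "queryString": question}
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate from sending makes the request easy to inspect and test.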

Use Cases

Gaia unlocks AI-powered workflows across the enterprise:

Incident Investigation

Ask Gaia to search across backup snapshots, log files, and email archives to reconstruct a timeline of events. Instead of manually digging through terabytes of data, pose questions like "What changes were made to the finance database between March 1 and March 5?"

Compliance & E-Discovery

Use exhaustive search to find every document matching a legal query. Gaia returns paginated results across all indexed sources with document metadata, making it straightforward to build compliance review workflows.
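
A review workflow usually needs to walk every page of results. The sketch below shows one way to do that with a generator. The page-fetching function is injected, and the response shape it assumes (a documents list plus a hasMore flag) is hypothetical, not the documented format of /ask/exhaustive.

```python
from typing import Callable, Iterator


def iter_exhaustive_results(
    fetch_page: Callable[[int], dict],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield every document across pages until the service reports no more.

    fetch_page(page_number) is a caller-supplied function that returns a dict
    with 'documents' and 'hasMore' keys (assumed shape, for illustration).
    max_pages is a safety cap against an endless loop.
    """
    for page in range(1, max_pages + 1):
        body = fetch_page(page)
        yield from body.get("documents", [])
        if not body.get("hasMore"):
            return


def fake_fetch_page(page: int) -> dict:
    """Stand-in for a real HTTP call, returning two small pages of results."""
    pages = {
        1: {"documents": [{"name": "contract-a.pdf"}, {"name": "email-17.eml"}], "hasMore": True},
        2: {"documents": [{"name": "memo-3.docx"}], "hasMore": False},
    }
    return pages[page]


all_documents = list(iter_exhaustive_results(fake_fetch_page))
```

Because the generator yields documents one at a time, a caller can stream them into a review queue without holding the full result set in memory.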

Knowledge Management

Surface institutional knowledge locked in file shares, wikis, and collaboration tools. Employees can query Gaia to find answers from documentation, past incident reports, or project archives without knowing where the information lives.

Operational Intelligence

Correlate information across operational data — infrastructure configs, runbooks, ticketing systems — to answer questions like "Which servers were patched in the last maintenance window and what issues were reported?"

Customer Support

Build customer-facing or internal support tools that answer questions using product documentation, knowledge bases, and historical support tickets as their grounding data.


How Gaia is Different

Why not just build your own RAG pipeline?

You absolutely can — but here's what Gaia gives you out of the box:

Your data is already there. Cohesity manages your backups, file shares, and SaaS data. Gaia indexes this data in place. There are no ETL jobs to build, no data to copy, and no new storage to provision.

Enterprise security built in. Gaia inherits the access controls, encryption, and audit logging of the Cohesity platform. API key authentication, optional security contexts for multi-tenancy, and role-based access are all native.

Managed infrastructure. The vector store, embedding pipeline, and LLM orchestration are all managed by the Cohesity cluster. You focus on building your application, not maintaining AI infrastructure.

Point-in-time data access. Because Gaia can index backup snapshots, you can query data as it existed at a specific point in time — a capability unique to backup-aware RAG.

Gaia vs. Generic RAG Frameworks

Frameworks like LangChain or LlamaIndex are excellent tools for building custom RAG pipelines, and they can even be used alongside Gaia. The difference is that Gaia provides the data layer — the indexed enterprise data, the vector store, and the retrieval engine — as a managed service. You can use Gaia's REST API directly or wrap it with any framework you prefer.
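
One way to picture "wrapping" Gaia is a thin adapter that exposes a retriever-style interface, which is the kind of shape most RAG frameworks expect. The sketch below is framework-agnostic and purely illustrative: the Retriever protocol, the GaiaRetriever class, and the assumed response shape (a documents list whose items carry a snippet field) are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Retriever(Protocol):
    """Minimal retriever interface a framework integration might expect."""

    def retrieve(self, query: str) -> list[str]: ...


@dataclass
class GaiaRetriever:
    """Adapter that turns a Gaia query function into a Retriever.

    `ask` is injected so the transport (HTTP client, auth, retries) stays
    outside this class; it should return the parsed JSON response as a dict.
    """

    ask: Callable[[str], dict]

    def retrieve(self, query: str) -> list[str]:
        response = self.ask(query)
        # Assumed response shape, for illustration only.
        return [doc["snippet"] for doc in response.get("documents", [])]


def fake_ask(query: str) -> dict:
    """Stand-in for a real API call."""
    return {"documents": [{"snippet": f"Match for: {query}"}]}


retriever: Retriever = GaiaRetriever(ask=fake_ask)
snippets = retriever.retrieve("patching")
```

Injecting the query function keeps the adapter testable and lets you swap in whichever HTTP client or framework hook you prefer.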


What You'll Build

This guide walks you through building a complete enterprise web application on top of Gaia. By the end, you'll have:

  • A Python backend (FastAPI) that proxies and enriches Gaia API calls
  • A React frontend with a modern chat-style interface
  • Streaming responses for real-time answer delivery
  • Session management so users authenticate once
  • Dataset browsing and selection
  • Conversation history with multi-turn context
  • Production-ready error handling, logging, and deployment configuration
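
Streaming is the piece of this list most people have not built before, so here is a minimal sketch of the client-side parsing involved. It follows the general Server-Sent Events wire format (data: lines, a blank line ending each event, lines starting with a colon treated as comments) and treats each payload as opaque text; the exact payload Gaia sends on /ask/stream is not described here, so the sample input is made up.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a Server-Sent Events stream.

    Only the 'data' field is handled; 'event', 'id', and 'retry' fields are
    ignored in this sketch.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            # Comment line, often used as a keep-alive.
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
    # An event not terminated by a blank line is dropped, as the SSE format specifies.


sample_stream = [
    ": keep-alive\n",
    "data: Hello\n",
    "\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
chunks = list(iter_sse_data(sample_stream))
```

The function accepts any iterable of text lines, so the same parser can be fed from a file, a test fixture, or the line iterator of an HTTP client's streaming response.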

Prerequisites

Before diving in, head to Prerequisites to set up your environment.


Next Steps