
FAQ & Platform Reference

A quick-reference page covering Gaia platform limits, supported configurations, and frequently asked questions — sourced from the official Cohesity Gaia User Guide.


Platform Limits

| Limit | Value |
| --- | --- |
| Max datasets per user account | 50 |
| Max objects per dataset | 1,000 |
| Max extracted text per dataset | 20 GB |
| Max file size | 100 MB |
| Conversation retention | 60 days (inactivity) |
| Max pinned conversations | 15 |
| Context window (follow-ups) | Last 3 Q&A pairs per session |
| Topic Explorer refresh (continuous indexing) | Every 7 days |
| Continuous indexing check interval | Every 1 hour |
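As a rough illustration, the dataset-related limits above can be checked before a dataset is created. This is a minimal sketch, not part of any Gaia SDK: `DatasetPlan` and `limit_violations` are hypothetical names, and the size estimate is assumed to be supplied by you.

```python
from dataclasses import dataclass

# Values copied from the Platform Limits table.
MAX_DATASETS_PER_USER = 50
MAX_OBJECTS_PER_DATASET = 1_000
MAX_EXTRACTED_TEXT_GB = 20.0


@dataclass
class DatasetPlan:
    """Hypothetical description of a dataset you intend to create."""
    name: str
    object_count: int
    estimated_text_gb: float  # your own estimate; Gaia does not provide this


def limit_violations(plan: DatasetPlan, existing_datasets: int = 0) -> list:
    """Return readable problems; an empty list means the plan fits the limits."""
    problems = []
    if plan.object_count > MAX_OBJECTS_PER_DATASET:
        problems.append(
            f"{plan.object_count} objects exceeds the limit of {MAX_OBJECTS_PER_DATASET}"
        )
    if plan.estimated_text_gb > MAX_EXTRACTED_TEXT_GB:
        problems.append(
            f"{plan.estimated_text_gb} GB of text exceeds the limit of {MAX_EXTRACTED_TEXT_GB} GB"
        )
    if existing_datasets + 1 > MAX_DATASETS_PER_USER:
        problems.append(
            f"account already holds {existing_datasets} datasets (limit {MAX_DATASETS_PER_USER})"
        )
    return problems
```

For example, `limit_violations(DatasetPlan("finance", 5, 1.0))` returns an empty list, while a plan with 1,001 objects is reported as over the object limit.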

Supported Object Types

Object Type SaaS Self-Managed
Microsoft 365 Mailbox
Microsoft 365 OneDrive
Microsoft 365 SharePoint Site
Isilon NAS
NetApp ONTAP
Generic NAS
Physical Servers
Views

Supported File Types

.doc .docx .xls .xlsx .ppt .pptx .pdf .odf .rtf .txt .html .xml
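A pre-flight filter for the extensions listed above, combined with the 100 MB file size limit, might look like the sketch below. Two assumptions are made here that the guide does not state: extensions are matched case-insensitively, and 1 MB is treated as 1024 × 1024 bytes. The function name `is_indexable` is hypothetical.

```python
from pathlib import Path

# Extensions copied from the Supported File Types list.
SUPPORTED_EXTENSIONS = {
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".pdf", ".odf", ".rtf", ".txt", ".html", ".xml",
}

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024


def is_indexable(filename: str, size_bytes: int) -> bool:
    """Return True if the file type is supported and the size is within the limit."""
    # Assumption: extension matching is case-insensitive.
    extension = Path(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_FILE_SIZE_BYTES
```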


Cloud Regions (SaaS)

| Cloud Provider | Regions |
| --- | --- |
| AWS | US-East-2 (Ohio), US-West-2 (Oregon), Canada (Central) |
| Microsoft Azure | Central US (Iowa) |

Available LLMs

| Deployment | Model | Powered By |
| --- | --- | --- |
| SaaS | Cohesity Standard LLM | Azure OpenAI GPT or Google Gemini 2.0 |
| SaaS | Cohesity Advanced LLM | Azure OpenAI GPT or Google Gemini 2.0 |
| Self-Managed | Mistral 7B | Runs locally on NVIDIA GPUs |

Supported Languages

Language SaaS Chat SaaS Topic Explorer Self-Managed Chat Self-Managed Topic Explorer
English
Dutch
French
German
Italian
Japanese
Korean
Spanish

Note

UI elements, error messages, and system prompts remain in English regardless of query language.


Access Roles

Role Create Datasets Manage Users Chat Topic Explorer Recover Docs API Access
Gaia Admin Full
Gaia Viewer Read-only

Frequently Asked Questions

What is a dataset?

A dataset is a logical collection of data backed by Cohesity. It groups one or more data sources (mailboxes, OneDrive accounts, NAS shares, etc.) under a single queryable index. For example, a dataset could contain all OneDrive accounts for the Finance department.

An "object" in Gaia is a single protected source, such as one mailbox or one OneDrive account. If you include 2 mailboxes and 3 OneDrive accounts, your dataset has 5 objects, which counts toward the limit of 1,000 objects per dataset.

How does Gaia handle hallucinations?

Gaia minimizes hallucinations by grounding responses in indexed data. When no valid matches are found, Gaia responds with "I could not find enough information to answer the question" and prompts users to rephrase or add more data. It does not fabricate answers from general knowledge.

What encryption does Gaia use?

SaaS: AES-256 encryption for data in transit and at rest. Encryption keys are managed by a key management service (KMS) that supports key rotation and revocation.

Self-Managed: AES-256 encryption for data in transit. For data at rest, Cohesity recommends enabling server-side encryption on your S3-compatible storage. Gaia does not perform client-side encryption.

Is the API the same for SaaS and Self-Managed?

Yes. The Gaia REST API is identical across both deployment models. Only the base URL changes — SaaS uses https://helios.cohesity.com/v2/mcm/gaia, while Self-Managed uses your Helios Self-Managed FQDN.
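A client can hide this difference behind one helper. In the sketch below, the SaaS URL is taken from the answer above. The assumption that a Self-Managed deployment serves the API under the same `/v2/mcm/gaia` path over HTTPS is not confirmed here, so verify it against your deployment. `gaia_base_url` is a hypothetical helper name.

```python
from typing import Optional

SAAS_BASE_URL = "https://helios.cohesity.com/v2/mcm/gaia"


def gaia_base_url(self_managed_fqdn: Optional[str] = None) -> str:
    """Return the API base URL for SaaS (no argument) or for a Self-Managed FQDN."""
    if self_managed_fqdn is None:
        return SAAS_BASE_URL
    # Assumption: Self-Managed exposes the same path as SaaS over HTTPS.
    return f"https://{self_managed_fqdn}/v2/mcm/gaia"
```

Calling `gaia_base_url()` returns the SaaS URL, and `gaia_base_url("helios.example.internal")` (a placeholder host) returns the Self-Managed equivalent under the stated assumption.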

Can I use my own LLM?

SaaS: You can choose between Azure OpenAI GPT and Google Gemini 2.0 for both Standard and Advanced tiers. Custom models are not supported.

Self-Managed: Currently ships with Mistral 7B running locally. The platform may support additional models in future releases.

What happens when extracted text exceeds 20 GB?

The dataset's indexing status changes to "Warning". Partial data may be available for querying, but not all documents will be indexed. To stay under the limit, consider splitting the data into multiple smaller datasets.
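One way to plan such a split is a first-fit-decreasing grouping over per-object size estimates, sketched below. This is an illustration only: Gaia does not expose this function, the size estimates are assumed to come from you, and `split_into_datasets` is a hypothetical name. The grouping also respects the limit of 1,000 objects per dataset.

```python
MAX_TEXT_GB = 20.0   # extracted-text limit per dataset
MAX_OBJECTS = 1_000  # object limit per dataset


def split_into_datasets(object_sizes_gb: dict) -> list:
    """Group object names so each group stays within both per-dataset limits.

    object_sizes_gb maps an object name to its estimated extracted text in GB.
    """
    datasets = []  # each entry: {"objects": [names], "total_gb": float}
    # Place the largest objects first (first-fit decreasing).
    ordered = sorted(object_sizes_gb.items(), key=lambda item: item[1], reverse=True)
    for name, size in ordered:
        if size > MAX_TEXT_GB:
            raise ValueError(f"{name} alone exceeds {MAX_TEXT_GB} GB")
        for dataset in datasets:
            fits_size = dataset["total_gb"] + size <= MAX_TEXT_GB
            fits_count = len(dataset["objects"]) < MAX_OBJECTS
            if fits_size and fits_count:
                dataset["objects"].append(name)
                dataset["total_gb"] += size
                break
        else:
            # No existing group has room, so start a new one.
            datasets.append({"objects": [name], "total_gb": size})
    return [dataset["objects"] for dataset in datasets]
```

First-fit decreasing is a heuristic, so it does not always produce the minimum number of datasets, but it is simple and usually close.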

Can a dataset span multiple clusters?

No. A dataset can only include snapshots from a single Cohesity cluster. To query data across clusters, create separate datasets and pass multiple dataset names in your /ask request.
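The sketch below builds a request body that names several datasets. The JSON field names `datasetNames` and `queryString` are assumptions made for illustration and are not confirmed by this page, so check the Gaia API reference for the actual schema before using them. The code only constructs the payload and sends nothing.

```python
import json


def build_ask_payload(question: str, dataset_names: list) -> dict:
    """Build a body for an /ask request that targets one or more datasets."""
    if not dataset_names:
        raise ValueError("at least one dataset name is required")
    return {
        # Hypothetical field names; verify against the API reference.
        "datasetNames": list(dataset_names),
        "queryString": question,
    }


def to_json(payload: dict) -> str:
    """Serialize the payload as it would appear in the HTTP request body."""
    return json.dumps(payload)
```

For instance, `build_ask_payload("Which contracts renew in Q3?", ["finance-cluster-a", "finance-cluster-b"])` targets two datasets, each of which would be backed by a different cluster. The dataset names here are placeholders.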

Is Topic Explorer permission-aware?

No. Topic Explorer shows all themes and content in a dataset regardless of the user's file-level permissions. It is restricted to users with the Gaia Admin role only.

How often does Continuous Indexing check for new data?

Gaia checks for new snapshots every hour. If a new snapshot is available, indexing begins automatically. You can also trigger on-demand indexing via the UI or API.

What is the context window for follow-up questions?

Gaia retains context from the last 3 questions and responses in the current conversation session. If you open a new tab or navigate away, Gaia starts a fresh session. Use the chat history API or "Load History" to restore context.
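The behavior described above resembles a sliding window over the conversation. The following sketch models it with a bounded deque. It illustrates the concept and makes no claim about how Gaia implements it internally. `SessionContext` is a hypothetical class.

```python
from collections import deque

CONTEXT_PAIRS = 3  # "last 3 questions and responses"


class SessionContext:
    """Keeps only the most recent question/answer pairs of one session."""

    def __init__(self) -> None:
        # A deque with maxlen drops the oldest pair when a new one is appended.
        self._pairs = deque(maxlen=CONTEXT_PAIRS)

    def record(self, question: str, answer: str) -> None:
        self._pairs.append((question, answer))

    def context(self) -> list:
        """Return the retained pairs, oldest first."""
        return list(self._pairs)
```

After five exchanges in one session, only the third, fourth, and fifth remain in the window. A new `SessionContext` starts empty, mirroring a fresh session in a new tab.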

Can I override Answer Safety?

No. Answer Safety is always on: it is enabled by default and cannot be disabled. It filters sexually explicit, violent, discriminatory, and illegal content in real time.

What network ports does Gaia Self-Managed need?

The Gaia AI Engine requires inbound TCP connectivity from both the Helios Appliance and Cohesity clusters (recommended port 443). Outbound TCP connectivity is needed to the S3 endpoint (recommended port 3000 for SmartFiles S3 Views, or the port used by your S3-compatible service).
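A basic TCP reachability test can confirm these paths before deployment. The sketch below uses only the Python standard library. It checks that a TCP connection can be opened and does not validate TLS or the service behind the port. The helper name `can_connect` and any host names you pass to it are placeholders.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

To test the inbound path, run `can_connect("<ai-engine-host>", 443)` from the Helios Appliance or a cluster node. To test the outbound path, run `can_connect("<s3-endpoint-host>", 3000)` from the AI Engine host, substituting the port your S3-compatible service uses.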


Next Steps