FAQ & Platform Reference
A quick-reference page covering Gaia platform limits, supported configurations, and frequently asked questions, sourced from the official Cohesity Gaia User Guide.
Platform Limits
| Limit | Value |
|---|---|
| Max datasets per user account | 50 |
| Max objects per dataset | 1,000 |
| Max extracted text per dataset | 20 GB |
| Max file size | 100 MB |
| Conversation retention | 60 days of inactivity |
| Max pinned conversations | 15 |
| Context window (follow-ups) | Last 3 Q&A pairs per session |
| Topic Explorer refresh (continuous indexing) | Every 7 days |
| Continuous indexing check interval | Every 1 hour |
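The per-dataset limits above lend themselves to a pre-flight check before you create a dataset. The following is a minimal sketch, not part of Gaia: `DatasetPlan` and `check_plan` are hypothetical names, and the byte conversions assume binary units (1 MB = 1024² bytes), which the source does not specify.

```python
from dataclasses import dataclass
from typing import List

# Limits copied from the Platform Limits table.
MAX_OBJECTS_PER_DATASET = 1_000
MAX_FILE_SIZE_BYTES = 100 * 1024**2        # 100 MB (assumption: binary megabytes)
MAX_EXTRACTED_TEXT_BYTES = 20 * 1024**3    # 20 GB (assumption: binary gigabytes)


@dataclass
class DatasetPlan:
    """Hypothetical description of a dataset you intend to create."""
    object_count: int
    largest_file_bytes: int
    estimated_text_bytes: int


def check_plan(plan: DatasetPlan) -> List[str]:
    """Return human-readable limit violations; an empty list means none found."""
    problems = []
    if plan.object_count > MAX_OBJECTS_PER_DATASET:
        problems.append(
            f"{plan.object_count} objects exceeds the limit of {MAX_OBJECTS_PER_DATASET}"
        )
    if plan.largest_file_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("largest file exceeds the 100 MB file size limit")
    if plan.estimated_text_bytes > MAX_EXTRACTED_TEXT_BYTES:
        problems.append("estimated extracted text exceeds the 20 GB limit")
    return problems


# Example: 2 mailboxes + 3 OneDrive accounts = 5 objects, well within limits.
plan = DatasetPlan(object_count=5, largest_file_bytes=10 * 1024**2,
                   estimated_text_bytes=1024**3)
print(check_plan(plan))
```

A check like this only catches problems you can estimate in advance; the actual extracted text size is known only after indexing.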
Supported Object Types
| Object Type | SaaS | Self-Managed |
|---|---|---|
| Microsoft 365 Mailbox | ||
| Microsoft 365 OneDrive | ||
| Microsoft 365 SharePoint Site | ||
| Isilon NAS | ||
| NetApp ONTAP | ||
| Generic NAS | ||
| Physical Servers | ||
| Views |
Supported File Types
.doc .docx .xls .xlsx .ppt .pptx .pdf .odf .rtf .txt .html .xml
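If you want to estimate how much of a share is eligible for indexing, the extension list above can be applied as a simple filter. This is an illustrative sketch only: `is_supported` is a hypothetical helper, and treating extensions as case-insensitive is an assumption, not documented Gaia behavior.

```python
from pathlib import PurePath

# Extensions copied from the Supported File Types list.
SUPPORTED_EXTENSIONS = frozenset({
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".pdf", ".odf", ".rtf", ".txt", ".html", ".xml",
})


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list.

    Assumption: matching is case-insensitive (".PDF" is treated like ".pdf").
    """
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


files = ["Q3-report.PDF", "budget.xlsx", "backup.tar.gz", "README"]
eligible = [name for name in files if is_supported(name)]
print(eligible)
```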
Cloud Regions (SaaS)
| Cloud Provider | Regions |
|---|---|
| AWS | US-East-2 (Ohio), US-West-2 (Oregon), Canada (Central) |
| Microsoft Azure | Central US (Iowa) |
Available LLMs
| Deployment | Model | Powered By |
|---|---|---|
| SaaS | Cohesity Standard LLM | Azure OpenAI GPT or Google Gemini 2.0 |
| SaaS | Cohesity Advanced LLM | Azure OpenAI GPT or Google Gemini 2.0 |
| Self-Managed | Mistral 7B | Runs locally on NVIDIA GPUs |
Supported Languages
| Language | SaaS Chat | SaaS Topic Explorer | Self-Managed Chat | Self-Managed Topic Explorer |
|---|---|---|---|---|
| English | ||||
| Dutch | ||||
| French | ||||
| German | ||||
| Italian | ||||
| Japanese | ||||
| Korean | ||||
| Spanish |
Note
UI elements, error messages, and system prompts remain in English regardless of query language.
Access Roles
| Role | Create Datasets | Manage Users | Chat | Topic Explorer | Recover Docs | API Access |
|---|---|---|---|---|---|---|
| Gaia Admin | Full | |||||
| Gaia Viewer | Read-only |
Frequently Asked Questions
What is a dataset?
A dataset is a logical collection of data backed by Cohesity. It groups one or more data sources (mailboxes, OneDrive accounts, NAS shares, etc.) under a single queryable index. For example, a dataset could contain all OneDrive accounts for the Finance department.
An "object" in Gaia is a single mailbox or a single OneDrive account. If you include 2 mailboxes and 3 OneDrive accounts, your dataset has 5 objects.
How does Gaia handle hallucinations?
Gaia minimizes hallucinations by grounding responses in indexed data. When no valid matches are found, Gaia responds with "I could not find enough information to answer the question" and prompts users to rephrase or add more data. It does not fabricate answers from general knowledge.
What encryption does Gaia use?
SaaS: AES-256 encryption for data in transit and at rest. Encryption keys are managed by a KMS that supports key rotation and revocation.
Self-Managed: AES-256 encryption for data in transit. For data at rest, Cohesity recommends enabling server-side encryption on your S3-compatible storage. Gaia does not perform client-side encryption.
Is the API the same for SaaS and Self-Managed?
Yes. The Gaia REST API is identical across both deployment models. Only the base URL changes: SaaS uses https://helios.cohesity.com/v2/mcm/gaia, while Self-Managed uses your Helios Self-Managed FQDN.
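Because only the base URL differs, a client can select it in one place. The sketch below is hypothetical: `gaia_base_url` is not a Cohesity function, and it assumes the Self-Managed deployment serves the API over HTTPS under the same `/v2/mcm/gaia` path as SaaS, which you should confirm for your installation.

```python
from typing import Optional

SAAS_BASE_URL = "https://helios.cohesity.com/v2/mcm/gaia"
# Assumption: Self-Managed exposes the same path under its own FQDN.
GAIA_API_PATH = "/v2/mcm/gaia"


def gaia_base_url(self_managed_fqdn: Optional[str] = None) -> str:
    """Return the SaaS base URL, or build one from a Self-Managed FQDN."""
    if self_managed_fqdn is None:
        return SAAS_BASE_URL
    host = self_managed_fqdn.strip().rstrip("/")
    return f"https://{host}{GAIA_API_PATH}"


print(gaia_base_url())
print(gaia_base_url("helios.example.internal"))  # placeholder FQDN
```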
Can I use my own LLM?
SaaS: You can choose between Azure OpenAI GPT and Google Gemini 2.0 for both Standard and Advanced tiers. Custom models are not supported.
Self-Managed: Currently ships with Mistral 7B running locally. The platform may support additional models in future releases.
What happens when extracted text exceeds 20 GB?
The dataset's indexing status changes to "Warning". Partial data may be available for querying, but not all documents will be indexed. Consider splitting the data into multiple smaller datasets.
Can a dataset span multiple clusters?
No. A dataset can only include snapshots from a single Cohesity cluster. To query data across clusters, create separate datasets and pass multiple dataset names in your /ask request.
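A cross-cluster query therefore amounts to one /ask request that names several datasets. The sketch below only builds the request body and does not send it. The field names `datasetNames` and `queryString` are placeholders chosen for illustration, not confirmed Gaia API fields, so check the API reference for the actual schema.

```python
import json
from typing import Dict, List, Sequence, Union


def build_ask_payload(
    question: str, dataset_names: Sequence[str]
) -> Dict[str, Union[str, List[str]]]:
    """Build a request body naming one dataset per cluster.

    Field names are hypothetical placeholders, not the documented schema.
    """
    if not dataset_names:
        raise ValueError("at least one dataset name is required")
    return {
        "datasetNames": list(dataset_names),  # hypothetical field name
        "queryString": question,              # hypothetical field name
    }


payload = build_ask_payload(
    "Which contracts renew this quarter?",
    ["finance-cluster-a", "finance-cluster-b"],  # example dataset names
)
print(json.dumps(payload, indent=2))
```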
Is Topic Explorer permission-aware?
No. Topic Explorer shows all themes and content in a dataset regardless of the user's file-level permissions. It is restricted to users with the Gaia Admin role only.
How often does Continuous Indexing check for new data?
Gaia checks for new snapshots every hour. If a new snapshot is available, indexing begins automatically. You can also trigger on-demand indexing via the UI or API.
What is the context window for follow-up questions?
Gaia retains context from the last 3 questions and responses in the current conversation session. If you open a new tab or navigate away, Gaia starts a fresh session. Use the chat history API or "Load History" to restore context.
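The "last 3 Q&A pairs" behavior can be pictured as a fixed-size sliding window. This is a conceptual model only, not Gaia's implementation; `SessionContext` is a hypothetical class.

```python
from collections import deque
from typing import Deque, List, Tuple

CONTEXT_PAIRS = 3  # from the Platform Limits table


class SessionContext:
    """Keeps only the most recent question/answer pairs for one session."""

    def __init__(self) -> None:
        # A deque with maxlen silently drops the oldest pair when full.
        self._pairs: Deque[Tuple[str, str]] = deque(maxlen=CONTEXT_PAIRS)

    def record(self, question: str, answer: str) -> None:
        self._pairs.append((question, answer))

    def context(self) -> List[Tuple[str, str]]:
        return list(self._pairs)


session = SessionContext()
for i in range(1, 5):
    session.record(f"question {i}", f"answer {i}")

# After four exchanges, the first one has fallen out of the window.
print([q for q, _ in session.context()])
```

In this model a fourth question can no longer refer back to the first exchange, which is why long conversations may need the earlier detail restated.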
Can I override Answer Safety?
No. Answer Safety is always on: it is enabled by default and cannot be disabled. It filters sexually explicit, violent, discriminatory, and illegal content in real time.
What network ports does Gaia Self-Managed need?
The Gaia AI Engine requires inbound TCP connectivity from both the Helios Appliance and Cohesity clusters (recommended port 443). Outbound TCP connectivity is needed to the S3 endpoint (recommended port 3000 for SmartFiles S3 Views, or the port used by your S3-compatible service).
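A basic TCP reachability test can confirm these paths before deployment. The sketch below uses only the Python standard library. Run the inbound check from the Helios Appliance or a cluster node, and the outbound check from the AI Engine host. The hostnames are placeholders, and a successful TCP connect does not prove that TLS or authentication will work.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Placeholder hostnames; substitute your own.
    checks = [
        ("gaia-engine.example.internal", 443),   # inbound to the AI Engine
        ("s3-view.example.internal", 3000),      # outbound to SmartFiles S3 View
    ]
    for host, port in checks:
        status = "reachable" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} {status}")
```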
Next Steps
- What is Gaia? — Start from the beginning.
- Deployment Models — Compare SaaS and Self-Managed.
- Getting Started — Connect to the API.