Sensitive Data Handling¶
Enterprise data often contains personally identifiable information (PII), financial records, health data, and other sensitive content. Gaia provides a sensitive data policies system that can automatically detect, mask, or redact sensitive information in query results before it reaches your application.
How It Works¶
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Indexed Data   │────▶│    Gaia RAG     │────▶│ Sensitive Data  │────▶ Your App
│  (may contain   │     │     Engine      │     │  Policy Engine  │      (masked)
│  PII, SSNs,     │     │  (retrieval +   │     │ (detect + mask) │
│  credit cards)  │     │   generation)   │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
When sensitive data policies are configured on a dataset:
- Gaia retrieves relevant documents as usual.
- Before generating the answer, the policy engine scans the retrieved context for sensitive patterns.
- Detected sensitive data is masked or redacted according to the active policy.
- The generated answer uses the masked context, so sensitive details never appear in the response.
Sensitive Data Categories¶
Gaia can detect and handle several categories of sensitive information:
| Category | Examples | Masking Behavior |
|---|---|---|
| PII | Names, email addresses, phone numbers, physical addresses | Replaced with [REDACTED-PII] |
| Financial | Credit card numbers, bank accounts, routing numbers | Replaced with [REDACTED-FINANCIAL] |
| Government IDs | SSNs, passport numbers, driver's license numbers | Replaced with [REDACTED-ID] |
| Health | Medical record numbers, diagnoses, prescription info | Replaced with [REDACTED-HEALTH] |
| Credentials | API keys, passwords, tokens | Replaced with [REDACTED-CREDENTIAL] |
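To make the detect-and-mask behavior concrete, here is a minimal, self-contained sketch. This is not Gaia's actual detection engine (which runs server-side and is policy-driven); the regex patterns and the `mask_sensitive` helper are simplistic stand-ins for illustration only:

```python
import re

# Illustrative patterns only. Gaia's real detectors are server-side
# and far more sophisticated than these simple regexes.
PATTERNS = {
    "[REDACTED-ID]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # SSN-like
    "[REDACTED-FINANCIAL]": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # card-like
    "[REDACTED-PII]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),   # email
}

def mask_sensitive(text: str) -> str:
    """Replace detected sensitive spans with their category tokens."""
    for token, pattern in PATTERNS.items():
        text = pattern.sub(token, text)
    return text

masked = mask_sensitive("Contact jane@example.com, SSN 123-45-6789.")
# -> "Contact [REDACTED-PII], SSN [REDACTED-ID]."
```

Because masking is applied to the retrieved context before generation, the LLM only ever sees the category tokens, never the underlying values.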
Policy granularity
The exact categories and masking behavior depend on the policies available in your Gaia deployment. Use the List Policies API to see what's available.
API: GET /sensitive-data/policies¶
List all available sensitive data handling policies.
Request:

Headers:

apiKey: <your-api-key>

Response:
```json
{
  "policies": [
    {
      "id": "pii-standard",
      "name": "Standard PII Protection",
      "description": "Detects and masks common PII including names, emails, phone numbers, and addresses.",
      "categories": ["PII"],
      "enabled": true
    },
    {
      "id": "financial-strict",
      "name": "Financial Data Protection",
      "description": "Detects and redacts financial account numbers, credit card numbers, and related data.",
      "categories": ["FINANCIAL", "GOVERNMENT_ID"],
      "enabled": true
    },
    {
      "id": "hipaa-compliance",
      "name": "HIPAA Compliance",
      "description": "Detects and redacts protected health information (PHI) as defined by HIPAA.",
      "categories": ["HEALTH", "PII"],
      "enabled": false
    }
  ]
}
```
Response Fields¶
| Field | Type | Description |
|---|---|---|
| id | string | Unique policy identifier |
| name | string | Human-readable policy name |
| description | string | What the policy protects |
| categories | string[] | Sensitive data categories covered |
| enabled | boolean | Whether the policy is currently active |
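If you want type safety on this response in a Python backend, a small Pydantic model mirroring the fields above works well. The model names here are our own, not part of the Gaia SDK:

```python
from pydantic import BaseModel

class SensitiveDataPolicy(BaseModel):
    """Mirrors one entry in the `policies` array."""
    id: str
    name: str
    description: str
    categories: list[str]
    enabled: bool

class PolicyListResponse(BaseModel):
    policies: list[SensitiveDataPolicy]

# Parse a raw API response into typed objects:
raw = {
    "policies": [
        {
            "id": "pii-standard",
            "name": "Standard PII Protection",
            "description": "Detects and masks common PII.",
            "categories": ["PII"],
            "enabled": True,
        }
    ]
}
parsed = PolicyListResponse(**raw)
```

Validation fails loudly if the upstream response shape changes, which is preferable to silently passing malformed dicts through to your frontend.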
Backend Implementation¶
Listing Policies¶
Using the SDK:
```python
from fastapi import APIRouter, Depends
from gaia_sdk import GaiaClient

from backend.api.dependencies import get_gaia_client

router = APIRouter()


@router.get("/sensitive-data/policies", tags=["Sensitive Data"])
async def list_policies(
    client: GaiaClient = Depends(get_gaia_client),
):
    """List available sensitive data handling policies."""
    policies = await client.list_sensitive_data_policies()
    return {"policies": policies}
```
Using raw httpx:
```python
import httpx


async def list_sensitive_data_policies(api_key: str) -> list[dict]:
    """Fetch all sensitive data policies from Gaia."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://helios.cohesity.com/v2/mcm/gaia/sensitive-data/policies",
            headers={"apiKey": api_key},
        )
        response.raise_for_status()
        data = response.json()
        return data.get("policies", [])
```
Displaying Policy Status¶
Build a simple admin panel showing which policies are active:
```python
@router.get("/admin/sensitive-data", tags=["Admin"])
async def sensitive_data_dashboard(
    client: GaiaClient = Depends(get_gaia_client),
):
    """Return a summary of sensitive data policy status."""
    policies = await client.list_sensitive_data_policies()
    return {
        "total_policies": len(policies),
        "active_policies": [p for p in policies if p.get("enabled")],
        "inactive_policies": [p for p in policies if not p.get("enabled")],
    }
```
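Building on the dashboard, it can also be useful to report which sensitive data categories are covered by at least one active policy. A sketch of such a helper (a pure function, so it is easy to unit-test; the policy dicts are assumed to have the shape returned by the list endpoint):

```python
def covered_categories(policies: list[dict]) -> set[str]:
    """Return the categories covered by at least one enabled policy."""
    return {
        category
        for policy in policies
        if policy.get("enabled")
        for category in policy.get("categories", [])
    }

sample = [
    {"id": "pii-standard", "categories": ["PII"], "enabled": True},
    {"id": "hipaa-compliance", "categories": ["HEALTH", "PII"], "enabled": False},
]
# Only the enabled policy counts, so coverage here is {"PII"}.
```

Comparing this set against the categories your compliance regime requires gives you a quick gap check.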
Configuring Policies on Datasets¶
Sensitive data policies are typically configured at the dataset level through the Cohesity management interface or API. When creating or updating a dataset, you can specify which policies apply:
```python
import httpx


async def configure_dataset_policies(
    api_key: str,
    dataset_name: str,
    policy_ids: list[str],
) -> dict:
    """Apply sensitive data policies to a dataset."""
    async with httpx.AsyncClient() as client:
        response = await client.put(
            f"https://helios.cohesity.com/v2/mcm/gaia/dataset/{dataset_name}/sensitive-data",
            headers={
                "apiKey": api_key,
                "Content-Type": "application/json",
            },
            json={"policyIds": policy_ids},
        )
        response.raise_for_status()
        return response.json()
```
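Before applying policies, it is worth verifying that every requested policy ID actually exists in your deployment, since a typo would otherwise only surface as a server-side error. A minimal pre-flight check, assuming the policy dicts returned by the list endpoint:

```python
def validate_policy_ids(requested: list[str], available: list[dict]) -> list[str]:
    """Return the requested policy IDs that do not exist in `available`."""
    known = {policy["id"] for policy in available}
    return [pid for pid in requested if pid not in known]

available = [{"id": "pii-standard"}, {"id": "financial-strict"}]
missing = validate_policy_ids(["pii-standard", "no-such-policy"], available)
# missing == ["no-such-policy"]; raise or warn before calling the PUT endpoint
```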
Administrative operation
Configuring sensitive data policies requires administrative privileges on the Cohesity cluster. Regular API keys may not have permission to modify dataset policies. Contact your Cohesity administrator to enable policies.
Frontend: Policy Management UI¶
A read-only view of active policies, useful for compliance dashboards:
```tsx
// src/components/SensitiveDataPolicies.tsx
import { useEffect, useState } from "react";
import { api } from "../api/client";

interface Policy {
  id: string;
  name: string;
  description: string;
  categories: string[];
  enabled: boolean;
}

export function SensitiveDataPolicies() {
  const [policies, setPolicies] = useState<Policy[]>([]);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    api
      .get<{ policies: Policy[] }>("/sensitive-data/policies")
      .then((data) => setPolicies(data.policies))
      .catch(() => setPolicies([])) // show an empty list instead of an unhandled rejection
      .finally(() => setLoading(false));
  }, []);

  if (loading) return <p>Loading policies...</p>;

  return (
    <div className="space-y-4">
      <h2 className="text-lg font-bold">Sensitive Data Policies</h2>
      <div className="grid gap-3">
        {policies.map((policy) => (
          <div
            key={policy.id}
            className={`rounded-lg border p-4 ${
              policy.enabled
                ? "border-green-200 bg-green-50"
                : "border-gray-200 bg-gray-50"
            }`}
          >
            <div className="flex items-center justify-between">
              <h3 className="font-semibold">{policy.name}</h3>
              <span
                className={`rounded-full px-2 py-0.5 text-xs font-medium ${
                  policy.enabled
                    ? "bg-green-100 text-green-700"
                    : "bg-gray-100 text-gray-500"
                }`}
              >
                {policy.enabled ? "Active" : "Inactive"}
              </span>
            </div>
            <p className="text-sm text-gray-600 mt-1">{policy.description}</p>
            <div className="flex gap-1 mt-2">
              {policy.categories.map((cat) => (
                <span
                  key={cat}
                  className="rounded bg-gray-200 px-2 py-0.5 text-xs"
                >
                  {cat}
                </span>
              ))}
            </div>
          </div>
        ))}
      </div>
    </div>
  );
}
```
Best Practices¶
Defense in depth
Sensitive data policies are one layer of protection. Also consider:
- Access control — Use Gaia's security context to restrict which users can query which datasets.
- Audit logging — Log all queries and the datasets accessed for compliance review.
- Data minimization — Only index datasets that your application actually needs.
- Network security — Use HTTPS for all Gaia API calls and enforce TLS in production.
- Enable policies before indexing — Policies work best when applied before data is indexed, so sensitive patterns are caught during ingestion.
- Test with known PII — Upload test documents with known sensitive patterns to verify that masking works as expected.
- Monitor policy coverage — Regularly review which policies are active and whether they cover all required data categories for your compliance requirements.
- Don't rely solely on masking — Masking reduces exposure but isn't foolproof. Context clues in surrounding text may still reveal sensitive information. Combine with access controls and audit logging.
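The "test with known PII" practice above can be automated with a small assertion helper that scans generated answers for seed values you deliberately planted in test documents. The helper and seed values below are illustrative, not part of the Gaia SDK:

```python
def assert_masked(answer: str, seeded_values: list[str]) -> None:
    """Fail if any planted sensitive value leaks into a generated answer."""
    leaked = [value for value in seeded_values if value in answer]
    if leaked:
        raise AssertionError(f"Sensitive values leaked into answer: {leaked}")

# Example: values deliberately planted in a test document before indexing.
SEEDED = ["123-45-6789", "jane.doe@example.com"]
assert_masked("The SSN is [REDACTED-ID].", SEEDED)  # passes: nothing leaked
```

Running this check in CI against answers from a dedicated test dataset gives you an ongoing regression test for policy coverage.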
Next Steps¶
- Helper LLM Integration — Use a secondary LLM for additional content processing.
- Dataset Discovery — Explore what's in your datasets.
- Error Handling — Handle policy-related errors.