
Sensitive Data Handling

Enterprise data often contains personally identifiable information (PII), financial records, health data, and other sensitive content. Gaia's sensitive data policies automatically detect this information and mask or redact it in query results before it reaches your application.


How It Works

Text Only
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Indexed Data    │────▶│  Gaia RAG        │────▶│  Sensitive Data  │────▶ Your App
│  (may contain    │     │  Engine          │     │  Policy Engine   │      (masked)
│   PII, SSNs,     │     │  (retrieval +    │     │  (detect + mask) │
│   credit cards)  │     │   generation)    │     │                  │
└──────────────────┘     └──────────────────┘     └──────────────────┘

When sensitive data policies are configured on a dataset:

  1. Gaia retrieves relevant documents as usual.
  2. Before generating the answer, the policy engine scans the retrieved context for sensitive patterns.
  3. Detected sensitive data is masked or redacted according to the active policy.
  4. The generated answer uses the masked context, so detected sensitive values do not appear in the response.
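
Steps 2 and 3 can be pictured with a small sketch. This is not Gaia's detection logic, which this guide does not describe; the regular expressions and the `mask_context` name are illustrative assumptions, and only the placeholder strings follow the categories listed in this guide.

```python
import re

# Illustrative patterns only: real detectors are far more thorough,
# and Gaia's actual detection logic is not shown in this guide.
PATTERNS = {
    "[REDACTED-ID]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN shape
    "[REDACTED-FINANCIAL]": re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b"),  # 16-digit card shape
    "[REDACTED-PII]": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),  # email shape
}


def mask_context(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text


# Retrieved chunks are masked before they would be passed to generation.
chunks = ["Contact jane@example.com, SSN 123-45-6789."]
masked = [mask_context(chunk) for chunk in chunks]
print(masked[0])  # Contact [REDACTED-PII], SSN [REDACTED-ID].
```

Because masking happens on the retrieved context, the generation step in this sketch would only ever see the placeholders, not the original values.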

Sensitive Data Categories

Gaia can detect and handle several categories of sensitive information:

| Category | Examples | Masking Behavior |
| --- | --- | --- |
| PII | Names, email addresses, phone numbers, physical addresses | Replaced with [REDACTED-PII] |
| Financial | Credit card numbers, bank accounts, routing numbers | Replaced with [REDACTED-FINANCIAL] |
| Government IDs | SSNs, passport numbers, driver's license numbers | Replaced with [REDACTED-ID] |
| Health | Medical record numbers, diagnoses, prescription info | Replaced with [REDACTED-HEALTH] |
| Credentials | API keys, passwords, tokens | Replaced with [REDACTED-CREDENTIAL] |
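
Because each category has its own placeholder, an application can tell what kind of data was removed from an answer, for example to show a notice to the user. The following is a minimal sketch, assuming the placeholders appear verbatim in the response text exactly as listed above; `count_redactions` is a hypothetical helper, not part of the Gaia SDK.

```python
import re
from collections import Counter

# Matches placeholders of the form [REDACTED-<CATEGORY>], as listed above.
PLACEHOLDER = re.compile(r"\[REDACTED-([A-Z]+)\]")


def count_redactions(answer: str) -> Counter:
    """Count how many placeholders of each category appear in an answer."""
    return Counter(PLACEHOLDER.findall(answer))


answer = "Invoice sent to [REDACTED-PII]; card [REDACTED-FINANCIAL] was charged."
counts = count_redactions(answer)
print(dict(counts))  # {'PII': 1, 'FINANCIAL': 1}
```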

Policy granularity

The exact categories and masking behavior depend on the policies available in your Gaia deployment. Use the List Policies API to see what's available.


API: GET /sensitive-data/policies

List all available sensitive data handling policies.

Request:

Text Only
GET /v2/mcm/gaia/sensitive-data/policies

Headers:

Text Only
apiKey: your-api-key

Response:

JSON
{
  "policies": [
    {
      "id": "pii-standard",
      "name": "Standard PII Protection",
      "description": "Detects and masks common PII including names, emails, phone numbers, and addresses.",
      "categories": ["PII"],
      "enabled": true
    },
    {
      "id": "financial-strict",
      "name": "Financial Data Protection",
      "description": "Detects and redacts financial account numbers, credit card numbers, and related data.",
      "categories": ["FINANCIAL", "GOVERNMENT_ID"],
      "enabled": true
    },
    {
      "id": "hipaa-compliance",
      "name": "HIPAA Compliance",
      "description": "Detects and redacts protected health information (PHI) as defined by HIPAA.",
      "categories": ["HEALTH", "PII"],
      "enabled": false
    }
  ]
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique policy identifier |
| name | string | Human-readable policy name |
| description | string | What the policy protects |
| categories | string[] | Sensitive data categories covered |
| enabled | boolean | Whether the policy is currently active |
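
If you prefer typed objects over raw dictionaries, the response can be parsed into a small data class. This is a sketch built only from the fields documented above; the `Policy` class and `parse_policies` function are hypothetical names, and the defaults for missing fields are an assumption rather than documented API behavior.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    id: str
    name: str
    description: str
    categories: tuple[str, ...]
    enabled: bool


def parse_policies(payload: dict) -> list[Policy]:
    """Convert the raw response body into typed Policy objects."""
    return [
        Policy(
            id=item["id"],
            name=item["name"],
            # Assumed defaults for fields that might be absent.
            description=item.get("description", ""),
            categories=tuple(item.get("categories", [])),
            enabled=bool(item.get("enabled", False)),
        )
        for item in payload.get("policies", [])
    ]


body = json.loads("""
{"policies": [{"id": "pii-standard", "name": "Standard PII Protection",
  "description": "Detects and masks common PII.",
  "categories": ["PII"], "enabled": true}]}
""")
policies = parse_policies(body)
print(policies[0].id, policies[0].enabled)  # pii-standard True
```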

Backend Implementation

Listing Policies

Using the SDK:

Python
from fastapi import APIRouter, Depends
from gaia_sdk import GaiaClient
from backend.api.dependencies import get_gaia_client

router = APIRouter()


@router.get("/sensitive-data/policies", tags=["Sensitive Data"])
async def list_policies(
    client: GaiaClient = Depends(get_gaia_client),
):
    """List available sensitive data handling policies."""
    policies = await client.list_sensitive_data_policies()
    return {"policies": policies}

Using raw httpx:

Python
import httpx


async def list_sensitive_data_policies(api_key: str) -> list[dict]:
    """Fetch all sensitive data policies from Gaia."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://helios.cohesity.com/v2/mcm/gaia/sensitive-data/policies",
            headers={"apiKey": api_key},
        )
        response.raise_for_status()
        data = response.json()
        return data.get("policies", [])

Displaying Policy Status

Expose an endpoint for a simple admin panel that summarizes which policies are active:

Python
# Reuses `router`, `GaiaClient`, `Depends`, and `get_gaia_client` from the SDK listing example.
@router.get("/admin/sensitive-data", tags=["Admin"])
async def sensitive_data_dashboard(
    client: GaiaClient = Depends(get_gaia_client),
):
    """Return a summary of sensitive data policy status."""
    policies = await client.list_sensitive_data_policies()
    return {
        "total_policies": len(policies),
        "active_policies": [p for p in policies if p.get("enabled")],
        "inactive_policies": [p for p in policies if not p.get("enabled")],
    }
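
A dashboard like this is a natural place to flag coverage gaps. The sketch below is a hypothetical helper (not an SDK function) and assumes policies are dictionaries shaped like the List Policies response, with category names matching the `categories` values shown there.

```python
def uncovered_categories(policies: list[dict], required: set[str]) -> set[str]:
    """Return the required categories that no enabled policy covers."""
    covered = {
        category
        for policy in policies
        if policy.get("enabled")
        for category in policy.get("categories", [])
    }
    return required - covered


policies = [
    {"id": "pii-standard", "categories": ["PII"], "enabled": True},
    {"id": "financial-strict", "categories": ["FINANCIAL", "GOVERNMENT_ID"], "enabled": True},
    {"id": "hipaa-compliance", "categories": ["HEALTH", "PII"], "enabled": False},
]
gaps = uncovered_categories(policies, {"PII", "HEALTH", "FINANCIAL"})
print(gaps)  # {'HEALTH'}
```

Here `HEALTH` is reported because the only policy that covers it is disabled.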

Configuring Policies on Datasets

Sensitive data policies are typically configured at the dataset level through the Cohesity management interface or API. When creating or updating a dataset, you can specify which policies apply:

Python
import httpx


async def configure_dataset_policies(
    api_key: str,
    dataset_name: str,
    policy_ids: list[str],
) -> dict:
    """Apply sensitive data policies to a dataset."""
    async with httpx.AsyncClient() as client:
        response = await client.put(
            f"https://helios.cohesity.com/v2/mcm/gaia/dataset/{dataset_name}/sensitive-data",
            headers={
                "apiKey": api_key,
                "Content-Type": "application/json",
            },
            json={"policyIds": policy_ids},
        )
        response.raise_for_status()
        return response.json()

Administrative operation

Configuring sensitive data policies requires administrative privileges on the Cohesity cluster. Regular API keys may not have permission to modify dataset policies. Contact your Cohesity administrator to enable policies.


Frontend: Policy Management UI

A read-only view of all policies and their status, useful for compliance dashboards:

TSX
// src/components/SensitiveDataPolicies.tsx

import { useEffect, useState } from "react";
import { api } from "../api/client";

interface Policy {
  id: string;
  name: string;
  description: string;
  categories: string[];
  enabled: boolean;
}

export function SensitiveDataPolicies() {
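  const [policies, setPolicies] = useState<Policy[]>([]);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    api
      .get<{ policies: Policy[] }>("/sensitive-data/policies")
      .then((data) => setPolicies(data.policies))
      // Surface failures instead of leaving an unhandled rejection.
      .catch(() => setError("Failed to load policies."))
      .finally(() => setLoading(false));
  }, []);

  if (loading) return <p>Loading policies...</p>;
  if (error) return <p>{error}</p>;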

  return (
    <div className="space-y-4">
      <h2 className="text-lg font-bold">Sensitive Data Policies</h2>
      <div className="grid gap-3">
        {policies.map((policy) => (
          <div
            key={policy.id}
            className={`rounded-lg border p-4 ${
              policy.enabled
                ? "border-green-200 bg-green-50"
                : "border-gray-200 bg-gray-50"
            }`}
          >
            <div className="flex items-center justify-between">
              <h3 className="font-semibold">{policy.name}</h3>
              <span
                className={`rounded-full px-2 py-0.5 text-xs font-medium ${
                  policy.enabled
                    ? "bg-green-100 text-green-700"
                    : "bg-gray-100 text-gray-500"
                }`}
              >
                {policy.enabled ? "Active" : "Inactive"}
              </span>
            </div>
            <p className="text-sm text-gray-600 mt-1">{policy.description}</p>
            <div className="flex gap-1 mt-2">
              {policy.categories.map((cat) => (
                <span
                  key={cat}
                  className="rounded bg-gray-200 px-2 py-0.5 text-xs"
                >
                  {cat}
                </span>
              ))}
            </div>
          </div>
        ))}
      </div>
    </div>
  );
}

Best Practices

Defense in depth

Sensitive data policies are one layer of protection. Also consider:

  • Access control: Use Gaia's security context to restrict which users can query which datasets.
  • Audit logging: Log all queries and the datasets accessed for compliance review.
  • Data minimization: Only index datasets that your application actually needs.
  • Network security: Use HTTPS for all Gaia API calls and enforce TLS in production.

  1. Enable policies before indexing: Policies work best when applied before data is indexed, so sensitive patterns are caught during ingestion.
  2. Test with known PII: Upload test documents with known sensitive patterns to verify that masking works as expected.
  3. Monitor policy coverage: Regularly review which policies are active and whether they cover all required data categories for your compliance requirements.
  4. Don't rely solely on masking: Masking reduces exposure but isn't foolproof. Context clues in surrounding text may still reveal sensitive information. Combine with access controls and audit logging.
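
Practice 2 can be automated with a small check. This sketch assumes you keep a list of the sensitive values planted in your test documents; `find_leaks` is a hypothetical helper, and the answer string stands in for a real Gaia response.

```python
def find_leaks(answer: str, planted_values: list[str]) -> list[str]:
    """Return planted sensitive values that still appear verbatim in the answer."""
    lowered = answer.lower()
    return [value for value in planted_values if value.lower() in lowered]


planted = ["123-45-6789", "jane.doe@example.com"]
answer = "The employee's SSN is [REDACTED-ID]; reach them at jane.doe@example.com."
leaks = find_leaks(answer, planted)
print(leaks)  # ['jane.doe@example.com']
```

An empty list means no planted value leaked verbatim. It does not rule out the contextual leaks described in practice 4, so treat it as a smoke test rather than proof of compliance.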

Next Steps