Module 4: AI-Driven Data Cataloging with MCP

Duration: 30 minutes (hands-on) / 10 minutes (conceptual only)
Type: Hands-on Lab or Presentation
Prerequisites: IBM Storage Fusion 2.12+, OpenShift Lightspeed

Learning Objectives

By the end of this module, you will:

  • Understand what IBM Fusion Data Cataloging Service (DCS) provides for unstructured data governance

  • Explain the Model Context Protocol (MCP) and how it enables AI agents to interact with enterprise data catalogs

  • Understand how Red Hat OpenShift Lightspeed integrates with the DCS MCP Server for on-cluster natural language catalog exploration

  • (Hands-on) Load sample data into an S3 bucket, register it in DCS, enable the MCP Server, and explore catalog metadata through Lightspeed

Conceptual Background

The Unstructured Data Challenge

Enterprises generate massive volumes of unstructured data across distributed storage platforms. Teams struggle with three persistent problems:

  • Data is difficult to discover and understand — datasets are spread across multiple platforms, poorly documented, and hard to explore

  • Metadata becomes outdated and inconsistent — as pipelines evolve and new sources are introduced, documentation falls out of sync

  • Data teams are overloaded with manual work — engineers spend too much time maintaining documentation and answering repetitive questions instead of innovating

Traditional approaches — manual tagging, static reports, hand-written SQL queries across schemas — simply do not scale.

IBM Fusion Data Cataloging Service (DCS)

IBM Fusion Data Cataloging Service (also known as FDC or DCS) is a component of IBM Storage Fusion 2.12+ that provides automated metadata discovery, classification, and governance for data stored across Fusion-managed storage systems.

DCS continuously indexes data sources, extracts metadata, applies policy-driven classification, and maintains a searchable catalog of all data assets. It runs as a set of pods in the ibm-data-cataloging namespace on your OpenShift cluster.

Model Context Protocol (MCP)

The Model Context Protocol is an open standard (JSON-RPC 2.0) that enables Large Language Models to securely interact with external tools, data, and services without hard-coded integrations.

How MCP works:

  1. Tool Discovery — AI agents query the MCP server at runtime to discover available capabilities

  2. Conversational Interface — users ask natural language questions; the agent translates them into MCP tool calls

  3. Universal Compatibility — MCP works with any compatible LLM client (OpenAI, IBM watsonx, Claude, Ollama, and others)

  4. Structured Communication — instead of building custom API connectors for each model, MCP provides a single protocol layer

DCS exposes an MCP Server that allows LLMs to read and write catalog metadata, discover datasets, apply tags, and run classification workflows — all through natural language.
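The protocol mechanics above can be sketched on the wire. The method names (initialize, tools/list, tools/call) and the JSON-RPC 2.0 envelope come from the MCP specification, and dcs_file_search is one of the DCS tools used later in this module; the "query" argument name and the client details are illustrative assumptions, not a documented DCS API.

```python
import json

def rpc(method, params, req_id):
    """Wrap an MCP method in a JSON-RPC 2.0 envelope."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

# 1. Handshake: the client announces itself and negotiates a protocol version.
init = rpc("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "demo-client", "version": "1.0"},
}, 1)

# 2. Tool discovery at runtime -- no hard-coded integration required.
discover = rpc("tools/list", {}, 2)

# 3. Structured invocation: the agent turns a user question into a tool call.
#    ("query" as the argument name is an assumption for illustration.)
call = rpc("tools/call", {
    "name": "dcs_file_search",
    "arguments": {"query": "SELECT COUNT(*) FROM METAOCEAN"},
}, 3)

for msg in (init, discover, call):
    print(json.dumps(msg))
```

Because every client speaks this same envelope, swapping GPT-4o for watsonx or Claude changes nothing on the server side.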

OpenShift Lightspeed (OLS)

Red Hat OpenShift Lightspeed is an AI assistant integrated directly into the OpenShift web console. In this workshop, OLS is configured with OpenAI GPT-4o as the LLM backend, providing:

  • A chat panel accessible from any page in the OpenShift console

  • Natural language queries about cluster resources, troubleshooting, and — through MCP — data catalog exploration

  • An on-cluster experience that requires no external LLM client configuration

Architecture

The end-to-end flow for AI-driven catalog exploration:

Workshop Attendee
    │
    ▼
OpenShift Console ──► Lightspeed Chat Panel
    │                        │
    │                        ▼
    │                  OpenAI API (GPT-4o)
    │                        │
    │                        ▼ (MCP Tool Calls - JSON-RPC 2.0)
    │                  DCS MCP Server
    │                  (ibm-data-cataloging namespace)
    │                        │
    │                        ▼
    │                  Data Catalog
    │                  (Metadata + Tags + Classifications)
    │                        │
    │                        ▼
    │                  Storage Sources
    │                  (Fusion, S3, NFS)

The key insight: MCP transforms storage from a passive data repository into an AI-native resource that can be explored, enriched, and governed through conversation.

Prerequisites Check

Before proceeding with the hands-on exercises, verify the following prerequisites are met on your cluster.

Check 1: Fusion Version

oc get csv -n ibm-spectrum-fusion-ns -o custom-columns='NAME:.metadata.name,DISPLAY:.spec.displayName,VERSION:.spec.version' | grep -i fusion

The output should show version 2.12 or higher. If your cluster runs an earlier version, skip to the Conceptual Walkthrough section.

Check 2: Data Cataloging Service

oc get namespace ibm-data-cataloging 2>/dev/null && echo "DCS namespace exists" || echo "DCS namespace NOT found"
oc get pods -n ibm-data-cataloging --no-headers 2>/dev/null | head -10

You should see pods running in the ibm-data-cataloging namespace. If the namespace does not exist, DCS may not be installed — skip to the Conceptual Walkthrough section.

Check 3: OpenShift Lightspeed

oc get csv -A | grep -i lightspeed

Verify the Lightspeed operator is installed and the chat panel is visible in the OpenShift console (look for the Lightspeed icon in the top navigation bar).

If any of these prerequisites are not met, proceed directly to the Conceptual Walkthrough section for a guided overview of the DCS MCP capabilities.

Hands-on: Verify Sample Data

Before enabling the MCP Server, you need data in the catalog for the AI to explore. The workshop deploys an automated dcs-setup Job that handles the full ingestion pipeline: creating an S3 bucket, generating sample data, registering the data source in DCS, and triggering a catalog scan. In this section you will verify that the Job completed and the catalog is populated.

Step 1: Verify the Setup Job

The dcs-setup Job runs automatically during workshop provisioning. Check its status:

oc get job dcs-setup -n ibm-data-cataloging

You should see COMPLETIONS 1/1. If the Job is still running, wait a minute and re-check.

To view the Job’s full output:

oc logs job/dcs-setup -n ibm-data-cataloging -c setup --tail=30

Step 2: Verify the Data Connection

Confirm the S3 connection was registered and is online:

DCS_TOKEN=$(oc exec -n ibm-data-cataloging deploy/isd-proxy -- \
  curl -s "http://auth.ibm-data-cataloging.svc:80/auth/v1/token" \
    -u "sdadmin:$(oc get secret keystone -n ibm-data-cataloging -o jsonpath='{.data.password}' | base64 -d)" \
    -D /dev/stderr 2>&1 | grep -i 'x-auth-token' | awk '{print $2}' | tr -d '\r\n')
oc exec -n ibm-data-cataloging deploy/isd-proxy -- \
  curl -s "http://connmgr.ibm-data-cataloging.svc:80/connmgr/v1/connections" \
    -H "Authorization: Bearer ${DCS_TOKEN}" \
  | python3 -c "
import sys,json
conns = json.load(sys.stdin)
for c in conns:
    print(f\"Connection: {c['name']}  |  Platform: {c['platform']}  |  Online: {c['online']}  |  Records: {c['total_records']}\")
"

You should see a connection named workshop-sample-data with Online: 1 and a non-zero record count.

Step 3: Verify the Catalog

Confirm files have been cataloged by querying the DB2 catalog database:

oc exec -n ibm-data-cataloging c-isd-db2u-0 -c db2u -- su - db2inst1 -c \
  'printf "connect to BLUDB;\nSELECT FILETYPE, COUNT(*) AS CNT FROM BLUADMIN.METAOCEAN GROUP BY FILETYPE;\n" | db2 -t' \
  2>/dev/null | grep -E "^\w"

You should see a breakdown by file type (csv, json, log, pdf, yaml) totaling approximately 25 files.

Step 4: Explore the Data in the DCS Console (Optional)

For a visual overview of the cataloged data, open the DCS web console:

  1. Navigate to: {dcs_console_url}

    The DCS admin username is sdadmin. Retrieve the password:

    oc get secret keystone -n ibm-data-cataloging -o jsonpath='{.data.password}' | base64 -d && echo
  2. Click the hamburger menu (☰) and navigate to Data connections > Connections

  3. You should see the workshop-sample-data connection with a green "Online" indicator:

    DCS Connections showing workshop-sample-data online
  4. Click Metadata > Data Catalog to browse the indexed files, their types, and sizes

The DCS Dashboard may show "Records indexed: 0" even though the data IS in the catalog:

DCS Dashboard showing zero records indexed

This is a cosmetic discrepancy — the dashboard counter tracks a separate indexing pipeline stage. The catalog records in the METAOCEAN table are fully populated and queryable via MCP, as you confirmed in Step 3.

Step 4b: Add Your Own Data (Optional)

Want to catalog your own files? You can upload additional data to the S3 bucket that the workshop created.

  1. Extract the S3 credentials:

    export AWS_ACCESS_KEY_ID=$(oc get secret dcs-sample-data -n {user_namespace} -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
    export AWS_SECRET_ACCESS_KEY=$(oc get secret dcs-sample-data -n {user_namespace} -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
    export BUCKET_NAME=$(oc get obc dcs-sample-data -n {user_namespace} -o jsonpath='{.spec.bucketName}')
    export S3_ENDPOINT={s3_endpoint}
  2. Install the AWS CLI (if not already available):

    pip install --user awscli 2>/dev/null && export PATH=$HOME/.local/bin:$PATH
  3. Upload your files:

    aws --endpoint-url ${S3_ENDPOINT} --no-verify-ssl \
      s3 cp /path/to/your/file.csv s3://${BUCKET_NAME}/custom/file.csv
  4. Trigger a rescan from the DCS console: navigate to Data connections > Connections, click the three-dot menu (⋮) next to workshop-sample-data, and select Scan. Alternatively, trigger via the API:

    DCS_TOKEN=$(oc exec -n ibm-data-cataloging deploy/isd-proxy -- \
      curl -s "http://auth.ibm-data-cataloging.svc:80/auth/v1/token" \
        -u "sdadmin:$(oc get secret keystone -n ibm-data-cataloging -o jsonpath='{.data.password}' | base64 -d)" \
        -D /dev/stderr 2>&1 | grep -i 'x-auth-token' | awk '{print $2}' | tr -d '\r\n')
    oc exec -n ibm-data-cataloging deploy/isd-proxy -- \
      curl -s -X POST "http://connmgr.ibm-data-cataloging.svc:80/connmgr/v1/scan/workshop-sample-data" \
        -H "Authorization: Bearer ${DCS_TOKEN}"

    After the scan completes (~1-2 minutes), your files will appear in the catalog and be queryable through Lightspeed.

Manual setup fallback: If the dcs-setup Job has not run (e.g., on a manually provisioned cluster), you can trigger it:

oc delete job dcs-setup -n ibm-data-cataloging --ignore-not-found
oc apply -f https://raw.githubusercontent.com/Red-Hat-SE-RTO/ibm-fusion-workshop/main/components/dcs-setup/templates/job.yaml

Alternatively, refer to the setup script at components/dcs-setup/scripts/dcs-setup.py for the full manual procedure.

Hands-on: Enable the DCS MCP Server

Step 5: Enable Natural Language Queries

The DCS MCP Server must be explicitly enabled. Patch the SpectrumDiscover custom resource:

DCS_NS=ibm-data-cataloging

oc -n ${DCS_NS} patch SpectrumDiscover \
  $(oc -n ${DCS_NS} get spectrumdiscover -o jsonpath='{.items[*].metadata.name}') \
  --type=merge -p '{
    "spec": {
      "enabled_features": {
        "natural-language-queries": {
          "enabled": true
        }
      }
    }
  }'

Step 6: Verify MCP Server Pods

Wait for the MCP server pods to start:

oc -n ibm-data-cataloging get pod -l 'app=isd,role=dcs-ai-mcp-server' --watch

Press Ctrl+C once the pods show Running status.

Step 7: Retrieve the MCP Endpoint

Get the route for the MCP HTTP endpoint:

MCP_URL="https://$(oc -n ibm-data-cataloging get route dcs-mcp-route -o jsonpath='{.spec.host}')/mcp/http"
echo "MCP endpoint: ${MCP_URL}"

This is the Streamable HTTP endpoint that Lightspeed connects to. The workshop has already configured OLSConfig with this URL — no manual configuration needed.

Hands-on: Explore the Catalog via OpenShift Lightspeed

Step 8: Open the Lightspeed Chat Panel

  1. Navigate to the OpenShift web console: {openshift_console_url}

  2. Look for the Lightspeed icon in the top navigation bar (or the side panel)

  3. Click to open the chat panel

Step 8b: Authenticate Lightspeed with DCS

The DCS MCP Server has its own authentication layer (IBM Keystone) that is separate from OpenShift RBAC. Even though Lightspeed already knows the MCP endpoint URL, each chat session must authenticate with DCS credentials before any catalog queries will work. This is a one-time step per conversation.

Run this command to build the ready-to-paste authentication message:

DCS_PW=$(oc get secret keystone -n ibm-data-cataloging -o jsonpath='{.data.password}' | base64 -d)
API_HOST=$(echo "{cluster_domain}" | sed 's/^apps\./api./')
API_URL="https://$API_HOST:6443"
echo ""
echo "╔══════════════════════════════════════════════════════════════╗"
echo "║  Copy the message below and paste it into Lightspeed chat:  ║"
echo "╚══════════════════════════════════════════════════════════════╝"
echo ""
echo "Use the dcs_set_credentials tool with these exact values:"
echo "  cluster_url: ${API_URL}"
echo "  username: sdadmin"
echo "  password: ${DCS_PW}"

Copy the entire output (all four lines starting with "Use the dcs_set_credentials tool…") and paste it into the Lightspeed chat panel. The structured format ensures Lightspeed passes the parameters correctly to the MCP tool. You should see a confirmation: "Token generated successfully for sdadmin".

Do not reword or shorten the message. The LLM needs the exact parameter names (cluster_url, username, password) and values to call the tool correctly. If the tool fails with "x-auth-token not found", double-check that the cluster_url starts with https://api. and uses port 6443 (not the internal 172.30.0.1 address).

Why is this needed? The DCS MCP Server is a multi-tenant gateway — it does not inherit your OpenShift identity. Each session authenticates with DCS-specific credentials (the sdadmin user and Keystone password) via a JSON-RPC tool call (dcs_set_credentials). This keeps the MCP protocol portable across different LLM clients while maintaining DCS’s own access control.

If Lightspeed does not recognize the DCS tools, verify that the MCP feature is enabled by checking the chat panel’s tool indicator. The OLSConfig must include featureGates: ["MCPServer"] and the mcpServers entry pointing to the DCS MCP route.

Step 9: Query the Data Catalog

With authentication complete, try the following natural language queries in the Lightspeed chat panel.

How it works under the hood: When you ask a question, GPT-4o translates it into a SQL query against the DCS METAOCEAN table and calls the dcs_file_search MCP tool to execute it. For example, asking "What file types are in the catalog?" generates something like SELECT FILETYPE, COUNT(*) AS CNT FROM METAOCEAN GROUP BY FILETYPE. The MCP server runs the SQL and returns structured results, which Lightspeed presents as a natural language summary.
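A minimal sketch of that round trip, with the MCP server stubbed out. The SQL and the file-type counts mirror this module's sample dataset; the shape of the returned rows is an assumption for illustration.

```python
# The SQL the model generates for "What file types are in the catalog?":
sql = "SELECT FILETYPE, COUNT(*) AS CNT FROM METAOCEAN GROUP BY FILETYPE"

# Structured rows the MCP server returns after running it against DB2
# (counts match the sample dataset used in this module; the tuple shape
# is illustrative):
rows = [("yaml", 13), ("csv", 3), ("json", 3), ("log", 3), ("pdf", 3)]

# Lightspeed's final step: turn the rows into a natural-language answer.
total = sum(cnt for _, cnt in rows)
summary = ", ".join(f"{cnt} {ft}" for ft, cnt in rows)
print(f"The catalog holds {total} files: {summary}.")
# -> The catalog holds 25 files: 13 yaml, 3 csv, 3 json, 3 log, 3 pdf.
```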

Metadata queries only — not file contents. DCS is a metadata catalog, not a full-text search engine. The MCP tools query the METAOCEAN table, which stores information about files (name, type, size, path, timestamps, owner, tags) but does not read or index the actual contents of those files.

You CAN ask: "What are the largest PDF files?" "Show me files modified this week." "Which files are tagged as sensitive?"

You CANNOT ask: "Summarize the OpenShift architecture PDF." "What IP addresses appear in the log files?" "Show me the customer names in customer-directory.csv."

Think of it like a library card catalog: you can search by title, author, and subject — but you can’t read the book through the catalog.
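To make the distinction concrete, here is a sketch of a single catalog record. The field names are descriptive stand-ins rather than the exact METAOCEAN column names, and the timestamp and tag value are invented for illustration; the filename and size come from the sample dataset.

```python
# Everything in the record is *about* the file; nothing is *from* the file.
record = {
    "filename": "customer-directory.csv",
    "filetype": "csv",
    "size_bytes": 4580,
    "path": "/workshop-sample-data/customer-directory.csv",
    "modified": "2026-03-31T12:00:00Z",   # illustrative timestamp
    "owner": "sdadmin",
    "tags": {"DOC_CATEGORY": None},
}

# Answerable from metadata alone ("Is this file larger than 4 KB?"):
print(record["size_bytes"] > 4000)   # -> True

# NOT answerable: the customer names inside the CSV were never ingested,
# so there is no contents field to query.
print("contents" in record)          # -> False
```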

Discovery queries:

  • "How many total files are in the data catalog?"

  • "What file types are present in the catalog and how many of each?"

  • "Show me all PDF documents with their filenames and sizes"

Lightspeed showing 25 files in the catalog

Metadata exploration:

  • "What are the 10 largest files in the catalog by size?"

  • "Show me all files under the logs/ directory path"

  • "Which files were most recently added to the catalog? Sort by insert time."

  • "List all CSV files with their filenames, sizes, and paths"

Lightspeed showing the 10 largest files sorted by size

Tags and governance:

  • "What tags are currently registered in the system?"

  • "What policies exist in the catalog?"

Step 10: AI-Driven Tagging and Classification

MCP enables the LLM to not just read the catalog, but also enrich it. The available tools include dcs_create_tag, dcs_create_policy, dcs_get_recommend_tags, and dcs_get_registered_tags. Try the following:

  1. First, ask Lightspeed to analyze your data and recommend tags:

    • "Analyze the file types and paths in the catalog and recommend meaningful tags for classifying them"

      Lightspeed will call dcs_get_recommend_tags with a SQL query to explore the data, then propose tags based on what it finds:

      Lightspeed recommending tags based on file types
  2. Create a tag (DCS stores tag names in UPPERCASE automatically):

    • "Create a tag called DOC_CATEGORY with type VARCHAR(256)"

  3. Create an auto-tagging policy. Try asking Lightspeed:

    • "Create an AUTOTAG policy called classify-pdfs that filters on filetype='pdf', runs NOW, and sets DOC_CATEGORY to reference-guide"

      If the policy creation fails with "'tags' parameter is required and must be a map" — this is a known limitation. The dcs_create_policy tool requires tags as a JSON object, and some LLMs struggle to construct nested objects in tool calls. Use the terminal fallback below to create the policy directly via the MCP server:

      MCP_HOST=$(oc -n ibm-data-cataloging get route dcs-mcp-route -o jsonpath='{.spec.host}')
      DCS_PW=$(oc get secret keystone -n ibm-data-cataloging -o jsonpath='{.data.password}' | base64 -d)
      API_HOST=$(echo "{cluster_domain}" | sed 's/^apps\./api./')
      
      SID=$(curl -sk "https://$MCP_HOST/mcp/http" \
        -H "Content-Type: application/json" -H "Accept: application/json, text/event-stream" \
        -D /dev/stderr -o /dev/null \
        -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"cli","version":"1.0"}}}' \
        2>&1 | grep -i mcp-session-id | awk '{print $2}' | tr -d '\r\n')
      
      curl -sk "https://$MCP_HOST/mcp/http" \
        -H "Content-Type: application/json" -H "Accept: application/json, text/event-stream" \
        -H "Mcp-Session-Id: $SID" \
        -d "{\"jsonrpc\":\"2.0\",\"id\":2,\"method\":\"tools/call\",\"params\":{\"name\":\"dcs_set_credentials\",\"arguments\":{\"cluster_url\":\"https://$API_HOST:6443\",\"username\":\"sdadmin\",\"password\":\"$DCS_PW\"}}}" > /dev/null 2>&1
      
      echo "Creating policy classify-pdfs..."
      curl -sk "https://$MCP_HOST/mcp/http" \
        -H "Content-Type: application/json" -H "Accept: application/json, text/event-stream" \
        -H "Mcp-Session-Id: $SID" \
        -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"dcs_create_policy","arguments":{"policyName":"classify-pdfs","polFilter":"filetype='"'"'pdf'"'"'","schedule":"NOW","tags":{"DOC_CATEGORY":"reference-guide"}}}}' 2>/dev/null | grep -o '"text":"[^"]*"' | cut -d'"' -f4

      This calls the same MCP tool (dcs_create_policy) with the same JSON-RPC protocol that Lightspeed uses — the only difference is that the terminal constructs the tags object directly instead of relying on the LLM.

  4. Verify what was created — ask Lightspeed (these read-only queries work reliably):

    • "Show me all registered tags and their types"

    • "List all policies in the system and their execution status"

      You should see the classify-pdfs policy with status complete and 3 PDFs tagged. You can also check in the DCS console under Tag management:

      DCS Tag Management showing created tags

This demonstrates a real-world pattern: LLMs excel at reading data through MCP tools (queries, listing tags, summarizing metadata) but can be unreliable when writing (creating policies with complex nested parameters). Robust implementations use MCP for discovery and conversational analysis, with programmatic fallbacks for mutations.
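The same fallback pattern can be sketched in Python: construct the nested tags map in code and hand the finished JSON-RPC payload to the transport. The argument names match the curl example in Step 10; actually sending the payload would reuse the session bootstrap (initialize plus dcs_set_credentials) shown there.

```python
import json

def create_policy_call(name, pol_filter, tags, req_id=3):
    """Build the JSON-RPC payload for an AUTOTAG policy.

    The nested 'tags' map is the part LLMs struggle to emit in tool
    calls; constructing it programmatically sidesteps the problem.
    """
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {
            "name": "dcs_create_policy",
            "arguments": {
                "policyName": name,
                "polFilter": pol_filter,
                "schedule": "NOW",
                "tags": tags,  # a real map, not a string the model mangled
            },
        },
    }

payload = create_policy_call("classify-pdfs", "filetype='pdf'",
                             {"DOC_CATEGORY": "reference-guide"})
print(json.dumps(payload))
```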

Step 11: Explore the Catalog with Natural Language

Now use Lightspeed to explore the catalog using only natural language — no SQL required. The LLM translates each question into SQL behind the scenes via MCP tool calls.

  1. Verify the policy you created is active and has completed:

    • "List all policies in the system and their execution status"

      (Lightspeed calls dcs_get_policies) — You should see classify-pdfs with action AUTOTAG, status complete, and 3 files processed:

      Lightspeed showing policy execution status via dcs_get_policies
  2. Explore the catalog by file type:

    • "What file types are present in the catalog and how many of each?"

      (Lightspeed calls dcs_file_search) — You should see yaml (13), csv (3), json (3), log (3), pdf (3) totaling 25 files:

      Lightspeed showing file type breakdown via dcs_file_search
  3. Identify files that may need attention based on naming patterns:

    • "Which files have names containing 'customer' or 'audit'? Show the filename, filetype, and size."

      (Lightspeed calls dcs_file_search) — Lightspeed will find customer-directory.csv (4,580 bytes) and api-audit-2026-03-31.log (13,299 bytes) — files that could contain PII or sensitive operations data.

  4. Analyze storage consumption:

    • "What is the total storage consumed by each file type?"

      (Lightspeed calls dcs_file_search) — You will see PDFs dominate at ~6.9 MB (documentation), while YAML manifests consume only ~5 KB despite having the most files:

      Lightspeed showing storage consumption by file type
  5. Check the most recent additions:

    • "What are the 5 most recently added files in the catalog?"

      (Lightspeed calls dcs_file_search) — Shows the latest files by insertion timestamp.

  6. Review the tag landscape:

    • "What tags are currently registered in the system?"

      (Lightspeed calls dcs_get_registered_tags) — Lists all tags with their data types: COLLECTION, TEMPERATURE, DOC_CATEGORY, and any others you created.

Going further: Security Risk Assessment. The IBM Community article demonstrates a full end-to-end security workflow where an LLM creates classification tags (e.g., SENSITIVITY), applies auto-tagging policies to flag files with PII, and generates a comprehensive security posture dashboard — all through MCP conversation. The pattern is the same as what you did with classify-pdfs, extended to security classification at scale.

Conceptual Walkthrough

This section covers the same concepts as the hands-on exercises above, presented as a guided walkthrough for environments where DCS or OLS is not yet available.

How DCS MCP Transforms Data Governance

Traditional data catalog operations require specialized knowledge — crafting SQL queries across schemas, manually tagging datasets, and producing static compliance reports. This process is:

  • Time-consuming — security assessments can take days across large catalogs

  • Error-prone — manual classification is inconsistent and hard to maintain

  • Non-scalable — effort grows linearly with data volume

With the DCS MCP Server, an AI agent can perform these same operations in minutes:

Traditional Approach → MCP-Enabled Approach:

  • Manual SQL queries to discover datasets → "What data sources are in the catalog?"

  • Hand-written tagging rules → "Recommend and apply tags based on content analysis"

  • Periodic manual security audits → "Perform a security risk assessment and generate a summary"

  • Static PDF reports → interactive, conversational dashboards generated on demand

The MCP Request Flow

When a user asks a natural language question through Lightspeed:

  1. Lightspeed sends the query to the OpenAI GPT-4o model

  2. The model determines which MCP tools are needed (e.g., dcs_file_search, dcs_get_registered_tags, dcs_create_tag)

  3. Lightspeed sends JSON-RPC 2.0 tool invocation requests to the DCS MCP Server

  4. The MCP Server executes the operations against the Data Catalog

  5. Results flow back through the chain to the user as a natural language response

The model’s native tool-use capabilities make it particularly effective for multi-step catalog exploration tasks.
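The five steps can be compressed into a runnable sketch with the model and the MCP server stubbed out. The tool names are the ones listed above; the routing logic and return values are illustrative scaffolding, not real Lightspeed behavior.

```python
def model_pick_tool(question):
    """Step 2: the LLM maps a natural-language question to an MCP tool."""
    if "tags" in question:
        return "dcs_get_registered_tags", {}
    return "dcs_file_search", {"query": "SELECT COUNT(*) FROM METAOCEAN"}

def mcp_server(tool, args):
    """Steps 3-4: the MCP server executes the call against the catalog
    (stubbed here with canned results from this module's sample data)."""
    catalog = {
        "dcs_get_registered_tags": ["COLLECTION", "TEMPERATURE"],
        "dcs_file_search": [(25,)],
    }
    return catalog[tool]

def ask(question):
    tool, args = model_pick_tool(question)   # step 2 (step 1: user -> LLM)
    result = mcp_server(tool, args)          # steps 3-4: JSON-RPC call
    return f"{tool} -> {result}"             # step 5: summarized for the user

print(ask("What tags are currently registered in the system?"))
```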

Use Case: Security Risk Assessment

The IBM Community article by Paul Llamas Virgen demonstrates a compelling real-world scenario:

  1. Challenge: Security teams lack a consolidated view of data risks across ingested datasets. Sensitive data may be poorly classified or not identified at all.

  2. MCP-Enabled Solution:

    • The LLM scans all catalog metadata through MCP tools

    • It identifies datasets that may contain sensitive information (PII, financial data, credentials)

    • It recommends classification tags based on observed patterns

    • Tags are applied programmatically across all matching assets

    • A security posture dashboard is generated summarizing the findings

  3. Result: What previously required days of manual SQL queries and spreadsheet analysis is accomplished through a conversational interaction in minutes.

Key Takeaways

  • MCP transforms storage into an AI-native resource — data catalogs become conversational, not transactional

  • IBM DCS speaks the language of AI agents — the MCP Server exposes catalog capabilities as tools that any compatible LLM can invoke

  • Enterprise tagging becomes intelligent — automated classification at scale replaces manual, inconsistent tagging

  • OpenShift Lightspeed keeps it on-cluster — no external LLM clients needed; workshop attendees use the same console they already know

Connection to Workshop Themes

This module bridges the operational storage foundations you built in earlier modules with the AI-native future explored in Module 5:

  • Module 1: Storage Architecture → DCS catalogs and classifies data across the same Ceph-based storage classes you provisioned

  • Module 3: Data Protection → backup policies can be informed by catalog metadata — protect sensitive data first

  • Module 5: The 4.20+ Horizon → Content-Aware Storage (CAS) builds on DCS metadata for intelligent data tiering in AI pipelines

References

Facilitator Notes: This module has two paths. If Fusion 2.12+ and OLS are deployed, walk through the hands-on exercises — the security risk assessment demo is the most impactful. If prerequisites are not met, use the conceptual walkthrough and emphasize how MCP changes the data governance paradigm. Either way, tie the DCS capabilities back to Module 1 (storage architecture) and forward to Module 5 (AI-native operations with CAS and GIE).