Scenario 04: Cross-domain automated sanitisation

Overview

Military organisations operate multiple classification domains (e.g., TOP SECRET, SECRET, UNCLASSIFIED networks) that are physically or logically separated. Information frequently needs to move from higher-classification to lower-classification domains, but current processes rely on manual review and redaction. This scenario explores automated sanitisation that removes sensitive content while preserving operational value, enabling faster and more consistent cross-domain information flow.

Problem statement

Moving classified information to lower classification levels currently requires extensive manual review by trained personnel. This process is slow (hours to days), inconsistent (different reviewers make different decisions), and does not scale to modern data volumes. Automated sanitisation could accelerate information sharing whilst maintaining security, but it requires new standards for content analysis, redaction rules, and derivative work tracking.

Actors

High-Side Domain

Classification: TOP SECRET / Sensitive Compartmented Information (SCI)
Content: Intelligence reports, operational plans, sensor data with sensitive sources/methods
Users: Intelligence analysts, operational planners with TS/SCI clearances

Low-Side Domain

Classification: SECRET or UNCLASSIFIED
Content: Sanitised versions of high-side information
Users: Broader audience including coalition partners, lower-cleared personnel

Cross-Domain Solution (CDS)

Role: Gateway between classification domains
Current Capability: Manual review, dirty word blocking, format checking
Desired Capability: Automated content analysis, intelligent sanitisation, derivative work tracking

Sanitisation Engine (New Capability)

Role: Analyse content, identify sensitive elements, apply redaction rules
Capabilities: Natural language processing, entity recognition, classification reasoning
Output: Sanitised document + audit trail + relationship tracking

Scenario flow

Phase 1: Content Analysis

Context: Intelligence analyst on high-side creates report containing mix of sensitivity levels.

Content Example:

OPERATION WALL UPDATE - 15 JAN 2026

Enemy forces observed moving through GRID 12345678 (REDACTED LOCATION).
Movement detected by SIGINT platform BLACKBIRD (SENSITIVE SOURCE).
Estimated 200 personnel with armoured vehicles.
Movement pattern suggests preparation for offensive operations.
Recommend increased surveillance of GRID 12345678.

Sanitisation Engine Actions:
- Identify sensitive elements:
  - "GRID 12345678" → Specific location (TS)
  - "SIGINT platform BLACKBIRD" → Source/method (TS/SCI)
  - "200 personnel with armoured vehicles" → Tactical detail (SECRET)
  - "Movement pattern suggests..." → Analysis (SECRET)
- Determine what can be released at SECRET level
- Apply redaction rules
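
The element identification step can be illustrated with a small rule table. This is a minimal sketch that assumes plain regular-expression rules; the engine described in this scenario would rely on natural language processing and entity recognition instead. The names `SensitiveElement`, `ANALYSIS_RULES`, and `find_sensitive_elements` are hypothetical and not part of any existing standard.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitiveElement:
    text: str       # matched text
    category: str   # e.g. "specific location"
    level: str      # classification level driving the redaction
    start: int      # character offset where the match begins
    end: int        # character offset where the match ends


# Hypothetical rules: (pattern, category, classification level).
ANALYSIS_RULES = [
    (re.compile(r"GRID \d{8}"), "specific location", "TS"),
    (re.compile(r"SIGINT platform [A-Z]+"), "source/method", "TS/SCI"),
    (re.compile(r"\d+ personnel with armoured vehicles"), "tactical detail", "SECRET"),
]


def find_sensitive_elements(text):
    """Return every rule match in the text, ordered by position."""
    found = []
    for pattern, category, level in ANALYSIS_RULES:
        for match in pattern.finditer(text):
            found.append(
                SensitiveElement(match.group(0), category, level, match.start(), match.end())
            )
    return sorted(found, key=lambda element: element.start)


report = (
    "Enemy forces observed moving through GRID 12345678.\n"
    "Movement detected by SIGINT platform BLACKBIRD.\n"
    "Estimated 200 personnel with armoured vehicles.\n"
)
elements = find_sensitive_elements(report)
for element in elements:
    print(f"{element.text!r} -> {element.category} ({element.level})")
```

Keeping the character offsets alongside each match would let a later step redact or highlight the exact span rather than searching for the text again.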

Phase 2: Automated Sanitisation

Sanitisation Rules Applied:
1. Remove specific locations: Replace with general area description
2. Remove source/method details: Replace with generic "intelligence sources"
3. Preserve tactical value: Keep force estimates and analysis
4. Maintain coherence: Ensure sanitised version is readable and useful

Sanitised Output (SECRET level):

OPERATION WALL UPDATE - 15 JAN 2026

Enemy forces observed moving through NORTHERN SECTOR.
Movement detected by intelligence sources.
Estimated 200 personnel with armoured vehicles.
Movement pattern suggests preparation for offensive operations.
Recommend increased surveillance of NORTHERN SECTOR.
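
The transformation from the original report to the sanitised output can be sketched as an ordered list of replace rules. This is an illustrative sketch only: `SECRET_RELEASE_RULES` and `sanitise` are hypothetical names, and the mapping from a grid reference to "NORTHERN SECTOR" is hard-coded here, whereas a real engine would need a geographic lookup or an analyst-supplied generalisation.

```python
import re

# Hypothetical rule set for a SECRET release: (pattern, replacement).
SECRET_RELEASE_RULES = [
    (re.compile(r"GRID \d{8}"), "NORTHERN SECTOR"),                    # generalise location
    (re.compile(r"SIGINT platform [A-Z]+"), "intelligence sources"),   # hide source/method
]


def sanitise(text, rules):
    """Apply each replace rule in order and return the sanitised text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


original = (
    "Enemy forces observed moving through GRID 12345678.\n"
    "Movement detected by SIGINT platform BLACKBIRD.\n"
    "Estimated 200 personnel with armoured vehicles.\n"
    "Recommend increased surveillance of GRID 12345678.\n"
)
sanitised = sanitise(original, SECRET_RELEASE_RULES)
print(sanitised)
```

Force estimates and analysis pass through untouched because no rule matches them, which mirrors rule 3 (preserve tactical value).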

Phase 3: Human Review

Context: Automated sanitisation complete, but requires human verification.

Review Process:
- Reviewer sees original (TS) and sanitised (SECRET) versions side-by-side
- Highlights show what was redacted and why
- Reviewer can:
  - Approve (sanitisation correct)
  - Modify (adjust redactions)
  - Reject (too much/too little removed)
- Feedback improves sanitisation rules over time
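
The highlights shown to the reviewer could be derived by comparing the two versions word by word. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; `redaction_highlights` is a hypothetical helper, and a real review tool would more likely read the redaction spans directly from the engine's audit trail so that the reason for each change is available too.

```python
import difflib


def redaction_highlights(original, sanitised):
    """Return (original_text, replacement_text) pairs for every changed span."""
    original_words = original.split()
    sanitised_words = sanitised.split()
    matcher = difflib.SequenceMatcher(None, original_words, sanitised_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, delete, or insert
            changes.append(
                (" ".join(original_words[i1:i2]), " ".join(sanitised_words[j1:j2]))
            )
    return changes


highlights = redaction_highlights(
    "Movement detected by SIGINT platform BLACKBIRD.",
    "Movement detected by intelligence sources.",
)
for before, after in highlights:
    print(f"{before!r} -> {after!r}")
```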

Phase 4: Cross-Domain Transfer

Context: Sanitised document approved for release to SECRET domain.

Transfer Process:
- Original TS document remains on high-side
- Sanitised SECRET document transferred through CDS
- Relationship tracked: "SECRET doc X derived from TS doc Y"
- Audit trail records: who sanitised, who reviewed, what was removed
- Metadata labels SECRET document with derivation information
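
One possible shape for the derivation metadata attached to the transferred document is sketched below. Everything here is an assumption for illustration: the `DerivationRecord` fields, the document identifiers, and in particular the choice to carry a keyed digest (HMAC) of the high-side identifier rather than the identifier itself, so that the low-side label does not directly name the original. Whether reviewer identities may appear on the low side is a policy question this sketch does not settle.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivationRecord:
    derivative_id: str
    classification: str
    source_ref: str       # opaque reference to the high-side original
    sanitised_by: str
    reviewed_by: str
    content_sha256: str   # digest of the sanitised text that was approved


def opaque_source_ref(original_id, high_side_key):
    """Keyed digest of the original's identifier; only key holders can recompute it."""
    return hmac.new(high_side_key, original_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_record(derivative_id, original_id, sanitised_text,
                 sanitised_by, reviewed_by, high_side_key):
    return DerivationRecord(
        derivative_id=derivative_id,
        classification="SECRET",
        source_ref=opaque_source_ref(original_id, high_side_key),
        sanitised_by=sanitised_by,
        reviewed_by=reviewed_by,
        content_sha256=hashlib.sha256(sanitised_text.encode("utf-8")).hexdigest(),
    )


key = b"example-key-held-on-high-side"  # placeholder value for the sketch
record = build_record(
    "SECRET-DOC-X", "TS-DOC-Y",
    "Enemy forces observed moving through NORTHERN SECTOR.",
    "engine-v1", "reviewer-01", key,
)
print(record.source_ref)
```

The content digest lets the CDS check that the document being transferred is byte-for-byte the one the reviewer approved.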

Phase 5: Derivative Work Tracking

Context: SECRET document now on low-side, but relationship to high-side original must be maintained.

Tracking Requirements:
- If high-side original is updated, flag low-side derivative for review
- If high-side original is reclassified, update low-side derivative
- If high-side original is deleted, consider impact on low-side derivative
- Audit trail links all versions across domains
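
These tracking requirements amount to a two-way index plus a change hook. The sketch below is a minimal in-memory version, assuming the registry lives on the high side; `DerivativeRegistry` and its method names are hypothetical, and persistence, access control, and cross-domain notification are left out.

```python
from collections import defaultdict


class DerivativeRegistry:
    """Links high-side originals to low-side derivatives in both directions."""

    def __init__(self):
        self._derivatives = defaultdict(set)   # original id -> derivative ids
        self._original = {}                    # derivative id -> original id
        self.flagged_for_review = set()

    def link(self, original_id, derivative_id):
        self._derivatives[original_id].add(derivative_id)
        self._original[derivative_id] = original_id

    def derivatives_of(self, original_id):
        """Forward query: which low-side documents came from this original?"""
        return sorted(self._derivatives.get(original_id, set()))

    def original_of(self, derivative_id):
        """Reverse query: which high-side document did this come from?"""
        return self._original.get(derivative_id)

    def original_changed(self, original_id):
        """Flag every derivative when the original is updated, reclassified, or deleted."""
        affected = self.derivatives_of(original_id)
        self.flagged_for_review.update(affected)
        return affected


registry = DerivativeRegistry()
registry.link("TS-DOC-Y", "SECRET-DOC-X")
registry.link("TS-DOC-Y", "SECRET-DOC-Z")
affected = registry.original_changed("TS-DOC-Y")
print(affected)
```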

Operational constraints

Security: Sanitisation errors that leak classified information are unacceptable
Accuracy: Sanitised documents must preserve operational value
Speed: Sanitisation must be faster than manual review (minutes vs hours)
Consistency: Same content should be sanitised the same way every time
Auditability: All sanitisation decisions must be logged and explainable
Human Oversight: Automated sanitisation requires human review before release
Irreversibility: High-side content cannot be reconstructed from the low-side sanitised version
Domain Separation: Sanitisation engine must not bridge classification domains

Technical challenges

Content Understanding: How to identify sensitive elements in unstructured text?
Context Awareness: How to determine if information is sensitive based on context?
Redaction Granularity: Document, section, paragraph, sentence, or word level?
Coherence Preservation: How to maintain readability after redaction?
Classification Reasoning: How to determine appropriate classification of sanitised output?
Relationship Tracking: How to link high-side originals to low-side derivatives?
Version Management: How to handle updates to high-side documents?
Rule Management: How to define, update, and audit sanitisation rules?
Multi-Format Support: How to sanitise text, images, structured data, multimedia?
Error Handling: What happens when sanitisation engine makes mistakes?

Acceptance criteria

AC1: Automated Content Analysis

System identifies sensitive elements in unstructured text
Recognises entities (locations, people, organisations, capabilities)
Understands classification markers and caveats
Identifies source/method information
Determines context-dependent sensitivity
Analysis completes quickly enough for operational use (minutes rather than hours, in line with the Speed constraint)

AC2: Intelligent Redaction

Removes sensitive elements whilst preserving operational value
Maintains document coherence and readability
Applies consistent redaction rules
Supports multiple redaction strategies (remove, replace, generalise)
Handles nested classifications (classified section in unclassified document)
Redaction is irreversible (cannot reconstruct original from sanitised version)

AC3: Classification Determination

Automatically determines appropriate classification of sanitised output
Considers highest remaining classification in document
Applies appropriate caveats and handling restrictions
Labels derivative work with correct classification markings
Explains classification reasoning
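
The "highest remaining classification" rule can be sketched as a maximum over an ordered list of levels. This is a simplified illustration: `LEVELS`, `output_classification`, and `releasable_to` are hypothetical names, only the three levels named in this scenario are modelled, and caveats, compartments, and aggregation effects (individually releasable facts that become sensitive in combination) are ignored.

```python
# Levels in ascending order of sensitivity, as named in this scenario.
LEVELS = ["UNCLASSIFIED", "SECRET", "TOP SECRET"]


def output_classification(element_levels):
    """Return the highest level among the elements that remain in the document.

    An unknown marking raises ValueError, so the caller fails secure
    instead of silently under-classifying.
    """
    ranks = [LEVELS.index(level) for level in element_levels]
    return LEVELS[max(ranks, default=0)]


def releasable_to(document_level, domain_level):
    """True if a document at document_level may reside in a domain at domain_level."""
    return LEVELS.index(document_level) <= LEVELS.index(domain_level)


remaining = ["SECRET", "UNCLASSIFIED", "SECRET"]
level = output_classification(remaining)
print(level, releasable_to(level, "SECRET"))
```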

AC4: Human Review Workflow

Presents original and sanitised versions side-by-side
Highlights redacted elements with explanations
Allows reviewer to approve, modify, or reject
Tracks reviewer decisions and feedback
Improves sanitisation rules based on feedback
Maintains audit trail of review process

AC5: Cross-Domain Transfer

Integrates with existing Cross-Domain Solutions
Transfers sanitised documents securely
Prevents high-side content from leaking to low-side
Validates sanitisation before transfer
Logs all cross-domain transfers

AC6: Derivative Work Tracking

Links low-side derivatives to high-side originals
Tracks version history across domains
Flags derivatives when originals are updated
Maintains relationship metadata
Supports queries: "What low-side docs came from this high-side doc?"
Supports reverse queries: "What high-side doc did this come from?"

AC7: Audit Trail

Logs all sanitisation operations
Records what was redacted and why
Tracks who reviewed and approved
Logs cross-domain transfers
Maintains tamper-proof audit logs
Supports compliance and security investigations
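
One common way to make an audit log resistant to silent modification is to chain entries by hash, so that altering any earlier entry invalidates every later one. The sketch below assumes that approach; strictly speaking it makes the log tamper-evident rather than tamper-proof, and `AuditLog` and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(previous_hash, event):
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((previous_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, event):
        previous_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append(
            {"event": event, "prev": previous_hash, "hash": _digest(previous_hash, event)}
        )

    def verify(self):
        """Recompute the chain; return False if any entry was altered or reordered."""
        previous_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != previous_hash:
                return False
            if entry["hash"] != _digest(previous_hash, entry["event"]):
                return False
            previous_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"op": "redact", "element": "source/method", "reason": "TS/SCI"})
log.append({"op": "review", "decision": "approve", "reviewer": "reviewer-01"})
log.append({"op": "transfer", "document": "SECRET-DOC-X"})
print(log.verify())
```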

AC8: Rule Management

Administrators can define sanitisation rules
Rules specify what to redact and how
Rules can be organisation-specific or domain-specific
Rule updates take effect appropriately
Rule conflicts detected and resolved
Rules are versioned and auditable
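
Conflict detection could start from a simple check: two active rules that target the same category but prescribe different actions. The sketch below assumes that definition of a conflict; the `Rule` fields and `find_conflicts` are hypothetical, and how a detected conflict is resolved is left to the administrator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    version: int
    category: str   # what the rule targets, e.g. "specific location"
    action: str     # one of: remove, replace, generalise


def find_conflicts(rules):
    """Return {category: [rules]} for categories whose rules disagree on the action."""
    by_category = defaultdict(list)
    for rule in rules:
        by_category[rule.category].append(rule)
    return {
        category: group
        for category, group in by_category.items()
        if len({rule.action for rule in group}) > 1
    }


rules = [
    Rule("R1", 3, "specific location", "generalise"),
    Rule("R2", 1, "specific location", "remove"),
    Rule("R3", 2, "source/method", "replace"),
]
conflicts = find_conflicts(rules)
for category, group in conflicts.items():
    print(category, [rule.rule_id for rule in group])
```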

AC9: Multi-Format Support

Sanitises text documents (Word, PDF, plain text)
Sanitises structured data (XML, JSON, databases)
Sanitises images (redact sensitive portions)
Sanitises multimedia (video, audio)
Preserves format and usability after sanitisation
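
For structured data, sanitisation can operate on field names instead of free text. The sketch below walks a JSON-like structure and substitutes values for listed keys while leaving the shape intact; `sanitise_structured`, the field names, and the sample record are hypothetical.

```python
import json


def sanitise_structured(value, replacements):
    """Return a copy of value with listed keys replaced at any nesting depth."""
    if isinstance(value, dict):
        return {
            key: replacements[key] if key in replacements
            else sanitise_structured(item, replacements)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitise_structured(item, replacements) for item in value]
    return value  # scalars pass through unchanged


record = {
    "operation": "WALL",
    "observations": [
        {"grid": "12345678", "source": "SIGINT platform BLACKBIRD", "personnel": 200}
    ],
}
replacements = {"grid": "NORTHERN SECTOR", "source": "intelligence sources"}
clean = sanitise_structured(record, replacements)
print(json.dumps(clean, indent=2))
```

Because the function builds a new structure, the high-side record is left unmodified and the output keeps the same keys, which supports the format-preservation criterion.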

AC10: Error Handling and Safety

Bias towards over-redaction (fail secure)
Flags uncertain sanitisation decisions for human review
Prevents release if sanitisation confidence is low
Provides confidence scores for sanitisation decisions
Supports manual override with justification
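
The fail-secure behaviour can be sketched as a gate over per-decision confidence scores. The threshold value, the `triage` function, and the status strings are assumptions for illustration; note that even the passing case only hands the document to a human reviewer and never releases it automatically.

```python
CONFIDENCE_THRESHOLD = 0.90  # assumed value; a real threshold would be set by policy


def triage(decisions, threshold=CONFIDENCE_THRESHOLD):
    """Return (status, uncertain_decisions) for one sanitised document.

    Any decision below the threshold, or an empty decision list,
    blocks the document (fail secure).
    """
    uncertain = [d for d in decisions if d["confidence"] < threshold]
    if not decisions or uncertain:
        return "blocked", uncertain
    return "ready_for_human_review", uncertain


decisions = [
    {"element": "GRID 12345678", "action": "generalise", "confidence": 0.99},
    {"element": "SIGINT platform BLACKBIRD", "action": "replace", "confidence": 0.72},
]
status, uncertain = triage(decisions)
print(status, [d["element"] for d in uncertain])
```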

AC11: Performance and Scalability

Sanitises typical documents quickly
Handles large documents efficiently
Scales to organisational document volumes
Minimal impact on cross-domain transfer throughput

AC12: Standards Compliance

Follows classification marking standards
Integrates with existing security policies
Supports multiple classification systems (national, NATO, coalition)
Complies with records management requirements
Meets cross-domain solution certification requirements

Success metrics

Sanitisation Speed: Significantly faster than manual review (minutes rather than hours to days)
Accuracy: Very high correct redaction rate (suitable for operational security)
False Positive Rate: Low over-redaction (preserves operational value)
False Negative Rate: Very low under-redaction (critical security risk)
Human Review Time: Reduced compared to full manual review
Consistency: Same content sanitised the same way across reviewers
Throughput: Increased cross-domain information flow
User Satisfaction: Reviewers and consumers find sanitised content useful
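
The false positive and false negative rates could be measured against a labelled test set in which each element of a document is marked as sensitive or not. The sketch below assumes elements are identified by simple ids; `redaction_rates` is a hypothetical helper and the sample numbers are invented purely to show the arithmetic.

```python
def redaction_rates(all_elements, sensitive, redacted):
    """Return (false_negative_rate, false_positive_rate).

    false negative: a sensitive element the engine failed to redact
    false positive: a non-sensitive element the engine redacted anyway
    """
    all_elements, sensitive, redacted = set(all_elements), set(sensitive), set(redacted)
    non_sensitive = all_elements - sensitive
    missed = sensitive - redacted
    over_redacted = redacted & non_sensitive
    fn_rate = len(missed) / len(sensitive) if sensitive else 0.0
    fp_rate = len(over_redacted) / len(non_sensitive) if non_sensitive else 0.0
    return fn_rate, fp_rate


# Ten elements, four of them sensitive; the engine misses one and over-redacts one.
fn_rate, fp_rate = redaction_rates(
    all_elements=range(10),
    sensitive={0, 1, 2, 3},
    redacted={0, 1, 2, 7},
)
print(f"false negative rate: {fn_rate:.2f}, false positive rate: {fp_rate:.2f}")
```

Reporting the two rates separately matters because they carry different costs: a false negative is a security risk, while a false positive only reduces operational value.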

Example use cases

Use case 1: Intelligence report sanitisation

High-Side (TS/SCI): Detailed intelligence report with sources, methods, specific locations, intercepts.

Sanitisation: Remove sources/methods, generalise locations, preserve threat analysis.

Low-Side (SECRET): Threat assessment useful for operational planning without compromising sources.

Use case 2: Operational plan downgrade

High-Side (TS): Detailed operational plan with unit locations, timings, capabilities.

Sanitisation: Remove specific timings and locations, preserve general concept of operations.

Low-Side (SECRET): Concept of operations for coordination with coalition partners.

Use case 3: Sensor data release

High-Side (TS): Raw sensor data revealing collection capabilities and coverage.

Sanitisation: Aggregate data, remove capability indicators, preserve tactical picture.

Low-Side (SECRET): Tactical situation awareness without revealing sensor capabilities.

Out of scope

Real-time streaming data sanitisation
Sanitisation of data in motion (network traffic)
Encryption/decryption (separate concern, handled by TDF/ZTDF)
Cross-domain solution hardware/infrastructure
AI/ML model training (assume pre-trained models available)
Foreign language translation

Related scenarios

Scenario 03: Legacy system retrofit - provides content labelling foundation
Scenario 01: Coalition sharing - sanitised content shared with allies
Scenario 06: Mission-based sharing - sanitised content for mission partners

Key assumptions

Content is Analysable: Documents are in formats that can be parsed
Rules are Definable: Organisations can articulate sanitisation rules
Human Review Available: Automated sanitisation requires human verification
CDS Integration Possible: Can integrate with existing cross-domain solutions
Accuracy Sufficient: Very high accuracy meets security requirements
Performance Acceptable: Any sanitisation time faster than manual review is acceptable

New standards required

1. Sanitisation Markup Language (SML)

Purpose: Standard format for marking sensitive elements and redaction rules

Capabilities:
- Tag sensitive entities (locations, people, capabilities)
- Specify redaction actions (remove, replace, generalise)
- Define classification reasoning
- Support multiple classification systems

2. Derivative Work Metadata Standard

Purpose: Track relationships between high-side originals and low-side derivatives

Capabilities:
- Link documents across classification domains
- Track version history and updates
- Record sanitisation provenance
- Support bidirectional queries

3. Sanitisation Audit Format

Purpose: Standard format for logging sanitisation operations

Capabilities:
- Record what was redacted and why
- Track human review decisions
- Log cross-domain transfers
- Support compliance investigations

4. Classification Reasoning Schema

Purpose: Explain why content has specific classification

Capabilities:
- Document classification logic
- Support automated classification determination
- Enable human review and override
- Integrate with existing classification guides

Risk considerations

Security Risks:
- Under-sanitisation leaks classified information to lower domain
- Sanitisation engine compromised to deliberately leak information
- Relationship tracking reveals high-side document existence to low-side users
- Audit logs compromised to hide sanitisation errors

Operational Risks:
- Over-sanitisation removes too much, making documents useless
- Inconsistent sanitisation confuses users
- Slow sanitisation creates bottleneck
- Human review becomes rubber-stamp process

Mitigation Strategies:
- Bias towards over-redaction (fail secure)
- Extensive testing before operational deployment
- Mandatory human review for all sanitisation
- Continuous monitoring and improvement
- Regular security audits of sanitisation engine

Automated sanitisation for cross-domain transfers. Requires new standards beyond current TDF/ZTDF capabilities.