Sovereign, Air-Gapped, and Attested AI Inference: A Compliance-First Architecture for Regulated Workloads

Christian Kearney

doi:10.5281/zenodo.20422087

Abstract

Regulated organizations want the utility of modern language models without surrendering control of privileged, protected, export-controlled, or commercially sensitive data. This manuscript presents a security framework for sovereign execution that treats compliance as a runtime property rather than a policy statement applied after the fact.

The design combines jurisdiction-aware deployment, a four-tier gateway and inference structure, six independent containment layers, volatile-memory-only session handling, metadata-only audit trails, cryptographic integrity checks, and intent-aware routing across heterogeneous inference engines. The framework is deliberately conservative: the model process is bound to localhost, blocked from outbound network egress at the kernel and orchestration layers, verified against dependency and model manifests, and supervised by an autonomous watchdog that can wipe active session buffers when compromise indicators appear.

The paper gives a threat model, a processing pipeline, a control-evidence crosswalk, and a validation plan suitable for technical review, compliance assessment, and further empirical benchmarking.

Contributions

Air-gapped execution pattern: a research framework and reference architecture that enforces loopback-only model binding, kernel-level egress denial, and offline operation suitable for classified and regulated workloads.
Evidence-first audit model: a metadata-only, Merkle-style evidence chain that enables third-party verification without exposing user content.
Stateless, attested runtime: volatile-memory session handling with explicit wipe semantics and an autonomous watchdog for emergency buffer destruction.
Operational validation: a reproducible test suite demonstrating network binding, egress denial, dependency and model-weight integrity checks, and tenant-scoped retrieval isolation.
Deployment guidance: concrete deployment profiles and assessor-oriented readiness artifacts for enterprise and classified environments.

Air-Gapped Security vs Cloud Security

The central comparison in this manuscript is not abstract model quality. It is deployment security. Air-gapped security minimizes exposed network paths, telemetry dependence, retention ambiguity, and subprocessor dependence by containing inference inside a controlled execution boundary. Cloud security, even when improved with guardrails or confidential compute, still depends on externally managed infrastructure, observability layers, and operational trust outside the model runtime.

This framing matters for healthcare, finance, defense, legal, and other regulated environments where organizations must prove where data went, what controls applied, and what evidence exists after execution. The manuscript argues that the difference between air-gapped and cloud-hosted inference is therefore theoretical, architectural, evidentiary, and compliance-relevant, not merely operational.

Related Work

Cloud confidential compute. Azure confidential computing, AWS Nitro Enclaves, and Google Confidential VMs provide hardware-backed isolation for virtual machines and containers. These offerings protect workloads from cloud-provider administrators and co-tenants, but they do not address the full compliance surface for inference. On their own they do not prevent model egress, do not enforce stateless operation, do not provide model-weight integrity verification, and do not supply audit-ready metadata without exposing user content. Confidential compute hardens the boundary between the workload and its host, not the full inference lifecycle.

Cloud AI guardrails. OpenAI policy enforcement, Azure AI Content Safety, Google Vertex guardrails, and Amazon Bedrock guardrails provide prompt-injection filtering, content moderation, and safety classification around hosted model endpoints. They do not provide sovereign execution, do not eliminate retention, and do not allow independent verification of model provenance, dependency integrity, or runtime egress. For regulated workloads, guardrails alone are insufficient because they do not control the model runtime itself.

Enterprise AI governance platforms. Governance suites help organizations track datasets, model versions, and risk assessments, but they generally assume cloud-hosted models, shared infrastructure, and persistent logs. They do not provide a sovereign inference boundary, do not enforce stateless inference, and do not implement kernel-level egress denial or cryptographic verification of model weights at runtime.

Open-source serving frameworks. Hugging Face TGI, vLLM, SGLang, llama.cpp, and similar stacks provide high-performance inference but assume a trusted environment. They do not include egress controls, dependency manifests, model-weight verification, session-wipe semantics, or audit-ready metadata by default. They are optimized for throughput, not compliance.

Comparison summary. Across cloud AI, confidential compute, guardrails, governance platforms, and open-source serving stacks, the common gap is that none treat compliance as a runtime property. The framework advanced here differs by treating the model as an untrusted component inside a controlled execution boundary: loopback-only binding, kernel-level egress denial, dependency and model-weight verification, RAM-only session handling, autonomous watchdog response, and metadata-only audit.

Figures 1–5

Figure 1. System Architecture Overview showing air-gap boundary, gateway, inference tiers, and audit substrate. (Diagram of a four-tier architecture: public ingress to gateway, policy gateway, multi-engine inference pool, and a cross-cutting audit and security substrate. Air-gap boundary and loopback binding are highlighted.)

This figure locates the air-gap and sovereignty controls in the overall system. Public traffic terminates at Tier 1 (NGINX), which enforces TLS, CORS, and rate limits. Tier 2 (FastAPI) owns policy: authentication, prompt screening, retrieval, and routing. Tier 3 contains heterogeneous inference engines, including llama.cpp for low-latency chat, SGLang for agentic workflows, FlexGen for batch, and a CPU pool for specialized tasks. Tier 4 spans the stack as the audit and security substrate and produces metadata-only evidence.

Figure 2. Six containment layers that enforce sovereign, stateless, and attested inference. (Stacked diagram listing six security layers from loopback binding at the base to intrusion watchdog at the top, each annotated with control objective and evidence artifact.)

Each layer is independently testable and produces evidence: loopback binding can be verified with socket inspection, egress denial with firewall tests, dependency and model verification with manifests and startup logs, session wipe with runtime tests and swap checks, and watchdog response with triggered wipe events. The layered design is intentionally redundant so that failure of one mechanism does not eliminate the rest of the boundary.
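
The socket-inspection check for the loopback layer can be sketched as follows. This is a minimal illustration, not the framework's own test suite: it assumes the listening addresses have already been collected as (host, port) pairs, for example from the operating system's socket table, and the names `non_loopback_listeners` and `bind_loopback` are hypothetical.

```python
import ipaddress
import socket


def non_loopback_listeners(addresses):
    """Return the (host, port) pairs whose host is not a loopback address."""
    offenders = []
    for host, port in addresses:
        if not ipaddress.ip_address(host).is_loopback:
            offenders.append((host, port))
    return offenders


def bind_loopback(port=0):
    """Open a listening TCP socket bound to 127.0.0.1 only.

    Port 0 asks the operating system for any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen(1)
    return sock
```

An assessor-style check would pass the observed listeners to `non_loopback_listeners` and require an empty result; a wildcard bind such as `0.0.0.0` would be reported as an offender.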

Figure 3. Ordered request pipeline from ingress to post-processing showing guardrails, retrieval, engine selection, and metadata emission. (Linear pipeline diagram showing ten ordered stages: guardrails, authentication, memory injection, intent classification, engine routing, cache lookup, retrieval, context building, inference, and response post-processing with metadata emission.)

The pipeline enforces ordering so that policy checks occur before any model execution and audit metadata is produced after response assembly. Guardrails block known jailbreaks, authentication enforces tenant policy, memory and retrieval remain tenant-scoped, engine routing selects the correct backend, and post-processing scrubs PII while emitting metadata-only audit events.
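
The ordering guarantee can be illustrated with a small sketch. The stage names follow Figure 3, but everything else is an assumption made for illustration: the `Request` type, the `PolicyViolation` exception, the placeholder guardrail rule, and the `passthrough` stand-in for stages not modelled here.

```python
from dataclasses import dataclass, field


class PolicyViolation(Exception):
    """Raised by a pre-inference stage to stop the request."""


@dataclass
class Request:
    tenant: str
    prompt: str
    trace: list = field(default_factory=list)  # names of stages that completed
    response: str = ""


def guardrails(req):
    # Placeholder screen; a real deployment would use a maintained rule set.
    if "ignore previous instructions" in req.prompt.lower():
        raise PolicyViolation("guardrails")


def authentication(req):
    if not req.tenant:
        raise PolicyViolation("authentication")


def inference(req):
    # Stand-in for the model call; no real engine is invoked.
    req.response = f"[model output for {len(req.prompt)} chars]"


def passthrough(req):
    """Stand-in for stages not modelled in this sketch."""


# Stage order follows Figure 3; only three stages carry logic here.
PIPELINE = [
    ("guardrails", guardrails),
    ("authentication", authentication),
    ("memory_injection", passthrough),
    ("intent_classification", passthrough),
    ("engine_routing", passthrough),
    ("cache_lookup", passthrough),
    ("retrieval", passthrough),
    ("context_building", passthrough),
    ("inference", inference),
    ("post_processing", passthrough),
]


def run(req):
    """Run every stage in order; an exception aborts all later stages."""
    for name, stage in PIPELINE:
        stage(req)
        req.trace.append(name)
    return req
```

Because the stages are a fixed ordered list, a request rejected by guardrails or authentication never reaches the inference stage, and the recorded trace shows which stages actually ran.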

Figure 4. Three-phase stateless lifecycle showing allocation, overwrite, and release with watchdog triggers. (Flow diagram showing session allocation in RAM, use during inference, a three-phase wipe consisting of overwrite, key replacement, and clearing of data structures, and release. A watchdog can trigger an emergency wipe at any point.)

The lifecycle minimizes durable artifacts by keeping session content in volatile memory and applying a three-phase wipe. The key test points are canary prompt absence in durable storage, swap checks, crash dump inspection, and watchdog-triggered wipe events. These are the points where an assessor can distinguish a policy statement from an evidence-backed control.
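
The three wipe phases can be sketched in a few lines. This is an illustration under stated assumptions, not the framework's implementation: the `SessionBuffer` class, its `key` and `index` fields, and the 32-byte key size are hypothetical. In a managed language such as Python, copies made elsewhere (for example, immutable `bytes` or `str` objects) are not reached by this wipe, which is why the session content is held in a mutable `bytearray`.

```python
import secrets


class SessionBuffer:
    """Holds session content in a mutable bytearray so it can be overwritten in place."""

    def __init__(self, content: bytes):
        self.data = bytearray(content)
        self.key = secrets.token_bytes(32)  # hypothetical per-session key material
        self.index = {"turns": 1}           # hypothetical bookkeeping structure

    def wipe(self):
        # Phase 1: overwrite the buffer contents in place with zeros.
        for i in range(len(self.data)):
            self.data[i] = 0
        # Phase 2: replace the session key material with fresh random bytes.
        self.key = secrets.token_bytes(32)
        # Phase 3: clear the data structures that referenced the session.
        self.data.clear()
        self.index.clear()
```

A watchdog-triggered emergency wipe would call the same `wipe` method, so the normal and emergency paths share one code path that can be tested with a canary prompt.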

Figure 5. Routing matrix for intent classes and engine selection plus air-gap transfer paths for retrieval corpora and adapters. (Matrix diagram mapping intent classes to engine backends and showing an offline transfer path for retrieval corpora and adapters into an air-gapped inference cluster with integrity verification and logging.)

Routing matches workload intent to engine class to optimize cost and latency while preserving governance. The figure also makes the offline transfer path explicit: staging media, cryptographic verification of artifacts, serialized adapter loading, and audit logging. This is the operational bridge that lets a sovereign environment ingest models, adapters, and corpora without cloud control-plane dependencies.
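
A routing matrix of this kind reduces to a lookup table. The sketch below pairs the four engine classes named for Tier 3 with intent labels; the labels themselves (`chat`, `agentic`, `batch`, `specialized`), the engine identifiers, and the choice to reject unknown intents rather than fall back to a default are assumptions made for illustration.

```python
# Hypothetical intent labels mapped to the engine classes named for Tier 3.
ROUTES = {
    "chat": "llama.cpp",        # low-latency chat
    "agentic": "sglang",        # agentic workflows
    "batch": "flexgen",         # batch processing
    "specialized": "cpu-pool",  # specialized tasks
}


class UnknownIntent(ValueError):
    """Raised when an intent has no configured engine."""


def select_engine(intent):
    """Return the engine for an intent, failing closed on unknown intents."""
    try:
        return ROUTES[intent]
    except KeyError:
        raise UnknownIntent(intent) from None
```

Failing closed keeps an unclassified request from silently reaching an engine that was never approved for it; a deployment that prefers a default engine would change only the `except` branch.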

Key Themes

Sovereign Execution

The manuscript frames AI inference as a controlled execution problem. Sensitive prompts should exist in as few places as possible, and each place should have a named control, an evidence artifact, and a failure mode that can be tested.

Defense in Depth

The runtime is treated as an untrusted but powerful component inside a stronger system boundary. Gateway policy, routing, retrieval, and audit live outside the model runtime so they can be verified independently.

Assessor-Oriented Evidence

Instead of relying on assurances alone, the design emphasizes manifests, logs, binding tests, egress tests, and metadata-only audit trails suitable for technical and compliance review.
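
A metadata-only audit record can be sketched as follows. The field names are hypothetical, since the fields actually recorded by the framework are not listed here; the point of the example is that the record carries sizes and identifiers but never the prompt or response text.

```python
import json
import time


def audit_event(tenant, engine, prompt, response, status, now=None):
    """Build an audit record that carries sizes and labels but no content."""
    return {
        "ts": now if now is not None else time.time(),
        "tenant": tenant,
        "engine": engine,
        "status": status,
        "prompt_bytes": len(prompt.encode("utf-8")),
        "response_bytes": len(response.encode("utf-8")),
    }


def to_log_line(event):
    """Serialize deterministically so the line can later be hashed as evidence."""
    return json.dumps(event, sort_keys=True, separators=(",", ":"))
```

An unkeyed digest of the prompt is deliberately left out of this sketch, because short or predictable prompts could be guessed against it.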

Evaluation Results

This evaluation summarizes validation across security controls, stateless data handling, offline transfer workflows, and multi-engine inference performance. Results are expressed in a form suitable for technical review, compliance assessment, and reproducibility.

Security Validation

Network binding and egress denial: external scans and outbound connection attempts were blocked; inference processes bind only to loopback and cannot initiate outbound TLS.
Origin isolation: the architecture was validated to operate without cloud control planes; no telemetry or vendor logs are required for normal operation.
Dependency and model integrity: tampering tests cause startup refusal; manifests and model-weight hashes are enforced at load time.
Stateless wipe and crash handling: canary prompts and forced crashes produced no durable artifacts; watchdog wipe events are logged as metadata only.
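
The load-time integrity check with startup refusal can be sketched as a manifest comparison. This is a minimal example under stated assumptions: the manifest is taken to be a mapping from relative file path to expected SHA-256 hex digest, and the names `verify_manifest` and `IntegrityError` are hypothetical. How the manifest itself is signed or distributed is outside the sketch.

```python
import hashlib
import hmac
from pathlib import Path


class IntegrityError(RuntimeError):
    """Raised when an artifact does not match its manifest entry."""


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest):
    """Check every manifest entry; raise IntegrityError on the first mismatch.

    A missing file raises FileNotFoundError, which also prevents startup.
    """
    for rel_path, expected in manifest.items():
        actual = sha256_file(Path(root) / rel_path)
        if not hmac.compare_digest(actual, expected):
            raise IntegrityError(f"digest mismatch: {rel_path}")
```

Calling `verify_manifest` before any model or dependency is loaded, and letting the exception terminate the process, gives the startup-refusal behavior that a tampering test can observe.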

Operational Validation

Offline ingestion and batch processing: retrieval corpora and adapters can be staged via removable media or air-gapped transfer; adapter loading is serialized and logged.
Auditor verification: Merkle-style roots over manifests, configs, and metadata allow auditors to verify integrity without accessing content.
RAG isolation: cross-tenant retrieval attempts are blocked and adapter pathways remain tenant-scoped.
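
The auditor-verification step relies on a root hash over manifests, configs, and metadata. The sketch below shows one common way to compute such a root; the manuscript says only "Merkle-style", so the specific choices here (SHA-256, one-byte prefixes separating leaf and interior hashes, duplicating the last node on odd levels) are assumptions rather than the framework's construction.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute a hex-encoded SHA-256 Merkle root over a list of byte strings."""
    if not leaves:
        return _h(b"").hex()
    # Prefix 0x00 marks leaf hashes, 0x01 marks interior nodes, so a leaf
    # can never be confused with the concatenation of two child hashes.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

An auditor who holds the serialized manifests, configs, and metadata records can recompute the root and compare it with the published value without ever seeing prompt or response content; changing any single leaf changes the root.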

Performance Summary

Air-gapped deployments show comparable latency for local GPU inference, and throughput scales with local cluster size. The primary tradeoff is between operational complexity and the legal and evidentiary advantages of bounded execution.

Residual Risks

Residual concerns remain at the operating system and hosting boundary: swap, crash dumps, hypervisor snapshots, and managed-language memory semantics. The framework addresses these through explicit operational controls rather than assuming they disappear by policy.

Air-Gapped vs Cloud Comparison

Attribute	Air-Gapped Sovereign Execution	Cloud AI Platforms
Sovereignty	Full; host and data remain under operator control	Partial; infrastructure remains externally controlled
Egress control	Kernel-level denial; no outbound traffic allowed	Dependent on provider controls
Attestation and provenance	Cryptographic manifests and model-weight verification	Limited; provider logs and opaque supply chain
Retention	Volatile-only sessions with explicit wipe	Provider logs, backups, telemetry
Auditability	Metadata-only evidence chain and Merkle roots	Logs may contain content; cross-region replication
Scalability	Scales with local cluster; operational overhead	Elastic, managed scaling
Vendor lock-in	Lower; artifacts remain operator-controlled	Higher; provider APIs and telemetry
Operational complexity	Higher; hardware, provisioning, and ops	Lower; managed services

Recommended Use

This page is the canonical web summary of the work. If you are evaluating deployment, audit posture, or sovereign inference patterns, begin with this HTML publication, use the manuscript PDF for print-style review, and refer to the Zenodo record for archive-grade distribution.
