Skip to main content
← Learning Center
Data Exposure

LLM Data Exposure: The Hidden Supply Chain

Your prompts travel through at least five layers before the LLM responds. At every layer, your data is exposed — copied, logged, analyzed, and sometimes sold. Click through the chain to see what leaks at each stop, and how Shield closes every gap by redacting on your machine before the first byte leaves.

Interactive Supply Chain Map

Click a node to see what's exposed at each layer

Your App
Network
LLM Provider
Model Training
Data Ecosystem
↑ Click any node above to see what data is exposed at that layer

The Magnitude of the Problem

3–7
Sensitive fields per average enterprise prompt
API keys, customer PII, internal URLs, proprietary code, and DB connection strings appear in a single chat message — often inadvertently pasted by engineers debugging.
30–90 days
Retention window before data is fully purged
Most providers retain prompt logs 30-90 days for abuse monitoring. During that window, your data is accessible to provider staff, sub-processors, and compliance audits.
11+
Known LLM data exposure incidents since 2023
From Samsung's source code leak to internal Slack tokens exposed via training data extraction — the pattern repeats because prompt-level security is still an afterthought.

How Shield Closes Every Gap

01

Runs on your machine, before TLS

Shield is a local desktop application — not a cloud service, not a network appliance. Redaction happens in your process space, on your CPU, before the first encrypted byte leaves your network interface. This is the only layer where you have cryptographic certainty about what leaves your machine.

02

Five-layer detection engine

Regex catches known patterns (AWS keys, JWT tokens). Entropy analysis catches novel secrets. ML classifiers catch context-dependent PII. Structural analysis catches credential-bearing JSON. Contextual scoring reduces false positives by weighting detections against surrounding text.

03

Tokenization, not blocking

Rather than blocking requests (which breaks developer workflows), Shield replaces sensitive tokens with opaque placeholders. The LLM still receives a grammatically coherent prompt — it just sees <REDACTED_TOKEN_42> instead of real credentials. Developers stay productive; data stays protected.

04

Provider-agnostic proxy

Shield works with any OpenAI-compatible API, plus Anthropic, Google, Azure, AWS Bedrock, and local models. One env var points your app to Shield; Shield forwards to your provider(s) of choice. No SDK changes, no library dependencies, no provider lock-in.

05

Tamper-evident audit chain

Every redaction is cryptographically hashed and chained into a local audit log. Compliance teams can verify that specific data types were redacted at specific times without ever seeing the original values. SOC 2, HIPAA, and GDPR auditors get verifiable evidence — not screenshots.

Stop your data at the source

Shield redacts sensitive content on your machine — before it hits the network, before the provider logs it, before anyone can train on it. Three tiers, one env var, zero workflow changes.

See PricingBook a Demo

Frequently Asked Questions

HTTPS protects data in transit — between your machine and the provider's server. It does NOT protect data at the provider. Once the TLS session terminates, the provider sees the full plaintext of your prompt. They log it, store it, and may train on it. Shield redacts sensitive content BEFORE it enters the HTTPS connection, so the provider never sees it — even in plaintext at their endpoint.
Opt-outs are contractual promises, not technical guarantees. OpenAI, Anthropic, and Google all offer opt-out settings — but these are API flags the provider's own code must respect. There is no cryptographic enforcement. In 2023, multiple providers were found training on 'API' data despite opt-out claims. Shield gives you technical enforcement: the data literally isn't there to train on, regardless of what the provider does with it.
Shield runs locally and adds sub-millisecond latency for redaction on typical prompt sizes (under 8K tokens). The network round-trip to the LLM provider dominates total latency — Shield's overhead is negligible. For streaming responses, Shield processes incrementally with zero buffering delay.
Shield replaces sensitive tokens with opaque placeholders that preserve character count, word boundaries, and syntactic position. The LLM still receives a grammatically coherent prompt — it just sees <REDACTED_42> instead of a real SSN. Most uses of AI (summarization, analysis, code review) don't require the actual secret values. For cases where exact data is needed, Shield's allow-listing lets you exempt specific fields.
Yes. Shield is provider-agnostic — it sits as a local HTTPS proxy that intercepts outbound requests to any endpoint. It works with OpenAI, Anthropic, Google, Azure, AWS Bedrock, local Ollama, and any OpenAI-compatible API. The redaction engine doesn't care which model you're calling — it processes the HTTP body before it leaves your machine.
DLP tools scan for known patterns (regex) at the network perimeter — they block or alert on matches. Shield uses five detection layers (regex, entropy, ML classifiers, structural analysis, and contextual scoring) and REDACTS in-place rather than blocking. Blocking breaks workflows; redaction preserves them. DLP also typically runs at the network layer — Shield runs on the developer's machine, catching data before it reaches the network at all. This is the critical API-call layer that DLP can't see because it's encrypted HTTPS by the time it hits the perimeter.