Question 1

What is Shannon entropy, and why does it matter for secret detection?

Accepted Answer

Shannon entropy measures the randomness of a string — the higher the entropy, the less predictable the content. API keys, tokens, and encryption keys are generated with cryptographic randomness, giving them entropy scores above 5.0. Normal human-readable text (like emails or code comments) typically scores below 3.5. Entropy analysis flags high-entropy strings for review regardless of whether they match a known key format — this catches custom tokens, internal API keys, and novel secret formats that regex patterns miss entirely.

Question 2

Can't regex just cover every secret format? Why add entropy?

Accepted Answer

Regex only catches known patterns. The problem is that every SaaS platform, internal tool, and homegrown system invents its own token format. OpenAI keys start with 'sk-', but your team's internal deployment token might be a random 40-character hex string with no distinctive prefix. Entropy analysis catches that 40-char hex string because its randomness score is high — no pattern needed. Regex is precise but brittle; entropy is broad but noisier. Shield uses both in sequence: high-entropy strings get scored, then matched against known patterns and contextual heuristics to filter false positives.

Question 3

What are the most common false positives with entropy-based detection?

Accepted Answer

UUIDs, Base64-encoded data, hashes (SHA-256, MD5), and compressed binary are the biggest offenders. A UUID looks cryptographically random but isn't a secret. Shield's filter packs solve this by layering context checks: is the string in a known UUID format? Is it inside a code comment or a string literal? Is it associated with an environment variable assignment? Shield also allows you to add custom exclusion patterns via Shield's domain filter pack — you can whitelist your internal test token prefix, for example, so it never triggers an alert.

Question 4

How does Shield actually detect secrets in LLM prompts and responses?

Accepted Answer

Shield sits as a silent proxy between your application and any LLM provider. Before a prompt leaves your infrastructure, Shield scans it with multiple detection engines in parallel: entropy analysis flags high-randomness substrings, regex engines match against 200+ known secret patterns, and context-aware rules check whether a detected string is being sent to an external API. If a match is found, Shield can redact the secret (replace with '[REDACTED]'), block the request entirely with an audit log entry, or return a warning to the developer. The hash-chain audit trail proves exactly what was detected and when — tamper-evident and SOC 2 ready.

Question 5

What's the difference between secret detection and prompt injection detection?

Accepted Answer

They're different threats requiring different defenses. Secret detection finds accidental data leaks — a developer pastes an API key into a prompt, or a customer's PII slips into a training dataset. Prompt injection is adversarial: an attacker crafts input designed to override system instructions or exfiltrate data. Shield handles both: the secret detection engine scans for key patterns and entropy anomalies, while the injection detection engine uses semantic analysis, instruction boundary detection, and delimiter sanitization. Many teams only think about injection — but accidental secret leaks are far more common and just as damaging.

Question 6

Does Shield slow down LLM API calls with all this scanning?

Accepted Answer

Sub-millisecond latency for most requests. Shield's scanning engines are optimized for streaming throughput — entropy analysis runs in O(n) time on the raw bytes, regex matching uses precompiled patterns stored in a DFA cache, and context rules execute as lightweight predicate checks. In benchmark tests against the major LLM providers, Shield adds under 2ms of overhead for a typical 2,000-token prompt. For high-throughput deployments, Shield's opaque mode processes data entirely in-memory without disk I/O, keeping latency deterministic. You get security without adding a bottleneck to your AI pipeline.

Secret Detection: How Entropy Analysis Catches What Regex Misses

Why Regex Alone Fails at Secret Detection

Custom tokens have no pattern to match

False positives waste security team hours

New vendor APIs drop weekly — regex can't keep up

Entropy Calculator & Regex Tester

Regex vs. Entropy: Head-to-Head

How Shield Detects Secrets: Four Detection Layers

Entropy Scanner

Regex Pattern Engine

Context-Aware Rules

Hash-Chain Audit Trail

Real-World Secret Leaks That Regex Missed

Stop Secrets Before They Reach Your LLM Provider

Frequently Asked Questions