Why Regex Alone Fails at Secret Detection
Custom tokens have no pattern to match
Every internal tool, CI pipeline, and homegrown service invents its own token format. A 40-character hex string with a team prefix (like T1M-) is a valid secret, but it won't match any pattern in a public regex database. Regex catches the well-known formats; entropy analysis flags random-looking strings that no pattern describes.
False positives waste security team hours
Overly broad regex patterns (like matching any run of 20 or more letters and digits) match git SHAs, hyphenless UUIDs, and Base64 blobs, burying real secrets in a flood of false alarms. Without entropy scoring to prioritize alerts, security teams burn time triaging noise instead of stopping leaks.
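A small sketch makes the over-matching concrete. The pattern below is a hypothetical stand-in for an overly broad rule, not one of Shield's patterns, and the sample values are generated only for illustration.

```python
import hashlib
import re
import uuid

# Hypothetical overly broad rule: any run of 20+ letters and digits.
BROAD_PATTERN = re.compile(r"[A-Za-z0-9]{20,}")

def broad_hits(text: str) -> list[str]:
    """Return every substring the broad rule would report."""
    return BROAD_PATTERN.findall(text)

# Harmless values that still look like "secrets" to the broad rule.
samples = {
    "uuid_without_hyphens": uuid.uuid4().hex,                      # 32 hex chars
    "git_style_sha": hashlib.sha1(b"example commit").hexdigest(),  # 40 hex chars
    "ordinary_word": "configuration",                              # too short to match
}

for name, value in samples.items():
    print(name, "flagged" if broad_hits(value) else "ignored")
```

Both identifiers are flagged even though neither is a credential, which is the triage noise described above.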
New vendor APIs drop weekly — regex can't keep up
Every time your team adopts a new AI provider, database, or SaaS tool, there's a new secret format to detect. Maintaining regex patterns for 200+ providers is a full-time job. Entropy analysis doesn't care about format — high randomness triggers review regardless of vendor.
Regex vs. Entropy: Head-to-Head
The pattern is clear: regex and entropy are complementary, not competing. Shield combines both engines for defense in depth.
How Shield Detects Secrets: Four Detection Layers
Entropy Scanner
Shannon entropy analysis runs first on every prompt and response. High-entropy substrings are flagged regardless of format — catching custom tokens, encoded credentials, and novel secret types. Thresholds are configurable per filter pack: set sensitivity higher for healthcare (PHI detection) or lower for dev environments with lots of Base64.
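As a rough illustration of entropy scanning, the sketch below computes Shannon entropy in bits per character and flags long token-like runs that score above a threshold. The candidate pattern, the 3.5 threshold, and the sample token are assumptions chosen for the example; they are not Shield's actual defaults.

```python
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

# Assumed candidate rule: runs of 20+ characters commonly seen in tokens.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")

def flag_high_entropy(text: str, threshold: float = 3.5) -> list[tuple[str, float]]:
    """Return (substring, score) pairs whose entropy meets the threshold."""
    hits = []
    for match in CANDIDATE.finditer(text):
        score = shannon_entropy(match.group())
        if score >= threshold:
            hits.append((match.group(), round(score, 2)))
    return hits

# Made-up custom token: team prefix plus 40 hex characters.
prompt = "deploy with T1M-3fa9c27e81b04d56a2e9f7c3108b6d45e0a7c912 please"
print(flag_high_entropy(prompt))
print(flag_high_entropy("a" * 30))  # long but repetitive, so not flagged
```

A pure hex string cannot exceed 4 bits per character and Base64 cannot exceed 6, so the threshold directly trades recall on hex tokens against noise from Base64-heavy content.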
Regex Pattern Engine
200+ precompiled patterns cover major providers (AWS, GCP, Azure, OpenAI, Anthropic, GitHub, GitLab, Stripe, Twilio, and more). The patterns run in parallel against the substrings the entropy scanner has flagged, confirming and labeling hits rather than serving as the sole detection mechanism. Community pattern updates ship weekly via Shield's domain filter packs.
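The confirmation step might look something like the following sketch, which labels a flagged string with the first provider pattern it matches. The two patterns reflect commonly documented token shapes and are included only as illustrations; the `classify_hit` helper and its labels are hypothetical, not Shield's pattern set.

```python
import re

# Illustrative patterns for two commonly documented token shapes.
KNOWN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def classify_hit(candidate: str) -> str:
    """Label a flagged string by provider, or mark it as an unknown format."""
    for name, pattern in KNOWN_PATTERNS.items():
        if pattern.search(candidate):
            return name
    return "unknown_high_entropy"

# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
print(classify_hit("AKIAIOSFODNN7EXAMPLE"))
# A made-up custom token matches no provider pattern but is still reported.
print(classify_hit("T1M-3fa9c27e81b04d56a2e9f7c3108b6d45e0a7c912"))
```

Because unmatched strings keep a label instead of being dropped, a custom token still reaches review even though no pattern names it.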
Context-Aware Rules
Not every high-entropy string is a leak. Shield's rule engine checks context: is the string inside a code comment? Is it part of a variable assignment? Is it being sent to an external domain? A test key (sk-test-...) in a README gets a different verdict than the same key being passed to api.openai.com.
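One way to picture context-aware rules is a small decision function over where the string was found and where it is headed. Everything in this sketch is hypothetical: the `Detection` fields, the internal host allowlist, the placeholder key value, and the mapping from context to verdict are assumptions for illustration, not Shield's rule engine.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist of hosts treated as internal.
INTERNAL_HOSTS = {"localhost", "llm-gateway.internal.example"}

@dataclass
class Detection:
    secret: str
    location: str                     # e.g. "readme", "code_comment", "assignment"
    destination_host: Optional[str]   # None when the text is not being sent anywhere

def verdict(d: Detection) -> str:
    """Return "block", "redact", or "warn" based on the surrounding context."""
    is_test_key = d.secret.startswith("sk-test-")
    leaving = d.destination_host is not None and d.destination_host not in INTERNAL_HOSTS
    if leaving:
        return "block"    # anything headed to an external host is stopped
    if is_test_key and d.location in {"readme", "code_comment"}:
        return "warn"     # documented test keys only raise a warning
    return "redact"       # default for everything else

print(verdict(Detection("sk-test-EXAMPLE", "readme", None)))
print(verdict(Detection("sk-test-EXAMPLE", "assignment", "api.openai.com")))
```

The same string produces different verdicts in the two calls, which mirrors the README versus api.openai.com contrast above.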
Hash-Chain Audit Trail
Every detection — whether it triggers redaction, blocking, or a warning — is logged to Shield's tamper-evident hash chain. Each log entry includes the detection timestamp, entropy score, matched patterns, context, and action taken. SOC 2 auditors can verify the chain independently — no log can be altered without breaking the hash.
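A tamper-evident hash chain can be sketched in a few lines: each entry's hash covers its own payload plus the previous entry's hash, so editing any earlier entry invalidates everything after it. The field names follow the list above, but the layout, the SHA-256 choice, and the helper names are assumptions rather than Shield's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry

def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_entry(chain: list[dict], payload: dict) -> None:
    """Append a payload, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)})

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edited payload or broken link fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True

log: list[dict] = []
append_entry(log, {"timestamp": "2024-01-01T00:00:00Z", "entropy_score": 4.16,
                   "matched_patterns": [], "context": "prompt", "action": "redact"})
append_entry(log, {"timestamp": "2024-01-01T00:05:00Z", "entropy_score": 3.9,
                   "matched_patterns": ["aws_access_key_id"], "context": "response", "action": "block"})
print(verify_chain(log))

log[0]["payload"]["action"] = "warn"  # simulate tampering with an old entry
print(verify_chain(log))
```

Independent verification of this kind works when the auditor holds a trusted copy of the most recent hash, since someone able to rewrite the whole log could otherwise recompute every link.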
Real-World Secret Leaks That Regex Missed
Each of these scenarios passed regex-only detection but would be caught by entropy analysis. Every example is drawn from real incidents — internal tools, custom scripts, and homegrown automation that didn't follow any public token format.
Internal deployment token (40-char hex) pasted into a ChatGPT prompt by a junior dev.
Database connection string with embedded credentials sent to a coding assistant for debugging help.
Custom CI/CD webhook secret (no standard prefix) leaked via an AI code review agent.
Cloud provider API key for a lesser-known service (not in public regex databases) included in training data.
Stop Secrets Before They Reach Your LLM Provider
Shield ships with 200+ regex patterns, configurable entropy thresholds, and context-aware rules, all deployed as a silent proxy that you enable with a single environment variable. Foundation tier starts at $10K/year.