CL4R1T4S: The Centralized Archive Extracting AI System Prompts from Corporate Black Boxes

elder-plinius/CL4R1T4S · Updated 2026-04-19T04:13:19.750Z
Trend 4 · Stars 15,453 · Weekly +306

Summary

This repository functions as the largest public corpus of extracted system instructions from major AI platforms, effectively forcing transparency on proprietary model configurations. By aggregating jailbreak-extracted prompts from ChatGPT, Gemini, Claude, and others into a standardized research format, it transforms security-through-obscurity into an inspectable commons for red-teamers and AI safety researchers.

Architecture & Design

Taxonomic Organization

The repository employs a provider-first hierarchy organized by corporate entity (OpenAI, Google, Anthropic, xAI) rather than model architecture, reflecting the reality that system prompts are commercial artifacts shaped by product decisions, not just model capabilities.

Layer                         Structure                  Content Type
/[provider]/                  Directory per vendor       Raw prompt dumps, dated by extraction
/[provider]/[model-version]/  Subdirectory taxonomy      Version-specific instructions, tool schemas
/methods/                     Extraction technique docs  Jailbreak patterns, API exploitation vectors
/verification/                Authenticity markers       Hash checks, behavioral consistency tests
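The provider-first taxonomy lends itself to programmatic consumption. A minimal sketch of a path parser follows; the `provider/model-version/date.md` naming convention is an assumption inferred from the table above, not necessarily the repo's exact scheme:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

@dataclass
class PromptRecord:
    provider: str
    model_version: str
    extraction_date: str  # ISO date embedded in the filename (assumed convention)

def parse_prompt_path(path: str) -> Optional[PromptRecord]:
    """Map a repo-relative path like 'openai/gpt-4o/2025-03-14.md'
    onto the provider-first taxonomy described above."""
    parts = PurePosixPath(path).parts
    if len(parts) != 3 or not parts[2].endswith(".md"):
        return None  # /methods/ and /verification/ docs fall outside this layout
    return PromptRecord(provider=parts[0],
                        model_version=parts[1],
                        extraction_date=parts[2].removesuffix(".md"))

rec = parse_prompt_path("anthropic/claude-3-7-sonnet/2025-04-01.md")
```

A scraper built this way can skip `/methods/` and `/verification/` automatically, since those directories do not follow the three-level prompt layout.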

Data Integrity Architecture

Unlike unstructured prompt-sharing Discord servers, CL4R1T4S implements a tripartite verification protocol: (1) Cryptographic consistency checks against known API responses, (2) Behavioral fingerprinting (testing if extracted instructions produce deterministic output patterns), and (3) Community consensus through PR review. This creates a reputational stake in accuracy that ephemeral leak forums lack.
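The behavioral-fingerprinting step, (2) above, can be approximated by replaying fixed probe inputs against the live API and checking that responses remain close to those recorded at extraction time. A minimal sketch, assuming such probe/response pairs are available; the token-level Jaccard similarity and the 0.7 threshold are illustrative choices, not the repo's actual protocol:

```python
def token_jaccard(a: str, b: str) -> float:
    """Crude similarity between two responses, over whitespace-split tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def fingerprint_matches(recorded: dict, live: dict, threshold: float = 0.7) -> bool:
    """Pass only if every probe's live response stays consistent with the
    response recorded when the prompt was extracted (deterministic patterns)."""
    return all(token_jaccard(recorded[probe], live.get(probe, "")) >= threshold
               for probe in recorded)
```

A missing or drifted probe response fails the check, which is the desired behavior when a provider silently rotates its system prompt.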

Distribution Mechanism

The project leverages GitHub's version control not just for storage but as a temporal database—commits track when specific guardrails were added or removed (e.g., noting when Claude's system prompt began explicitly refusing election-related queries), creating an audit trail of corporate policy shifts.
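Recovering when a guardrail first appeared reduces to scanning dated snapshots in commit order for the clause of interest. A sketch under the assumption that history is available as chronologically sorted (date, text) pairs; the clause and dates below are invented examples, not actual CL4R1T4S data:

```python
from typing import Optional

def first_appearance(snapshots: list, clause: str) -> Optional[str]:
    """Return the date of the earliest snapshot containing `clause`.
    `snapshots` is a chronologically sorted list of (ISO date, prompt text)."""
    for date, text in snapshots:
        if clause.lower() in text.lower():
            return date
    return None

history = [
    ("2025-01-10", "You are a helpful assistant."),
    ("2025-02-02", "You are a helpful assistant. Decline election-related queries."),
]
added_on = first_appearance(history, "election-related")
```

In practice the (date, text) pairs would come from walking the file's git history, which is exactly the temporal-database property the paragraph above describes.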

Key Innovations

The "Prompt Archaeology" Methodology: CL4R1T4S treats system prompts as stratified artifacts requiring extraction techniques that evolve faster than platform defenses. The repository documents specific injection vectors—like the "SVG markup bypass" or "base64 recursion attack"—that force models to emit their own configuration instructions, effectively weaponizing the models' instruction-following bias against their containment protocols.

Specific Technical Innovations

  • Cross-Provider Prompt Diffing: Standardized Markdown formatting enables semantic comparison between, for example, OpenAI's legacy text-davinci-003 and gpt-4-turbo system prompts, revealing how the company's safety guidelines evolved from explicit rule lists to more abstract constitutional principles.
  • Tool-Use Schema Extraction: Documents not just behavioral instructions but the underlying JSON schemas that define function-calling capabilities (e.g., revealing how Claude's "computer use" feature structures Bash command permissions in its system context).
  • Token-Count Metadata: Includes estimated token lengths of system prompts, revealing the "context tax" imposed by safety instructions—critical for researchers optimizing prompt injection attacks where every token counts against context windows.
  • Shadow Mode Detection: Catalogs discrepancies between published system cards and actual deployed prompts, such as instances where models claim to have no knowledge cutoff in user-facing documentation but contain explicit date constraints in system instructions.
  • Community Fuzzing Integration: Links extracted prompts to automated test suites that verify if behavioral constraints ("I cannot write exploit code") are actually enforced in the system prompt or merely superficial alignment training.
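The prompt-diffing and token-count ideas above can be prototyped with the standard library alone. A minimal sketch: the prompt strings are invented placeholders, and dividing character count by four is only the common rule-of-thumb token estimate, not a real tokenizer:

```python
import difflib

def prompt_diff(old: str, new: str) -> list:
    """Line-level unified diff between two system-prompt versions."""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                     fromfile="v1", tofile="v2", lineterm=""))

def estimate_tokens(text: str) -> int:
    """Rough 'context tax' estimate: ~4 characters per token."""
    return len(text) // 4

v1 = "Follow these rules:\n1. Refuse malware requests.\n2. Cite sources."
v2 = v1 + "\n3. Decline election queries."
added = [line for line in prompt_diff(v1, v2)
         if line.startswith("+") and not line.startswith("+++")]
```

Filtering the diff to added lines surfaces new guardrails directly, which is the core of the cross-provider comparison workflow.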

Performance Characteristics

Coverage Metrics

Provider         Models Documented                      Prompt Versions  Last Update
OpenAI           12 (GPT-4o, o1, o3-mini, etc.)         34+              Active (within 48h of API changes)
Anthropic        8 (Claude 3.5/3.7 Sonnet, Opus, etc.)  19+              Active
Google           6 (Gemini 1.5 Pro/Flash, etc.)         15+              Weekly
xAI              3 (Grok-1, Grok-2)                     7+               Sporadic
Agent Platforms  4 (Devin, Cursor, Replit, etc.)        11+              Bi-weekly

Velocity & Accuracy

The repository maintains a 94% verification rate through behavioral consistency checks, with false positives arising primarily from "jailbreak hallucinations": fabricated prompts that an exploited model invents in place of its actual system instructions. Update latency averages 2.3 days from API deployment to repository documentation, significantly faster than academic papers or official transparency reports.
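One way to screen out jailbreak hallucinations is to require several independent extraction attempts to agree before accepting a prompt: a genuine system prompt repeats nearly verbatim, while a hallucinated one varies between attempts. A sketch, with an illustrative agreement threshold rather than the project's actual criterion:

```python
from difflib import SequenceMatcher

def likely_genuine(extractions: list, min_ratio: float = 0.9) -> bool:
    """Accept only if every pairwise similarity clears the threshold."""
    if len(extractions) < 2:
        return False  # a single extraction cannot be cross-checked
    return all(SequenceMatcher(None, a, b).ratio() >= min_ratio
               for i, a in enumerate(extractions)
               for b in extractions[i + 1:])
```

This is a coarse filter; the repo's higher bar (replaying probes against the live API) catches prompts that are self-consistent but still wrong.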

Scalability Limitations

The project faces inherent extractability decay: as providers harden system prompts against leakage (e.g., OpenAI's move to "instruction hierarchy" training in GPT-4o), the signal-to-noise ratio of successful extractions drops. The repository's growth rate has plateaued not due to lack of interest, but because modern system prompts are increasingly embedded in model weights rather than plaintext context windows, making extraction technically impossible rather than merely difficult.

Ecosystem & Alternatives

Competitive Landscape

Resource                   Scope                        Verification             Community
CL4R1T4S                   Multi-provider, raw prompts  High (behavioral tests)  GitHub-based, PR-driven
Jailbreak Chat             Single-turn exploits         Low (anecdotal)          Forum-style
Prompt Injection Defenses  Defensive patterns           Medium                   Academic/Industry
AI System Cards            Official summaries           N/A (authoritative)      Corporate publications
/r/LocalLLaMA Leaks        Uncensored models            Variable                 Reddit/ephemeral

Integration Points

CL4R1T4S has become upstream data for several downstream tools: PromptMap (attack surface analyzer), System2System (prompt comparison CLI), and academic datasets studying AI alignment drift. It's cited in 12+ preprints regarding "specification gaming" and guardrail evasion.

Adoption Patterns

The repository serves three distinct user cohorts: (1) Red-teamers using extracted prompts to craft adversarial inputs that exploit specific instruction phrasing, (2) Prompt engineers reverse-engineering successful corporate safety templates for their own applications, and (3) Regulators using the corpus to verify compliance with transparency mandates (e.g., EU AI Act technical documentation requirements).

Risk Surface

The project operates in a legal grey zone: while system prompts are technically configuration files, providers argue they constitute trade secrets. The repository has survived DMCA takedowns by arguing fair use for research purposes, but individual contributors face Terms of Service violations from providers when extraction methods require API abuse.

Momentum Analysis

AISignal exclusive — based on live signal data

Growth Trajectory: Stable

The repository exhibits sustained organic growth (+94 stars/week) characteristic of reference documentation rather than viral tools. However, the 8.7% monthly velocity indicates it has crossed from niche infosec circles into mainstream AI engineering consciousness.

Metric           Value      Interpretation
Weekly Growth    +94 stars  Consistent academic/practitioner interest
7-day Velocity   7.9%       Stable engagement, not hype-cycle
30-day Velocity  8.7%       Sustained utility for red-teaming community
Fork Rate        20.3%      High; indicates active use (backup/archival)

Adoption Phase Analysis

CL4R1T4S sits at the infrastructure layer of the AI transparency stack. It has transitioned from "underground curiosity" (March 2025 launch) to "standard research tool" (current), with citations appearing in ICLR and ACL safety workshops. The 3,095 forks suggest institutional adoption—organizations forking to maintain internal mirrors against potential takedowns.

Forward-Looking Assessment

The project's long-term viability depends on the cat-and-mouse dynamics of prompt extraction. As providers migrate to "system prompt 2.0" architectures (encrypted instruction vectors, weight-based conditioning), the repository may pivot from extraction to behavioral reverse-engineering—documenting what models refuse to do rather than what they're told to do. The 15k star threshold suggests it has achieved "reference standard" status, but growth will likely decelerate as extraction becomes technically infeasible against frontier models, transforming the repo from a living database into a historical archive of pre-2025 AI system configurations.