Corpus Vis Iuris (Lex)
Corpus Vis Iuris (CVI), Latin for "Body of Legal Force," is the data ingestion, processing, and validation protocol that provides the quantified, empirical foundation for the Legal Maneuverability Framework. Its function is to systematically scrape, parse, and structure the vast, unstructured data of the legal world into the specific, machine-readable variables required to calculate the Positional Maneuverability (PM) and Strategic Maneuverability (SM) scores.
Core Principles
The CVI protocol is governed by a set of core principles to ensure the resulting data is robust, reliable, and suitable for rigorous machine learning applications.
- Verifiability: All data must originate from publicly available or credential-accessible primary sources (e.g., court dockets, statutes).
- High-Frequency Updating: The system must be capable of updating its datasets in near real time to reflect new filings, rulings, and precedents.
- Granularity: Data must be captured at the most granular level possible (e.g., individual docket entries, specific case citations) before being aggregated into higher-level metrics.
- Structural Linkage: The protocol must build a relational graph between entities (judges, lawyers, litigants, cases, statutes) to enable complex network analysis.
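The Structural Linkage principle can be sketched as a typed relational graph over legal entities. The sketch below is a minimal illustration in plain Python; the entity names and relation labels are hypothetical, not part of the CVI specification.

```python
from collections import defaultdict

# Minimal sketch of the Structural Linkage principle: a relational graph
# whose nodes are legal entities and whose edges carry a relation type.
# All entity names below are hypothetical examples.
class LegalGraph:
    def __init__(self):
        # edges[node] -> list of (relation, neighbor) tuples
        self.edges = defaultdict(list)

    def link(self, a, relation, b):
        # store the edge in both directions for simple traversal
        self.edges[a].append((relation, b))
        self.edges[b].append((relation, a))

    def neighbors(self, node, relation=None):
        # optionally filter neighbors by relation type
        return [n for r, n in self.edges[node]
                if relation is None or r == relation]

g = LegalGraph()
g.link("Judge: A. Smith", "presided_over", "Case: 21-cv-1001")
g.link("Attorney: J. Doe", "appeared_in", "Case: 21-cv-1001")
g.link("Case: 21-cv-1001", "cites", "Case: 98-cv-0042")

print(g.neighbors("Case: 21-cv-1001", "cites"))  # ['Case: 98-cv-0042']
```

A production system would back this with a graph database, but the same node/relation structure is what enables the network analyses (citation graphs, judge–attorney histories) described below.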
Data Sourcing and Variable Engineering Pipeline
The CVI pipeline is a multi-stage process that transforms raw legal information into the engineered variables used in the scoring equations.
For the Positional Maneuverability Score
| Variable | Primary Data Sources | Parsing & Engineering Methodology |
|---|---|---|
| Statutory Support | U.S. Code, State legislative sites, Cornell LII, ProQuest Legislative Insight, Congress.gov | NLP-based semantic similarity analysis between legal briefs and statutory text. Keyword extraction and regex-based searches for exception clauses. Analysis of legislative history documents for intent. |
| Precedent Power | PACER, CourtListener, Caselaw Access Project, Google Scholar | Construction of a citation graph to calculate Shepardization Scores. NLP analysis of citing cases to classify treatment (positive/negative). Vector embedding of factual summaries to calculate Factual Similarity Scores. Extraction of court metadata to determine Binding Authority. |
| Legal Complexity | Law review databases (JSTOR, HeinOnline), SCOTUSblog, case briefs (PACER) | NLP models trained to search for key phrases like "case of first impression" or "circuit split." Topic modeling to identify novel legal concepts. |
| Jurisdictional Friction | PACER, CourtListener, academic judicial databases (e.g., Judicial Common Space) | Large-scale data analysis to track individual cases from trial court through appeal to calculate judge-specific Reversal Rates. Linking judges to established Ideology Scores. |
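To make the Precedent Power row concrete, a Shepardization-style score can be computed from a citation graph by weighting each citing case by its NLP-classified treatment and the citing court's level. The weights and treatment labels below are illustrative assumptions, not the framework's actual coefficients.

```python
# Hypothetical treatment and court weights; real values would be
# calibrated against historical outcomes, not hard-coded.
TREATMENT_WEIGHT = {"followed": 1.0, "distinguished": -0.25, "overruled": -5.0}
COURT_WEIGHT = {"supreme": 3.0, "appellate": 2.0, "trial": 1.0}

def precedent_power(citing_cases):
    """Toy Shepardization score for one precedent.

    citing_cases: list of (treatment, court_level) tuples, where the
    treatment label comes from an upstream NLP classifier.
    """
    score = 0.0
    for treatment, court in citing_cases:
        score += TREATMENT_WEIGHT[treatment] * COURT_WEIGHT[court]
    return score

citations = [("followed", "appellate"),      # +1.0 * 2.0
             ("followed", "trial"),          # +1.0 * 1.0
             ("distinguished", "trial")]     # -0.25 * 1.0
print(precedent_power(citations))  # 2.75
```

An overruling by a higher court dominates the score (e.g., a single `("overruled", "supreme")` entry contributes -15.0), mirroring how negative treatment by binding authority should collapse a precedent's value.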
For the Strategic Maneuverability Score
| Variable | Primary Data Sources | Parsing & Engineering Methodology |
|---|---|---|
| Litigant Resources | SEC EDGAR, business intelligence APIs, public property records, PACER/CourtListener | Entity resolution to link litigant names to corporate or individual data. Scraping of dockets to count Legal Team Size. Normalization of financial data across the entire case corpus to generate a Financial Power index. |
| Counsel Skill | State Bar association websites, law firm websites, legal ranking publications (Am Law, Vault) | Scraping attorney profiles for experience data. Mapping firms to Firm Tier Scores. Building a secondary database linking attorneys to judges and motion outcomes to calculate a Contextual Win Rate. |
| Procedural Drag | PACER, U.S. Courts statistics | Time-series analysis of docket entries to calculate judge-specific Median Ruling Times. Aggregation of case filing data to determine court/judge Caseload. |
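The Procedural Drag row's Median Ruling Time reduces to pairing each motion's filing date with the judge's ruling date and taking the median gap. A minimal sketch, using fabricated docket dates for illustration:

```python
from datetime import date
from statistics import median

def median_ruling_time(motions):
    """Median days between filing and ruling for one judge.

    motions: list of (filed, ruled) date pairs extracted from docket
    entries; real data would come from parsed PACER dockets.
    """
    gaps = [(ruled - filed).days for filed, ruled in motions]
    return median(gaps)

# Fabricated docket for a single hypothetical judge.
docket = [
    (date(2023, 1, 10), date(2023, 2, 14)),   # 35 days
    (date(2023, 3, 1),  date(2023, 3, 31)),   # 30 days
    (date(2023, 4, 5),  date(2023, 7, 4)),    # 90 days
]
print(median_ruling_time(docket))  # 35
```

The median (rather than the mean) is the natural statistic here because a few long-pending motions would otherwise dominate the drag estimate.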
Model Validation & Veracity Testing
The CVI protocol includes a mandatory validation framework to test the virtue and veracity of both the variables themselves and the models they feed.
- Training/Validation Data Split: Historical case data is partitioned into an 80% training set and a 20% validation set to prevent overfitting and test the model's ability to generalize to unseen data.
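The 80/20 partition can be sketched in a few lines; a seeded shuffle keeps the split reproducible. This is a minimal illustration, and a production split over case data would also need to guard against temporal leakage (e.g., training on cases decided after those in the validation set).

```python
import random

def split_cases(cases, train_frac=0.8, seed=42):
    """Shuffle and partition cases into training/validation sets."""
    shuffled = cases[:]                      # avoid mutating the input
    random.Random(seed).shuffle(shuffled)    # seeded for reproducibility
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

cases = [f"case-{i}" for i in range(100)]
train, valid = split_cases(cases)
print(len(train), len(valid))  # 80 20
```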
- Feature Importance Analysis: Following model training, statistical analysis (e.g., SHAP values, permutation importance) is used to rank the predictive power of each engineered variable. This identifies which factors are most influential in determining legal outcomes.
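Permutation importance, one of the techniques named above, can be illustrated end to end on synthetic data: shuffle one feature column, re-score the model, and report the accuracy drop. The "model" and data below are toy stand-ins for the trained CVI models, not the framework itself.

```python
import random

random.seed(0)
# Synthetic data: feature 0 determines the label, feature 1 is noise.
X = [[random.random(), random.random()] for _ in range(500)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def model(row):
    # A fixed threshold rule standing in for a trained model.
    return 1 if row[0] > 0.5 else 0

def accuracy(X, y):
    return sum(model(r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, col):
    """Accuracy drop after shuffling column `col` across rows."""
    base = accuracy(X, y)
    shuffled = [row[col] for row in X]
    random.shuffle(shuffled)
    X_perm = [row[:col] + [v] + row[col + 1:]
              for row, v in zip(X, shuffled)]
    return base - accuracy(X_perm, y)

print(permutation_importance(X, y, 0))  # large drop: feature 0 matters
print(permutation_importance(X, y, 1))  # ~0: noise feature is ignorable
```

The same logic applies unchanged to the engineered CVI variables: a variable whose permutation barely moves accuracy contributes little predictive power.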
- Ablation Studies: The models are systematically re-trained with specific variables or sub-variables removed. The resulting degradation (or lack thereof) in predictive accuracy is measured to determine the necessity and virtue of each component in the framework. A variable whose removal significantly harms performance is considered critical and virtuous.
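An ablation study differs from permutation importance in that the model is actually retrained without the variable. The toy sketch below retrains a minimal nearest-class-mean classifier with each feature removed; the data and classifier are synthetic stand-ins chosen for brevity, not the CVI models.

```python
import random

random.seed(1)

def make_data(n):
    """Synthetic cases: feature 0 is informative, feature 1 is noise."""
    X, y = [], []
    for _ in range(n):
        label = random.randint(0, 1)
        X.append([label + random.gauss(0, 0.3),   # informative
                  random.gauss(0, 1.0)])          # uninformative
        y.append(label)
    return X, y

def train_and_score(X_tr, y_tr, X_va, y_va, keep):
    """Retrain a nearest-class-mean classifier on the kept columns."""
    means = {}
    for label in (0, 1):
        rows = [[r[c] for c in keep]
                for r, t in zip(X_tr, y_tr) if t == label]
        means[label] = [sum(col) / len(col) for col in zip(*rows)]

    def predict(row):
        kept = [row[c] for c in keep]
        dists = {l: sum((a - b) ** 2 for a, b in zip(kept, m))
                 for l, m in means.items()}
        return min(dists, key=dists.get)

    return sum(predict(r) == t for r, t in zip(X_va, y_va)) / len(y_va)

X_tr, y_tr = make_data(400)
X_va, y_va = make_data(100)
full = train_and_score(X_tr, y_tr, X_va, y_va, keep=[0, 1])
for ablated in (0, 1):
    kept = [c for c in (0, 1) if c != ablated]
    acc = train_and_score(X_tr, y_tr, X_va, y_va, keep=kept)
    print(f"without feature {ablated}: {acc:.2f} (full model: {full:.2f})")
```

Removing the informative feature collapses accuracy toward chance while removing the noise feature barely matters, which is exactly the criticality signal the framework uses to judge each variable.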