Raw OCR Evidence Bundles
`Sources/*/raw/` contains the repo-local evidence backend for the cutover source corpus.
What Lives Here
- One folder per `doc_id` extracted from the Mistral OCR corpus.
- `manifest.json` uses the portable `raw_manifest.v2` schema and is the preferred metadata entrypoint for QA and citations.
- `response.json` is the full OCR response for deep inspection.
- `pages/` contains decoded page images when the OCR response included image payloads.
- `Sources/citations/raw_index.json` is the generated index used by family QA reports and future citation tooling.
- Manifest path fields are `Sources`-root-relative, and machine-local OCR runner paths do not belong in the live corpus.
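
The rule that manifest paths stay `Sources`-root-relative can be spot-checked mechanically. The sketch below is a minimal illustration, not the repository's own tooling. It makes no assumption about `raw_manifest.v2` field names: it walks every string value in a parsed manifest and flags any that look like absolute paths. The helper names (`iter_strings`, `looks_absolute`, `find_machine_local_paths`, `check_manifest_file`) are hypothetical, and the absolute-path test is a heuristic that may also flag non-path strings beginning with `/`.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def iter_strings(node, trail=""):
    """Yield (json_path, value) for every string value in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{trail}.{key}" if trail else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{trail}[{index}]")
    elif isinstance(node, str):
        yield trail, node


def looks_absolute(value):
    """Heuristic: True for POSIX-absolute or Windows drive-absolute path strings."""
    return PurePosixPath(value).is_absolute() or PureWindowsPath(value).is_absolute()


def find_machine_local_paths(manifest):
    """Return [(json_path, value)] for string values that look machine-local."""
    return [
        (where, value)
        for where, value in iter_strings(manifest)
        if looks_absolute(value)
    ]


def check_manifest_file(path):
    """Load one manifest.json and report suspicious path values (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return find_machine_local_paths(json.load(handle))
```

An empty result from `check_manifest_file` would suggest the manifest holds no machine-local paths; it does not prove that every relative path resolves to a real file.
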
Current Coverage
- Bundles indexed: `89`
- Raw roots: `BMP_2023/raw`, `BMP_2026/raw`, `NJAC_2023/raw`, `NJAC_2026/raw`
- 2023 BMP bundles: `34`
- 2026 BMP bundles: `36`
- 2023 NJAC bundles: `1`
- 2026 NJAC bundles: `18`
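
Because each bundle is one folder per `doc_id`, the counts above could in principle be re-derived by counting subdirectories under each raw root. The following is a rough sketch under that assumption (every immediate subdirectory of a raw root is exactly one bundle). `count_bundles` is a hypothetical helper, and `sources_root` stands for whatever local path holds the `Sources/` tree.

```python
from pathlib import Path

# Raw roots as listed in this README, relative to the Sources root.
RAW_ROOTS = ["BMP_2023/raw", "BMP_2026/raw", "NJAC_2023/raw", "NJAC_2026/raw"]


def count_bundles(sources_root, raw_roots=RAW_ROOTS):
    """Count immediate subdirectories (assumed: one per doc_id) under each raw root."""
    counts = {}
    for raw_root in raw_roots:
        root = Path(sources_root) / raw_root
        if root.is_dir():
            counts[raw_root] = sum(1 for child in root.iterdir() if child.is_dir())
        else:
            # A missing root is reported as zero rather than raising.
            counts[raw_root] = 0
    return counts
```

The per-root figures listed above sum to 34 + 36 + 1 + 18 = 89, which matches the indexed bundle total, so `sum(count_bundles(...).values())` is a natural value to compare against the index.
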
Utility
- Supports family-level `0_*QAPASS.md` posterity reports.
- Supports coverage checks between extracted markdown, source registry, and raw OCR evidence.
- Provides the evidence base that future `bmp_crosswalk` and `report_claims` refreshes should use.
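
A coverage check of the kind mentioned above reduces to comparing sets of `doc_id` values. This is an illustrative sketch only: how the IDs are collected from the extracted markdown, the source registry, and the raw bundles is not specified here, so the three inputs are assumed to be prepared by the caller. `coverage_gaps` is a hypothetical name.

```python
def coverage_gaps(markdown_ids, registry_ids, raw_ids):
    """For each source, list the doc_ids seen in some other source but missing from it."""
    sources = {
        "markdown": set(markdown_ids),
        "registry": set(registry_ids),
        "raw": set(raw_ids),
    }
    everything = set().union(*sources.values())
    # Sorted lists keep the report stable between runs.
    return {name: sorted(everything - ids) for name, ids in sources.items()}
```

Three empty lists would mean the sources agree on which documents exist; a non-empty list names the documents a given source is missing.
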
Notes
- This directory is evidence infrastructure, not authored content.
- Stale prose elsewhere in `Sources/citations/` must not override live raw evidence and manifests stored under `Sources/*/raw/`.