Slide 1
The system works. The trajectory doesn't.
| Metric | Jan 2026 | Apr 2026 | Target | Status |
|---|---|---|---|---|
| Requests/day | 200 | 340 (+70%) | -- | -- |
| Active reviewers | 5 | 3 (-40%) | 5 | RED |
| Median turnaround | 48h | 55h (+15%) | 28h | RED |
| AI extraction rate | 0% | 62% (flat 6 wks) | 80% | YELLOW |
| Auto-approve | -- | Not shipped | Shipped EOQ | RED |
All three Q1 OKRs are RED or YELLOW. Nobody has updated them since the PM departed in November.
Slide 2
The Two Problems
Problem 1: Clinical reviewers are disengaging
5 → 3 active in 2 months. Recovery = 67% capacity increase. Biggest throughput lever.
Queue is FIFO, not urgency-sorted
"I waste 10-15 minutes every morning just finding urgent cases" — Aisha Williams
Slack Thread #3, #clinical-ops, Apr 1
priority column is NULL for every row
Schema defines it. CDC INSERT omits it. LLM extracts urgency_level into the wrong table.
migrations/001:22, processor.rs:63-98, llm_extractor.py:26
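The failure mode is easy to reproduce in miniature. A sqlite sketch (illustrative only; production is the claims DB behind the Rust CDC processor, and the table/column names here mirror the schema described above) shows why every row ends up NULL when the INSERT never names the column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_requests (id INTEGER PRIMARY KEY, priority TEXT)")

# Current behavior: the CDC-style INSERT omits the column,
# so the schema default (NULL) wins on every row
conn.execute("INSERT INTO auth_requests (id) VALUES (1)")

# The fix: carry the extracted urgency through to the same table
conn.execute("INSERT INTO auth_requests (id, priority) VALUES (2, 'urgent')")

print(conn.execute("SELECT id, priority FROM auth_requests ORDER BY id").fetchall())
# [(1, None), (2, 'urgent')]
```

The point of the sketch: nothing is broken in the schema or the extractor individually; the gap is the hand-off between them.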
Doc viewer resets scroll, no side-by-side
"I've been asking about this for months" — James Park
Issues #7, #14, #22
Reviewers don't know who to talk to
"Is there an engineering contact? I don't know who to ask since Dana left" — Maria Torres
Slack Thread #3
Problem 2: Product decisions have nowhere to go
PM departed Nov 2025. Alex has 73% of commits. Sarah & Jordan: 0 commits in 4-5 months.
Auto-approve criteria
4 months: PR #53 in draft, 4 unanswered questions. VP escalated Mar 28 — zero responses.
slack-threads.md Thread #5, PULL-REQUESTS.md
PR #47 OCR caching
45 days: Approved. Merge conflict. Sarah offered to fix. No response.
PULL-REQUESTS.md
PR #54 PHI security fix
4+ days: Patient names in debug logs. HIPAA risk. Awaiting Alex's review.
PULL-REQUESTS.md
Python CI
4 months: pipeline-test.yaml.disabled since Dec. "Was supposed to be temporary." — Jordan
slack-threads.md Thread #2
Causal Chain
Slide 3
Three things the system doesn't know about itself
#1: Extraction plateau may be a wrong-model problem
HYPOTHESIS: config.py says Sonnet. config.yaml says Haiku. Issue #16 documents a past incident of running the wrong model. Three conflicting config sources, no single source of truth.
5-minute kubectl check to confirm or rule out.
#2: Zero Python CI coverage for 4 months
CONFIRMED: Every pipeline change since December shipped untested. The OCR mock flaked, CI was disabled, and nobody re-enabled it. Jordan: "was supposed to be temporary."
Rename pipeline-test.yaml.disabled, fix OCR mock.
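The "fix OCR mock" half of that step likely comes down to making the mock deterministic. A minimal sketch, with hypothetical names (the real mock lives in the pipeline tests), of what a non-flaky stand-in looks like:

```python
def fake_ocr(image_bytes: bytes) -> str:
    """Deterministic OCR stub for CI: same input always yields the
    same text, so tests cannot flake on network timing, model
    variance, or a live OCR service being unavailable."""
    return f"OCR_TEXT:{len(image_bytes)}"

# Tests inject the stub where the real OCR client would go
# (dependency injection rather than patching a live service)
assert fake_ocr(b"page-1") == fake_ocr(b"page-1")
```

Whether the original flake was timing, ordering, or a live dependency, the repair direction is the same: the mock must be a pure function of its input.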
#3: Queue priority infrastructure built but never connected
CONFIRMED: Column exists. Struct exists. CDC processor never writes to it. LLM extracts urgency but stores it in the wrong table. Not an engineering failure — no PM to close the loop.
Requires cross-service pipeline work (Python + Rust).
Codebase
Repository Overview
Rust + Python polyglot — 4 services, 8 migrations, 45 source files. Workspace: Cargo.toml (resolver = "2")
Data Flow
CDC Ingestor
Rust: Kafka consumer — maps claims DB events to auth_requests via deterministic UUIDs
crates/ingest/
API Server
Rust: Axum REST API — reviewer queue, document upload, decisions. Port 6000.
crates/api/
Backfill Worker
Rust: Async task processor — pulls from worker_tasks table, schedules OCR
crates/worker/
Document Pipeline
Python: OCR + LLM extraction — S3 discovery → OCR → Claude → structured JSON
pipeline/
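The "deterministic UUIDs" in the ingestor are presumably name-based (UUIDv5 or similar). A Python sketch of the idea, with a placeholder namespace — the real constant lives in crates/ingest/:

```python
import uuid

# Placeholder namespace for illustration only; the real one is
# defined in the Rust ingestor (crates/ingest/)
CLAIMS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "claims.example.internal")

def auth_request_id(claim_id: str) -> uuid.UUID:
    """Map a claims-DB event to a stable auth_requests primary key.

    The same claim always yields the same UUID, so replaying a Kafka
    partition re-upserts the same rows instead of duplicating them."""
    return uuid.uuid5(CLAIMS_NAMESPACE, claim_id)

print(auth_request_id("claim-1042") == auth_request_id("claim-1042"))  # True
```

This is why CDC replays are safe: idempotency falls out of the ID derivation rather than requiring dedup bookkeeping downstream.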
Slide 4
30-Day Plan
Caveat: Sarah and Jordan have 0 commits in 4-5 months. If Alex is the only active engineer, this is an 8-week plan.
Confirm engineering capacity (Sarah, Jordan)
Determines if this is a 4-week or 8-week plan
Verify LLM model in prod
Confirm/rule out config mismatch hypothesis
Merge PR #54 (PHI fix)
Close HIPAA compliance risk
Resolve + merge PR #47 (OCR caching)
30% OCR load reduction
Re-enable Python CI
End 4 months of untested deploys
Add LLM retry logic + consolidate config
+6% extraction + eliminate config mismatch
Interview 3 active reviewers
Understand 5→3 drop (67% throughput lever)
Define auto-approve criteria + ship PR #53
~40 fewer manual reviews/day
Ship highest-impact reviewer UX fix
Begin recovering reviewer capacity
Profile memory leak + add HPA
Eliminate 48h OOM cycle + Monday 504s
Expected at 30 days
| Metric | Now | At 30 days |
|---|---|---|
| Extraction success | 62% | ~72% |
| Auto-approve | Not shipped | Shipped |
| Reviewer capacity | Unknown | Diagnosed + recovery started |
| Monday outages | Recurring | Eliminated |
Prototype Demo
POC: LLM Retry Logic — Before vs After
Simulates what happens when the Anthropic API returns a transient error during document extraction. Left panel shows current production behavior. Right panel shows the prototype fix.
Current Production
llm_extractor.py:60-67. Click "Run Simulation" to see production behavior
With Retry Logic (Prototype)
llm_extractor_v2.py. Click "Run Simulation" to see prototype behavior
Slide 5
What I Need From You
Ask 1:Who's actually on this team?
Day 1: Sarah and Jordan have 0 commits in 4-5 months. Need to know real capacity before committing to a timeline.
Ask 2:Decision authority on auto-approve
Day 1: VP Clinical Ops escalated 14 days ago, zero responses. I'll own the recommendation with clinical ops — need the mandate to ship.
Ask 3:30 minutes with each active reviewer
Week 2: Maria Torres, James Park, Aisha Williams. Recovering 2 reviewers = 67% more throughput. Need to know why they disengaged.
Ask 4:Production kubectl access
Day 1: 5-minute check to verify which LLM model is running. Confirms or rules out the config mismatch hypothesis.
Evidence base: repo/ (45 source files, 30 open issues, 4 PRs), artifacts/ (Grafana, Slack, OKRs), git history (15 commits, 4 contributors).
Full scoring in scoring-matrix.md. Prototype at part3-prototype/.