Healthcare AI · Production
Medical coding at claims scale
Problem
Convert clinical documentation into the ICD and CPT codes that drive provider billing — without a human coder reading every case. Payers and hospital systems don't have the coder headcount to keep up with volume, and the long tail of specialty codes is where both accuracy and revenue live.
Build
PySpark pipelines on Databricks to move clinical cases from source systems through de-identification, feature prep, and labeling at scale. SageMaker services for prediction — model inference wrapped in the latency and reliability SLAs that a claims pipeline actually demands.
Result
10M+ cases moved through the pipeline in production. Real patients, real claims, real dollars on the other side of every inference.
What it taught me
In regulated industries, the model isn't the moat. Pipeline reliability is. The teams that won this problem weren't the ones with the best architecture — they were the ones whose data layer didn't rot.