
RadTriage: Adversarial AI for Medical Imaging Referral Triage
We built a system that reads doctor handwriting, interprets Medicare billing rules, and uses competing AI agents to achieve 99%+ accuracy in referral eligibility assessment.
Key results: 99.2% triage accuracy · ~25s end-to-end latency · $380K revenue recovered/yr · -73% billing rejections · 40 staff hours saved/wk
Abstract
Medical imaging referral processing in Australia requires accurate interpretation of handwritten referral documents against a complex and frequently updated Medicare Benefits Schedule (MBS). Manual triage is slow, error-prone, and expensive. Billing rejections from misinterpreted eligibility criteria cost radiology clinics hundreds of thousands of dollars annually. We present RadTriage, an end-to-end AI pipeline that digitises handwritten referrals using custom-trained OCR models, extracts structured clinical data, and determines Medicare eligibility through a novel adversarial reasoning architecture. Two independent AI agents, the Advocate and the Sceptic, evaluate each referral and debate their assessments until reaching consensus, mimicking the deliberative process of experienced billing staff. In pilot testing across 4,200 referrals at three radiology clinics, RadTriage achieved 99.2% agreement with expert human reviewers, with a mean processing time of 24.6 seconds per referral.
Introduction
Australia's Medicare Benefits Schedule defines the rules governing rebate eligibility for medical imaging services. For a radiology clinic, every patient referral must be assessed against the applicable MBS item codes, clinical indication criteria, and requesting practitioner qualifications before the scan is performed. An ineligible scan that proceeds results in a rejected Medicare claim, and the clinic absorbs the full cost.
Three factors compound the problem. First, the majority of imaging referrals in Australia are still handwritten, and the legibility of physician handwriting is, to put it charitably, variable. Second, the MBS is a labyrinthine document with thousands of item codes, complex eligibility rules, and frequent updates. Third, the clinical information on a referral must be mapped to specific codes and modalities, a task requiring both medical knowledge and billing expertise.
Most clinics rely on experienced reception and billing staff to perform this triage manually, and the approach holds up until it doesn't. Staff turnover means institutional knowledge walks out the door. MBS updates introduce new rules that take weeks to propagate through training. And the sheer volume, with a busy clinic processing hundreds of referrals daily, makes consistent accuracy nearly impossible.
RadTriage was designed to address this problem comprehensively: a single system that handles the full pipeline from paper referral to eligibility determination, with accuracy that matches or exceeds the best human reviewers.
System Architecture
RadTriage comprises four sequential processing stages, each implemented as an independent service with well-defined input/output contracts. This modularity enables independent scaling, testing, and model updates without system-wide redeployment.
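Conceptually, the orchestration layer is thin: each stage consumes the previous stage's output. The sketch below is illustrative only (the class and service names are ours, not the production interface):

from dataclasses import dataclass
from typing import Any, Protocol

# Illustrative stage contract; names are ours, not the production system's.
@dataclass
class Referral:
    patient: str
    dob: str
    indication: str
    referrer: str
    provider_number: str | None

class Stage(Protocol):
    def run(self, payload: Any) -> Any: ...

def triage(scan: bytes, stages: dict[str, Stage]) -> Any:
    page = stages["preprocess"].run(scan)      # deskew, denoise, segment
    text = stages["ocr"].run(page)             # handwriting recognition
    referral = stages["ner"].run(text)         # -> Referral
    return stages["debate"].run(referral)      # eligibility determination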
2.1 Document Acquisition & Pre-processing
Referral documents enter the system via high-resolution scanning (300 DPI minimum, colour). A pre-processing pipeline performs deskewing, contrast normalisation, noise reduction, and binarisation. Document layout analysis identifies and segments the referral into regions of interest: patient demographics, clinical information, requesting practitioner details, and provider stamps. This segmentation is performed by a U-Net-based layout model trained on 12,000 annotated referral documents from six clinic partners.
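A minimal sketch of this chain using OpenCV (listed in the technology stack); the parameter values here are illustrative, not the production settings:

import cv2
import numpy as np

def preprocess(path: str) -> np.ndarray:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Deskew: fit a minimum-area rectangle around ink pixels and rotate by
    # its angle (OpenCV's angle convention differs across versions, so the
    # correction below may need adjusting).
    coords = np.column_stack(np.where(img < 128)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:
        angle -= 90
    h, w = img.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, M, (w, h), borderValue=255)

    # Contrast normalisation (CLAHE) and noise reduction.
    img = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(img)
    img = cv2.fastNlMeansDenoising(img, h=10)

    # Adaptive binarisation copes with uneven scan illumination.
    return cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 15)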
2.2 Handwriting Recognition (Custom OCR)
Medical handwriting recognition is a harder problem than general handwriting OCR. Physician handwriting exhibits extreme variability in letterform, inconsistent spacing, liberal use of non-standard abbreviations (Hx, Dx, Rx, NAD, SOB, #NOF), and a tendency toward connected cursive that defeats commercial OCR engines.
We trained a custom recognition model based on a Transformer-encoder architecture with a CTC (Connectionist Temporal Classification) decoder, operating on line-level text segments extracted by the layout model. The training corpus comprises 85,000 annotated text-line images harvested from de-identified referral documents, supplemented with synthetic data generated by a handwriting style-transfer GAN that learned the statistical properties of physician handwriting.
Critical to performance is the medical vocabulary model, a domain-constrained language model that biases decoding toward medically plausible character sequences. This reduces character-level error rates by approximately 40% compared to vocabulary-agnostic decoding, particularly on abbreviated clinical terms.
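One way to picture the vocabulary bias is as a re-ranking pass over the decoder's beam candidates. This is a minimal sketch under assumptions: medical_lm.log_prob stands in for whatever scoring interface the domain LM exposes, and the fusion weight alpha is illustrative:

def rerank_beam(beam: list[tuple[str, float]], medical_lm,
                alpha: float = 0.6) -> str:
    # beam: (candidate_text, ctc_log_prob) pairs from the CTC beam search.
    # medical_lm.log_prob: log-probability of the text under the domain LM.
    def fused(candidate: tuple[str, float]) -> float:
        text, ctc_log_prob = candidate
        return ctc_log_prob + alpha * medical_lm.log_prob(text)
    return max(beam, key=fused)[0]

# e.g. beam = [("chest pain 7 SOB", -4.1), ("chest pain ? SOB", -4.3)]:
# a medical LM strongly prefers "? SOB" (query: shortness of breath),
# overriding the slightly higher raw CTC score of the misread "7".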
2.3 Clinical Entity Extraction
The OCR output is processed by a clinical NER (Named Entity Recognition) model that extracts structured data: patient identifiers, date of birth, referring practitioner and provider number, clinical indication, body region, suspected diagnosis, relevant history, and any specific imaging requests. The NER model is a fine-tuned BioBERT variant trained on 6,000 annotated referral extractions, achieving an F1 score of 0.94 on held-out test data.
Extracted entities are cross-validated against external data sources where available: practitioner provider numbers are validated against the AHPRA register, and Medicare provider eligibility is confirmed via the HPOS (Health Professional Online Services) API.
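A sketch of how these checks compose. AHPRA and HPOS do not publish clients with these signatures, so ahpra_lookup and hpos_eligible are hypothetical stand-ins for the integration layer:

def validate_referrer(referral: dict, ahpra_lookup, hpos_eligible) -> list[str]:
    # Returns a list of problems; an empty list means the referrer checks out.
    issues = []
    provider = referral.get("provider")
    if not provider:
        # The debate example later in this article hinges on exactly this gap.
        return ["missing provider number"]
    if not ahpra_lookup(referral["referrer"]):   # hypothetical AHPRA client
        issues.append("referrer not found on the AHPRA register")
    if not hpos_eligible(provider):              # hypothetical HPOS client
        issues.append("provider not confirmed Medicare-eligible via HPOS")
    return issues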
Adversarial Reasoning Engine
The core innovation in RadTriage is the adversarial eligibility assessment architecture. Eligibility errors are costly in both directions: a false approval becomes a rejected claim the clinic absorbs, while a false rejection turns away a rebatable scan. Rather than trust a single model with that trade-off, we implemented a dual-agent debate system inspired by adversarial collaboration frameworks in AI safety research.
3.1 The Advocate
The Advocate agent receives the structured referral data and attempts to construct the strongest possible case for Medicare eligibility. It identifies applicable MBS item codes, matches clinical indications to eligibility criteria, and generates a reasoned argument for why the referral qualifies for a Medicare rebate. The Advocate operates on the principle of charitable interpretation: where ambiguity exists, it resolves in favour of eligibility.
3.2 The Sceptic
The Sceptic agent receives the same structured data and independently constructs the case against eligibility. It identifies potential disqualifying factors: missing information, time-based restrictions (e.g., repeat imaging intervals), clinical indication mismatches, provider qualification gaps, and edge cases in MBS interpretation. The Sceptic applies the strictest reasonable reading of the rules.
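Paraphrasing the two roles as system prompts (illustrative only; the production prompts are not published):

ADVOCATE_PROMPT = """You are the Advocate. Given a structured imaging referral
and retrieved MBS schedule text, build the strongest defensible case FOR
Medicare eligibility. Identify applicable item codes. Where genuine ambiguity
exists, resolve it in favour of eligibility. Cite the clauses you rely on."""

SCEPTIC_PROMPT = """You are the Sceptic. Given the same referral and MBS text,
build the case AGAINST eligibility. Apply the strictest reasonable reading:
missing information, repeat-imaging intervals, indication mismatches, and
referrer qualification gaps all count against the claim."""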
3.3 Deliberation Protocol
The two agents engage in a structured multi-turn debate. Each round, the Advocate presents or refines its argument for eligibility, and the Sceptic challenges specific claims with counter-evidence or alternative rule interpretations. Both agents have access to the complete, current MBS schedule as a retrieval-augmented knowledge base, ensuring arguments are grounded in the actual regulatory text rather than training data that may be stale.
The debate proceeds for a minimum of two rounds and a maximum of five. Convergence is declared when both agents agree on the eligibility determination and the applicable item code(s). If the agents fail to converge after five rounds, the referral is flagged for human review with the full debate transcript attached, giving the reviewer a structured analysis of the ambiguity.
Both agents are implemented as fine-tuned LLMs with role-specific system prompts and chain-of-thought reasoning. Temperature is set to 0.1 for the Advocate (favouring consistent, optimistic interpretation) and 0.05 for the Sceptic (favouring deterministic, conservative analysis). Each agent's output is structured as JSON with explicit reasoning chains, enabling full auditability of every decision.
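Stripped of the prompt engineering, the protocol reduces to a bounded loop. The sketch below simplifies under stated assumptions: call_llm stands in for the inference client, and each agent returns JSON with eligible and item_codes fields:

import json

MIN_ROUNDS, MAX_ROUNDS = 2, 5

def deliberate(referral: dict, call_llm) -> dict:
    transcript: list[dict] = []
    for rnd in range(1, MAX_ROUNDS + 1):
        advocate = json.loads(call_llm(role="advocate", temperature=0.1,
                                       referral=referral,
                                       transcript=transcript))
        sceptic = json.loads(call_llm(role="sceptic", temperature=0.05,
                                      referral=referral,
                                      transcript=transcript + [advocate]))
        transcript += [advocate, sceptic]
        agreed = (advocate["eligible"] == sceptic["eligible"]
                  and advocate["item_codes"] == sceptic["item_codes"])
        if agreed and rnd >= MIN_ROUNDS:
            return {"determination": advocate["eligible"],
                    "item_codes": advocate["item_codes"],
                    "rounds": rnd, "transcript": transcript}
    # Non-convergence after five rounds is the escalation path: hand the
    # full transcript to a human reviewer.
    return {"determination": "FLAG_FOR_REVIEW",
            "rounds": MAX_ROUNDS, "transcript": transcript}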
MBS Knowledge System
The Medicare Benefits Schedule is not a static document. Item codes are added, modified, and deprecated. Eligibility criteria change. Fee schedules are updated quarterly. Any system that hardcodes MBS rules will be wrong within months.
RadTriage maintains a structured, version-controlled representation of the MBS as a knowledge graph. Each item code is a node with edges to its eligibility criteria, applicable modalities, body regions, clinical indications, fee schedule, and restriction rules (time-based, frequency-based, referrer-qualification-based). When the MBS is updated, the knowledge graph is diffed against the previous version, and affected decision paths are automatically flagged for regression testing.
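Because every version of the graph is retained, the update check can be a straightforward structural diff. A minimal sketch, with the graph simplified to a dict keyed by item code:

def diff_mbs(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    # Item codes whose rules changed get their decision paths queued for
    # regression testing.
    return {
        "added":    sorted(c for c in new if c not in old),
        "removed":  sorted(c for c in old if c not in new),
        "modified": sorted(c for c in new if c in old and new[c] != old[c]),
    }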
Both the Advocate and Sceptic agents access the MBS knowledge graph via retrieval-augmented generation (RAG). Queries are embedded using a domain-specific embedding model and matched against the knowledge graph with hybrid sparse/dense retrieval. This ensures that arguments are always grounded in the current regulatory text, not in potentially outdated parametric knowledge.
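The retrieval blend can be sketched as a weighted score. Here sparse_score and embed stand in for the real components (e.g. BM25 and the domain embedding model), and beta is a tunable weight; exact item-code and keyword matches matter in regulatory text, which is why the lexical term is kept:

import numpy as np

def hybrid_search(query: str, docs: list[str], sparse_score, embed,
                  beta: float = 0.5, k: int = 5) -> list[str]:
    q = embed(query)
    q = q / np.linalg.norm(q)

    def score(doc: str) -> float:
        d = embed(doc)
        dense = float(q @ (d / np.linalg.norm(d)))   # cosine similarity
        return beta * sparse_score(query, doc) + (1 - beta) * dense

    return sorted(docs, key=score, reverse=True)[:k]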
The four pipeline stages at a glance:

1. Document Acquisition: 300 DPI colour scan; deskew & normalise; noise reduction; layout segmentation. (U-Net layout model · 12K training docs)
2. Handwriting Recognition: line-level segmentation; Transformer-CTC decoder; medical vocabulary bias; style-transfer augmentation. (85K annotated text-lines · GAN synthetic data)
3. Entity Extraction: patient demographics; practitioner & provider; clinical indication; body region & diagnosis. (BioBERT fine-tune · F1: 0.94 · AHPRA validation)
4. Adversarial Reasoning: Advocate agent (FOR); Sceptic agent (AGAINST); 2–5 round debate; consensus or flag. (Fine-tuned LLMs · RAG over MBS knowledge graph)
Worked example: the pipeline applied to a sample referral. The scanned form reads "Pt: Margaret Chen · DOB: 14/03/1958 (67y) · Acute pleuritic chest pain · ?PE r/o pneumothorax · Dr S. Kapoor · Prov: 2466731T". OCR and entity extraction yield the structured record:

{
  "patient": "Margaret Chen",
  "dob": "1958-03-14",
  "indication": "acute pleuritic chest pain",
  "query": "PE, pneumothorax",
  "referrer": "Dr S. Kapoor",
  "provider": "2466731T"
}
Example debate transcript (a different referral: CT Brain, physiotherapist referrer with a GP co-signature):

Advocate (round 1): Referral qualifies under MBS item 57001 (CT Brain without contrast). Clinical indication of "persistent headache 6 weeks, ?SOL" meets the diagnostic imaging criteria for neurological investigation.

Sceptic (round 1): The referring practitioner is a physiotherapist (provider number format confirmed). Under MBS rules, CT Brain requires a referral from a medical practitioner or specialist. Physiotherapists can only request limited imaging; CT Brain is not on the approved list.

Advocate (round 2): Acknowledged; revising assessment. The physiotherapist referral pathway is restricted. However, the referral includes a co-signature from Dr S. Kapoor (AHPRA validated, GP). If the GP is the effective referrer, item 57001 eligibility is restored.

Sceptic (round 2): The co-signature is present but the GP's provider number is not listed on the referral form. Without a valid provider number from the referring medical practitioner, Medicare cannot process the claim regardless of clinical merit.

Advocate (round 3): Conceding that eligibility cannot be established from the current document. Recommend FLAG for human review: the clinical indication is valid and the GP relationship is evidenced, but the referral is technically incomplete. Staff should contact the GP practice to obtain the provider number before proceeding.

Sceptic (round 3): Agreed. FLAG is the correct determination. The referral is not rejectable on clinical grounds but is not processable in its current form. Specific action required: obtain GP provider number.

Outcome: FLAG · Action: contact GP practice for provider number · 3 rounds · 22.4s
MBS knowledge graph excerpt: item nodes 57001 (CT Brain w/o contrast), 57004 (CT Chest), and 57007 (CT Abdomen), linked to criteria nodes for clinical indication (Neurological, Respiratory), referrer type (GP Referral, Specialist), restrictions (12-month rule), and modality (CT).
Results
5.1 Pilot Study Design
RadTriage was evaluated in a prospective pilot across three radiology clinics in metropolitan Sydney over a 12-week period. All incoming referrals (n=4,217) were processed by both the RadTriage system and the clinic's existing manual triage process. Staff were blinded to the system's output during the evaluation period. An expert panel of two senior billing specialists independently reviewed all cases where the system and manual triage disagreed.
5.2 OCR Performance
The custom OCR model achieved a character-level accuracy of 96.8% across all referral text, rising to 98.3% on printed text and 94.1% on handwritten text. With the medical vocabulary model applied, word-level accuracy on clinical terms reached 97.2%. For comparison, Google Cloud Vision achieved 91.4% character-level accuracy and Amazon Textract achieved 89.7% on the same handwritten test set. The domain-specific training and medical vocabulary model provide a clear advantage on physician handwriting.
5.3 Entity Extraction
The clinical NER model achieved an overall F1 score of 0.94. Performance varied by entity type: patient demographics (F1: 0.98), referring practitioner (F1: 0.97), clinical indication (F1: 0.91), body region (F1: 0.96), and suspected diagnosis (F1: 0.89). The lower performance on clinical indication and diagnosis reflects the inherent ambiguity and abbreviation density in these fields.
5.4 Eligibility Determination
Of 4,217 referrals processed, RadTriage achieved exact agreement with the expert panel on eligibility determination in 4,183 cases (99.2%). Of the 34 disagreements, 22 were cases where the system flagged for human review due to non-convergence in the adversarial debate, the intended behaviour for ambiguous referrals. Of the remaining 12 true errors, 8 were attributable to OCR misreading (typically in severely degraded handwriting) and 4 to entity extraction errors that propagated into incorrect code assignment.
The adversarial architecture caught 17 cases that the manual triage process got wrong: referrals that staff approved but that were ineligible, representing approximately $12,400 in avoided billing rejections during the pilot period alone.
5.5 Processing Performance
Mean end-to-end processing time was 24.6 seconds per referral (σ = 4.8s). The breakdown: document pre-processing 1.2s, OCR 14.8s, entity extraction 1.4s, adversarial deliberation 7.2s (mean 2.4 debate rounds). The OCR phase dominates latency because custom handwriting recognition on degraded input requires multiple inference passes with beam search decoding and medical vocabulary re-ranking. Referrals with particularly poor handwriting trigger additional recognition passes, pushing worst-case OCR latency above 20s.
5.6 Financial Impact
Extrapolating from the pilot data: across the three clinics, RadTriage is projected to recover approximately $380,000 per year in previously rejected Medicare claims, while reducing staff triage time by approximately 40 hours per week. The system also identified $49,600 in referrals that would have been incorrectly approved, preventing downstream audit exposure.
Headline metrics: eligibility accuracy 99.2% · handwriting OCR 96.8% (char-level) · clinical NER F1 0.94.

Latency breakdown (mean 24.6s): pre-process 1.2s · OCR 14.8s · NER 1.4s · debate 7.2s.

Note: Custom OCR dominates latency at 60% of total processing time. Handwriting recognition requires multiple inference passes with beam search and medical vocabulary re-ranking. Worst-case OCR on severely degraded handwriting: 20s+.
Discussion
The adversarial architecture is the key differentiator. Single-model approaches to eligibility determination, even well-trained ones, tend to develop systematic biases. A model trained primarily on eligible referrals develops a tendency to approve; one trained with strong negative examples becomes over-conservative. By forcing two agents to argue opposing positions, RadTriage surfaces the reasoning behind each determination and catches cases where a single model would silently err.
The non-convergence flag is a feature. Referrals where the agents cannot agree after five rounds are ambiguous. These are the cases that benefit most from experienced human review. The debate transcript provides the reviewer with a structured analysis they wouldn't otherwise have, and it often identifies specific MBS clauses or clinical interpretation questions that need resolution.
During the pilot, RadTriage identified a category of referrals that clinic staff were systematically misclassifying: a specific interaction between time-based restrictions and provider qualification rules that affects approximately 2.3% of referrals. This rule interaction was not well-understood by staff at any of the three pilot sites, suggesting a systemic training gap in the industry.
Conclusion & Future Work
RadTriage demonstrates that adversarial AI architectures can achieve expert-level performance on complex regulatory interpretation tasks while maintaining full auditability and appropriate deference to human judgment on ambiguous cases.
The system is currently in production deployment at three radiology clinics, with expansion planned to an additional twelve sites in 2026. Ongoing work includes:
- Extension to additional imaging modalities (PET, nuclear medicine) with modality-specific eligibility rules
- Integration with RIS (Radiology Information Systems) for end-to-end workflow automation from referral to billing
- Real-time MBS change monitoring with automated regression testing of affected decision paths
- Federated model updates across clinic sites to improve OCR performance on site-specific handwriting patterns without sharing patient data
- Investigation of the adversarial architecture's applicability to other regulated domains: PBS (Pharmaceutical Benefits Scheme) eligibility, workers' compensation claim assessment, and insurance pre-authorisation
The broader implication is that adversarial reasoning, having AI systems argue both sides of a determination before reaching a conclusion, may be a general-purpose pattern for high-stakes classification tasks where false positives and false negatives carry asymmetric costs. We believe this architecture has significant potential beyond medical billing.
Technology Stack
OCR & Vision
- Custom Transformer-CTC model
- U-Net layout segmentation
- Style-transfer GAN (synthetic data)
- OpenCV pre-processing
NLP & Reasoning
- Fine-tuned BioBERT (NER)
- Fine-tuned LLMs (Advocate/Sceptic)
- RAG with hybrid retrieval
- Medical vocabulary model
Knowledge & Data
- MBS knowledge graph
- AHPRA/HPOS API integration
- Version-controlled rule sets
- PostgreSQL + vector store
Infrastructure
- Python (FastAPI)
- Docker + Kubernetes
- GPU inference (NVIDIA T4)
- On-premise deployment option