The client
Medisync Kft., Budapest-based healthtech SaaS (~45 people) offering medical documentation automation to Hungarian private clinics. LLM-based (Claude Sonnet + GPT-4o), 120+ clinics in production, ~80,000 patient docs/month.
Trigger
Two events: (1) a clinic sysadmin tested prompt-injection-like input and the system seemed willing to reference another patient's data. (2) The company pursuing CE-marked version needed MDR-compliant docs of security layers.
The scope
Two-phase audit: (1) Red-team pen test on prompt injection, data exfiltration, PHI leakage. (2) MDR-ready docs + defense layer design + implementation. 8-week timeline.
Red-team methodology
12 attack classes: direct/indirect prompt injection, role-play, jailbreaking, system-prompt extraction, cross-tenant PHI leakage, data poisoning, token smuggling, tool misuse, output manipulation, MCP permissions bypass, audit log tampering. 8–15 scenarios per class.
Vulnerabilities found (12 critical)
- Cross-tenant PHI leak: RAG retriever didn't filter tenant-ID properly
- Indirect prompt injection in uploaded PDFs
- System-prompt extraction via jailbreak
- Tool-use misuse: agent could query patient-data tool on unassigned patients
- Audit-log tampering: LLM had direct write access to audit table
- Role confusion: 'you're a doctor now' prompts got clinical advice
- Output-validation bypass via JSON
- Session hijacking via unencrypted URL parameter
- Incomplete PII redaction (name + SSN only, not address or date)
- Rate-limit bypass for authenticated users
- Cost exploit: long loop prompt triggered $100+ API bills
- MCP permission bypass: tools accessible beyond role scope
Defense layer design
7-layer: input validation (rule + ML), prompt templating with user input separated, Claude built-in guardrails + Llama Guard moderation, JWT-scoped tool permissions, output validation (JSON schema + PII re-redaction + toxicity), write-only audit trail to external bucket, rate + cost limits per user and per tenant.
MDR-ready documentation
61-page technical security dossier: architecture diagrams, threat modeling (STRIDE + LLM-specific), formal specs per defense layer, test evidence, incident response runbook, vendor DPAs. Meant for CE-marking submission.
Delivery
Weeks 1-2: red-team planning + first round (800 attack iterations). Weeks 3-4: defense design + initial implementation. Weeks 5-6: re-test + regression tests per vulnerability. Weeks 7-8: documentation finalisation + MDR-ready deliverable.
Results
- 12 critical vulnerabilities found and all fixed
- 61-page MDR-ready security dossier delivered
- Zero prompt-injection incidents in 4 months post-fix (200+ re-tests)
- CE-marking submitted; first-round feedback positive (ongoing)
- Cross-tenant PHI leak risk = 0 (tenant isolation formally verified)
- Client-side security audit (at clinical end-user) passed
Lessons
LLM-specific threat modeling differs from classic web security: prompt injection, indirect injection, role confusion, tool misuse aren't in OWASP Top 10. For health AI, MDR-compliance docs must be planned at kickoff, not at the end — otherwise internal team ends up duplicating work.
Cost
Audit + docs + defense design: €58,000 fixed-scope. Without finding these vulnerabilities, a single GDPR/MDR breach could've been €500,000+ in fines.
AI security audit in your sector?
30-minute call to scope the product, compliance, and threats. We close with a 2-week audit quote.
Book a call