Before Institutions Trust an AI Patient, They Should Ask What Sits Behind It
Why the next phase of AI simulation in healthcare education will be judged by reliability, fairness and educational impact, not just realism.
Key takeaways
- The question is no longer whether an AI patient can hold a conversation. It is whether the system around it can be trusted, defended and evaluated.
- Modality should match the learning objective. Voice and video serve different educational purposes; neither is inherently better.
- Reliability, fairness, governance, curriculum alignment, assessment defensibility and educational impact are the six things institutions are really evaluating.
- Bias evaluation, evidence depth and post-launch model monitoring still need more work across the whole sector.
Innovation is not just about what we can build.
It is about what we should build. Who it serves. Who it might exclude. Who it might harm. And who remains accountable when it goes wrong.
That question matters deeply in healthcare education. A simulated patient is not simply a chatbot with a clinical backstory. It is a learning environment. It shapes how students ask questions, interpret risk, practise empathy, receive feedback and understand what “good” clinical performance looks like.
As generative AI becomes easier to build with, the barrier to creating conversational tools has fallen dramatically. What once required years of engineering resource can now often be assembled far more quickly. That is exciting, but it also creates a serious challenge for institutions.
The question is no longer whether an AI system can hold a conversation. The question is whether it can be trusted.
Before adopting any solution, institutions need answers to a tight set of questions:
- Can it respond consistently across questioning styles?
- Can it assess fairly across learner cohorts?
- Can it support different learners equitably?
- Can it align with local curricula and marking standards?
- Can educators understand and oversee how feedback is generated?
- Can institutions evidence that the solution improves learning, not just activity?
These are the questions that should define the next phase of AI-enabled clinical simulation.
The confidence problem in AI simulation

Healthcare education has always had a simulation bottleneck.
Students need repeated opportunities to practise communication, clinical reasoning, escalation, documentation and professional judgement. Educators need scalable ways to deliver that practice without compromising educational quality. Traditional simulation with actors, faculty observers and dedicated spaces is valuable, but it is expensive, time-limited and difficult to scale.
AI simulation can help address that bottleneck. It can allow learners to practise repeatedly, receive feedback quickly and encounter a wider range of clinical scenarios. The wider evidence base for virtual patient simulation is promising. A systematic review and meta-analysis of virtual patient simulations in health professions education found knowledge outcomes similar to those of traditional education and skills outcomes that favoured virtual patients, while also highlighting variation in study design and implementation.[1]
That caveat matters. The direction of travel is promising, but the sector should avoid mistaking momentum for settled evidence.
Institutions do not need AI systems that simply sound impressive in a demo. They need systems they can defend to students, faculty, regulators and, ultimately, patients.
NICE's evidence standards framework for digital health technologies was created to help evaluators and decision-makers assess whether digital health technologies are likely to offer benefits to users and the health and care system. It also makes clear that meeting evidence standards is not the same as formal NICE endorsement or regulatory approval.[2]
For AI simulation in healthcare education, that should be the mindset: not “does this feel exciting?”, but “has this been designed, tested and governed in a way that an institution can responsibly adopt?”
Evaluation framework
What institutions are really evaluating
Six areas consistently come up in conversations with medical schools, NHS education teams and healthcare programme directors.
- Reliability: consistent patient behaviour and feedback across questioning styles and learner cohorts.
- Fairness: equitable performance across accents, dialects, demographics and communication styles.
- Governance: clear data, safety, escalation and oversight controls educators can defend.
- Curriculum alignment: mapped to local learning outcomes, assessment standards and disciplinary context.
- Assessment defensibility: scores tied to a transparent rubric and traceable to evidence in the consultation.
- Educational impact: evidence that the system improves learning, not just increases activity.
The realism question needs more nuance
One of the easiest mistakes in simulation is to equate realism with educational value.
A simulation can look realistic and still be poorly aligned to the learning objective. Equally, a simple simulation can be highly effective if it targets the right skill, at the right level, for the right learner.
Hamstra and colleagues challenged the traditional emphasis on fidelity as physical resemblance, arguing that educational effectiveness is better understood through concepts such as transfer of learning, learner engagement and suspension of disbelief.[3]
The goal is not always to make a simulated patient look, sound or behave with maximum realism. The goal is to select the level of realism that supports the intended learning outcome.
This is where the conversation around voice and video often becomes too simplistic.
Voice is not “less serious” because it lacks visuals. Video is not “better” simply because it looks more realistic. Both modalities can be educationally valuable when used for the right objective.
Modality
Modality should match the learning objective
Voice simulation thrives when the goal is:
- History-taking structure
- Consultation flow
- Question phrasing
- Verbal reasoning
- Repeated practice at scale
- Early confidence building
- Remote or asynchronous practice
Video simulation thrives when the goal involves:
- Breaking bad news
- Distressed relatives
- Mental health risk assessment
- Delirium or confusion
- Capacity assessment
- De-escalation
- Telemedicine realism
- Recognising hesitation or emotional distress
The responsible question is not whether voice or video is better. It is what level of realism is educationally necessary for the skill being trained.
What institutions should expect from AI-enabled simulation
Before adopting an AI simulation solution, institutions should look beyond surface-level conversation quality.
A useful assurance framework should cover at least six areas:
- Educational alignment: mapped to your learning outcomes, not generic ones.
- Response reliability: consistent behaviour across questioning styles.
- Assessment defensibility: scores traceable to rubric and transcript.
- Governance and safety: clear data, escalation and oversight controls.
- Fairness and equity: equivalent experience across accents and demographics.
- Evidence generation: proof the solution improves learning over time.
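One way to make "scores traceable to rubric and transcript" checkable is to store, alongside each mark, the rubric criterion it belongs to and the transcript turns offered as evidence. The sketch below is illustrative only: `ScoredCriterion`, `untraceable`, the criterion IDs and the sample data are hypothetical names chosen for this example, not the API of any real platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCriterion:
    criterion_id: str            # key into the institution's rubric (hypothetical IDs)
    marks_awarded: int
    evidence_turns: tuple = ()   # indices of transcript turns cited as evidence


def untraceable(scores, rubric, transcript):
    """Return criterion IDs whose marks cannot be traced to rubric and transcript.

    rubric maps criterion_id -> maximum marks; transcript is a list of turns.
    """
    problems = []
    for s in scores:
        in_rubric = s.criterion_id in rubric
        within_max = in_rubric and 0 <= s.marks_awarded <= rubric[s.criterion_id]
        valid_turns = all(0 <= t < len(transcript) for t in s.evidence_turns)
        # Any non-zero mark must cite at least one transcript turn.
        has_evidence = s.marks_awarded == 0 or len(s.evidence_turns) > 0
        if not (in_rubric and within_max and valid_turns and has_evidence):
            problems.append(s.criterion_id)
    return problems


# Assumed sample data for illustration.
rubric = {"ICE": 2, "red_flags": 3}
transcript = ["Hello, I'm a student.", "What worries you most?", "Any chest pain?"]
scores = [
    ScoredCriterion("ICE", 2, (1,)),        # traceable
    ScoredCriterion("red_flags", 1, ()),    # mark awarded with no evidence
    ScoredCriterion("empathy", 1, (0,)),    # criterion not in the rubric
]
print(untraceable(scores, rubric, transcript))  # ['red_flags', 'empathy']
```

A check like this does not show that a score is correct. It only shows whether an educator reviewing the score would have something concrete to inspect.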
The NHS Digital Technology Assessment Criteria brings together recognised good practice across areas such as clinical safety, data protection, technical security, interoperability, usability and accessibility. Although educational simulation platforms are not identical to clinical digital health technologies, institutions are increasingly expecting the same mindset: structured assurance, not informal promises.
Curriculum embedding is the real adoption challenge
Most AI simulation tools can hold a conversation. Very few are built to slot into the way a programme actually teaches and assesses. Curriculum embedding is where adoption usually breaks down, and where MedAscend is designed to be a complete solution rather than a bolt-on.
A simulation platform that does not map to your curriculum is a tool. A platform that does is a solution.
In practice, MedAscend helps programmes embed AI simulation by giving educators the controls they need without engineering support:
- Map every scenario to your learning outcomes, year of study and discipline.
- Embed your own marking rubric so feedback reflects your standards, not the vendor's.
- Align with existing OSCE blueprints and station structures, including timed circuits.
- Author scenarios in-house with AI autofill, so faculty own the content.
- Track cohort and station-level analytics against curriculum domains.
- Support multi-discipline programmes across medicine, nursing, pharmacy, PA, paramedicine and allied health from one workspace.
The result is a solution that fits inside the curriculum, rather than asking the curriculum to reshape itself around the technology.
Fairness cannot be an afterthought
Bias in AI simulation is not only a technical issue. It is an educational and ethical issue.
If a system responds less accurately to certain accents, first languages or speech patterns, some learners may receive a poorer experience. If feedback models reward one communication style while penalising another, assessment may become unfair. If patient personas are not designed carefully, simulation can reinforce stereotypes rather than challenge them.
The WHO's guidance on AI for health emphasises that ethics and human rights must be placed at the centre of AI design, deployment and use. It also highlights the need for governance that holds stakeholders accountable to healthcare workers and the communities affected by these technologies.[4]
Healthcare education should apply the same standard.
21 months of pre-launch work
Five core problems we focused on before launch
1. Data quality
Volume alone is not enough. The dataset became part of the infrastructure needed to move beyond an impressive conversation and towards a reliable learning system, supporting response consistency, disclosure behaviour, feedback specificity and assessment logic.
5,000,000+ real consultation data points, curated for structure, not just volume.
2. Educator alignment
Co-design sessions with educators and institutions reshaped the product. Schools needed educator-controlled scenarios, curriculum alignment, marking frameworks, analytics, governance and feedback that reflected their own standards, across medicine, nursing, pharmacy, physician associate and allied health programmes.
3. Assessment reliability
An inter-rater reliability study compared MedAscend's AI feedback engine with real OSCE examiners across tested domains. The aim is not to replace examiners. It is to make formative feedback close enough to human standards to support repeated practice at scale.
4. Product iteration through failure
Iteration focused on disclosure control, actionable feedback, scoring alignment with rubrics, and a clearer separation between what a learner missed and what was never reasonably elicited. These were not minor issues. They were the work.
5. Evidence generation
Three research papers are currently in peer review. Until publication, we describe those findings as work in peer review, not settled evidence. We have also run funded pilots, including some we funded ourselves, to remove financial barriers to early institutional evaluation.
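For readers unfamiliar with inter-rater reliability, the sketch below shows one common agreement statistic, Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. The article does not say which statistic the study used, so treat this as a generic illustration. The function name and the pass/fail ratings are invented for the example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Invented ratings for eight stations: a human examiner versus an AI engine.
examiner = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
ai_engine = ["pass", "pass", "fail", "fail", "fail", "fail", "pass", "pass"]
print(cohens_kappa(examiner, ai_engine))  # 0.75
```

Here the raters agree on 7 of 8 stations (0.875), chance agreement is 0.5, and kappa is 0.75. For ordinal or continuous marks, a weighted kappa or an intraclass correlation coefficient is usually more appropriate than this unweighted form.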
What we have not solved yet
A credible conversation about AI simulation should not pretend that every challenge has been solved.
Algorithmic bias in health AI is well documented: a widely cited study found that an algorithm used to manage the care of millions of patients systematically under-referred Black patients relative to equally sick White patients, producing unequal outcomes across populations.[5] Although AI simulation is not the same as a clinical decision algorithm, the principle transfers: if a system influences learning, feedback or progression, institutions should expect transparency about bias, generalisability and subgroup performance.
Honest assessment
What still needs more work across the sector
Bias and subgroup evaluation
More formal subgroup evaluation across learner demographics, accents, dialects, first-language differences and socioeconomic context is needed across the whole sector, not just one company.
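As a minimal illustration of what a subgroup evaluation computes, the sketch below compares each subgroup's mean on a chosen metric with the overall mean. The record fields (`accent_group`, `transcription_accuracy`), the group labels and the values are all assumptions made for this example, not data from any study.

```python
from collections import defaultdict
from statistics import mean


def subgroup_gaps(records, group_key, metric_key):
    """Return each subgroup's mean metric minus the overall mean."""
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record[metric_key])
    overall = mean(record[metric_key] for record in records)
    return {group: mean(values) - overall for group, values in by_group.items()}


# Invented sessions: how accurately the system transcribed each learner.
sessions = [
    {"accent_group": "A", "transcription_accuracy": 0.90},
    {"accent_group": "A", "transcription_accuracy": 0.94},
    {"accent_group": "B", "transcription_accuracy": 0.80},
    {"accent_group": "B", "transcription_accuracy": 0.84},
]
gaps = subgroup_gaps(sessions, "accent_group", "transcription_accuracy")
for group, gap in sorted(gaps.items()):
    print(f"{group}: {gap:+.2f}")
```

A gap on its own is only descriptive. A formal evaluation would also need adequate sample sizes per subgroup, confidence intervals and a pre-agreed threshold for what counts as an unacceptable difference.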
Evidence depth and long-term outcomes
Early pilots can show engagement and acceptability. The sector still needs stronger evidence on objective performance, retention, transfer into clinical settings and long-term educational impact.
Model drift and post-launch monitoring
AI systems change, underlying models update and user behaviour shifts. A simulation system that performs well at launch still needs structured monitoring after launch.
These are not reasons to avoid AI simulation. They are reasons to adopt it carefully and evaluate it continuously.
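One simple form of the post-launch monitoring described above is a fixed regression set: the same scripted consultations are re-scored after every model update and compared with a stored baseline. The sketch below assumes scores are kept as a mapping from scenario ID to numeric score. The function name, scenario IDs and tolerance value are hypothetical.

```python
def drift_report(baseline, current, tolerance=0.5):
    """Compare current scores with baseline scores for a fixed scenario set.

    Returns scenarios that were not re-scored and scenarios whose score
    changed by more than `tolerance` (signed change: current minus baseline).
    """
    missing = sorted(set(baseline) - set(current))
    drifted = {
        scenario: current[scenario] - baseline[scenario]
        for scenario in baseline
        if scenario in current
        and abs(current[scenario] - baseline[scenario]) > tolerance
    }
    return {"missing": missing, "drifted": drifted}


# Invented scores for the same scripted consultations before and after an update.
baseline_scores = {"chest_pain_open": 7.0, "chest_pain_closed": 7.0, "bad_news": 6.0}
current_scores = {"chest_pain_open": 7.2, "chest_pain_closed": 5.5}
print(drift_report(baseline_scores, current_scores))
# {'missing': ['bad_news'], 'drifted': {'chest_pain_closed': -1.5}}
```

The tolerance is a judgement call for educators rather than engineers, because it sets how much change in feedback is acceptable before a human reviews it.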
Working with educators, not around them
Why these problems need to be solved with educators
The most important solutions in AI simulation will not be built by companies alone. They will be built with the educators responsible for teaching, assessing and protecting learners.
- Learner proximity: built from the problem students face, namely too few opportunities to practise and too little detailed feedback.
- Educator co-design: shaped with clinicians, educators and institutions, not delivered to them as a finished product.
- Institution-defined rubrics: marking frameworks and learning outcomes set by the institution, not by the vendor.
- Funded pilots and real-world evaluation: removing financial barriers so institutions can generate real usage and feedback data.
- Continuous feedback loops: iteration driven by educator input, learner outcomes and post-deployment observation.
- Governance and oversight: clear controls around data, scenarios, escalation, safeguarding and review.
The next phase
Next month, we are launching MedAscend's new AI engine, built on this work and designed to improve the reliability of patient responses, assessment and feedback across both voice and video simulation.
The focus is not novelty for its own sake. It is delivering a solution institutions can actually adopt: consistency, disclosure control, rubric alignment, feedback specificity, modality choice and educator oversight, without pretending that one format, one scenario type or one assessment model fits every context.
Learners do not just need an AI that sounds human. They need feedback they can trust. They need responses that remain consistent across different questioning styles. They need assessment that reflects the standards used by real educators.
The standard
The future of AI simulation will not be defined by who adds AI fastest. It will be defined by who can build systems that are reliable, fair, governed, educationally aligned and capable of improving learning without widening inequity.
Not “Can this AI patient talk?”
But “Can this system support the learning objective, produce defensible feedback, protect learners, reduce bias, give educators control and generate evidence over time?”
That is the standard institutions should expect. It is also the standard companies in this space should hold themselves to.
References
Sources cited in this article
- 1. Kononowicz AA, Woodham LA, Edelbring S, Stathakarou N, Davies D, Saxena N, et al. Virtual patient simulations in health professions education: systematic review and meta-analysis by the Digital Health Education Collaboration. Journal of Medical Internet Research. 2019;21(7):e14676.
- 2. NICE. Evidence standards framework for digital health technologies.
- 3. Hamstra SJ, Brydges R, Hatala R, Zendejas B, Cook DA. Reconsidering fidelity in simulation-based training. Academic Medicine. 2014;89(3):387–392.
- 4. World Health Organization. Ethics and governance of artificial intelligence for health: WHO guidance. 2021.
- 5. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–453.




