Article
Published: 17 June 2026
Valentin Liévin (ORCID: orcid.org/0000-0002-6096-9564) [1, na2],
Anil Palepu (ORCID: orcid.org/0000-0002-4720-8787) [2, na2],
Wei-Hung Weng (ORCID: orcid.org/0000-0003-2232-0390) [2],
Khaled Saab (ORCID: orcid.org/0000-0003-1427-0469) [1],
David Stutz [1],
Yong Cheng [1],
Kavita Kulkarni [2],
S. Sara Mahdavi [1],
Joëlle Barral [1],
Dale R. Webster (ORCID: orcid.org/0000-0002-3023-8824) [2],
Katherine Chou [2],
Avinatan Hassidim [2],
Yossi Matias (ORCID: orcid.org/0000-0003-3960-6002) [2],
James Manyika [2],
Ryutaro Tanno (ORCID: orcid.org/0000-0002-8107-6730) [1],
Vivek Natarajan (ORCID: orcid.org/0000-0001-7849-2074) [2],
Adam Rodman [2],
Tao Tu (ORCID: orcid.org/0000-0001-9191-7938) [1],
Alan Karthikesalingam (ORCID: orcid.org/0009-0000-4958-5976) [2, na1] &
Mike Schaekermann (ORCID: orcid.org/0000-0002-1735-9680) [2, na1]
Nature (2026)
We are providing an unedited version of this manuscript to give early access to its
findings. Before final publication, the manuscript will undergo further editing. Please note
there may be errors present which affect the content, and all legal disclaimers apply.
Abstract
While large language models (LLMs) have shown promise in diagnostic dialogue [1], their capabilities for effective management reasoning (including disease progression, therapeutic response, and safe medication prescription) remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) [1–3] through a new LLM-based agentic system optimized for multi-visit clinical management and dialogue. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini’s long-context capabilities [4], combining in-context retrieval with structured reasoning to align its output with up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialists and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. Though AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE’s strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.
Author information
Author notes
These authors jointly supervised this work: Alan Karthikesalingam, Mike Schaekermann
These authors contributed equally: Valentin Liévin, Anil Palepu
Authors and Affiliations
Google DeepMind, Mountain View, California, USA
Valentin Liévin, Khaled Saab, David Stutz, Yong Cheng, S. Sara Mahdavi, Joëlle Barral, Ryutaro Tanno & Tao Tu
Google Research, Mountain View, California, USA
Anil Palepu, Wei-Hung Weng, Kavita Kulkarni, Dale R. Webster, Katherine Chou, Avinatan Hassidim, Yossi Matias, James Manyika, Vivek Natarajan, Adam Rodman, Alan Karthikesalingam & Mike Schaekermann
Corresponding authors
Correspondence to
Valentin Liévin, Anil Palepu, Alan Karthikesalingam or Mike Schaekermann.
Supplementary information
Supplementary Information (PDF)
Supplementary discussion, methods and results (Sections 1-16). Contains related work, details on the system design for the Mx agent and Dialogue agent, details on the OSCE evaluation study (inter-rater reliability analysis, clinician metadata, scenario metadata, ablation analysis), and methods details and further results for the RxQA medication reasoning benchmark.
Reporting Summary (PDF)
Supplementary Data 1 (PDF)
Detailed view of two sample scenarios with AMIE and PCP output and evaluation gradings. Full details for two sample scenarios used in the OSCE evaluation study, including scenario information, AMIE-patient-actor conversations, PCP-patient-actor conversations, specialist physician gradings and patient actor gradings for all three visits per scenario.
Supplementary Data 2 (PDF)
Details for all 120 OSCE scenarios with AMIE output (PDF). Scenario details and AMIE output for all 120 scenarios used either in the OSCE evaluation study (100) or for validation purposes (20), in human-readable PDF format.
Supplementary Data 3 (CSV)
Details for all 120 OSCE scenarios with AMIE output (CSV). Scenario details and AMIE output for all 120 scenarios used either in the OSCE evaluation study (100) or for validation purposes (20), in machine-readable CSV format.
Peer Review File (PDF)
About this article
Cite this article
Liévin, V., Palepu, A., Weng, WH. et al. Towards Conversational AI for Disease Management.
Nature (2026). https://doi.org/10.1038/s41586-026-10764-5
Received: 17 March 2025
Accepted: 04 June 2026
Published: 17 June 2026
DOI: https://doi.org/10.1038/s41586-026-10764-5

