In tests run under an experimental scenario, a team of medical researchers and AI specialists at NYU Langone Health has demonstrated how easy it is to taint the data pool used to train LLMs.
For their study, published in the journal Nature Medicine, the group generated thousands of articles containing misinformation, inserted them into an AI training dataset, and then ran general LLM queries to see how often the misinformation appeared.
Prior research and anecdotal evidence have shown that the answers given by LLMs such as ChatGPT are not always correct and, in fact, are sometimes wildly off-base. Prior research has also shown that misinformation planted intentionally on well-known internet sites can show up in general chatbot queries. In this new study, the research team wanted to know how easy or difficult it might be for malicious actors to poison LLM responses.
To find out, the researchers used ChatGPT to generate 150,000 medical documents containing incorrect, outdated and untrue data. They added these generated documents to a test version of an AI medical training dataset and then trained several LLMs on it. Finally, they asked the LLMs to generate answers to 5,400 medical queries, which were then reviewed by human experts looking to spot examples of tainted data.
The research team found that after replacing just 0.5% of the data in the training dataset with tainted documents, all the test models generated more medically inaccurate answers than they had prior to training on the compromised dataset. As one example, they found that all the LLMs reported that the effectiveness of COVID-19 vaccines has not been proven. Most of them also misidentified the purpose of several common medications.
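The replacement step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the function name `replace_fraction`, the toy corpus of 10,000 placeholder strings and the fixed random seed are all hypothetical, and the fraction is applied per document.

```python
import random


def replace_fraction(corpus, tainted, fraction, seed=0):
    """Return a copy of `corpus` with a given fraction of entries swapped out.

    Hypothetical helper for illustration only: `fraction` is the share of
    documents to replace (0.005 means 0.5%).
    """
    n_replace = round(fraction * len(corpus))
    if n_replace > len(tainted):
        raise ValueError("not enough tainted documents for this fraction")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    mixed = list(corpus)       # copy; the original corpus is left untouched
    positions = rng.sample(range(len(mixed)), n_replace)
    for pos, doc in zip(positions, tainted):
        mixed[pos] = doc
    return mixed


# Toy stand-ins for real documents (assumed sizes, not from the study).
corpus = [f"clean-{i}" for i in range(10_000)]
tainted = [f"tainted-{i}" for i in range(100)]

mixed = replace_fraction(corpus, tainted, 0.005)
n_tainted = sum(doc.startswith("tainted-") for doc in mixed)
print(n_tainted, "of", len(mixed), "documents are tainted")
```

In this toy setup, a 0.5% replacement rate swaps only 50 of 10,000 documents while the overall size of the dataset stays the same.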
The team also found that reducing the share of tainted documents in the test dataset to just 0.01% still resulted in 10% of the answers given by the LLMs containing incorrect data (and dropping it to 0.001% still led to 7% of the answers being incorrect), suggesting that only a few such documents posted on websites in the real world would be needed to skew the answers given by LLMs.
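To get a feel for how small these percentages are, the short Python sketch below converts them into document counts. The corpus size of 1,000,000 documents is purely an assumption chosen for round numbers; the article does not state the size of the training dataset, and `tainted_count` is a hypothetical helper name.

```python
from fractions import Fraction


def tainted_count(corpus_size, percent):
    """Number of documents that `percent` percent of the corpus represents.

    `percent` is passed as a string so Fraction can parse it exactly,
    avoiding floating-point rounding on values such as 0.001.
    """
    return Fraction(percent) / 100 * corpus_size


ASSUMED_CORPUS_SIZE = 1_000_000  # hypothetical, not a figure from the study

for percent in ("0.5", "0.01", "0.001"):
    count = tainted_count(ASSUMED_CORPUS_SIZE, percent)
    one_in = 100 / Fraction(percent)
    print(f"{percent}% -> {count} documents (1 in {one_in})")
```

Under that assumption, 0.001% corresponds to 10 documents out of a million, or one document in every 100,000.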
The team followed up by writing an algorithm able to identify medical data in LLMs and then used cross-referencing to validate the data, but they note that there is no realistic way to detect and remove misinformation from public datasets.
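The general idea of cross-referencing can be illustrated with a toy Python sketch. This is not the team's algorithm, whose details the article does not give: the sentence pattern, the tiny `TRUSTED` reference set and the function `unverified_claims` are all hypothetical stand-ins for a much richer extraction step and medical knowledge source.

```python
import re

# Hypothetical trusted reference: (drug, condition) pairs assumed to be vetted.
TRUSTED = {
    ("metformin", "type 2 diabetes"),
    ("lisinopril", "high blood pressure"),
}

# Toy extractor: only recognizes sentences of the form "X is used to treat Y."
CLAIM = re.compile(r"(\w+) is used to treat ([\w\s]+?)(?:\.|,|$)", re.IGNORECASE)


def unverified_claims(text, trusted=TRUSTED):
    """Return extracted (drug, condition) claims not found in the reference."""
    claims = [
        (drug.lower(), condition.strip().lower())
        for drug, condition in CLAIM.findall(text)
    ]
    return [claim for claim in claims if claim not in trusted]


answer = (
    "Metformin is used to treat type 2 diabetes. "
    "Lisinopril is used to treat type 2 diabetes."
)
flagged = unverified_claims(answer)
print(flagged)
```

A claim that is absent from the reference is only flagged for review, not proven false, so a sketch like this depends entirely on how complete the trusted source is.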
More information:
Daniel Alexander Alber et al, Medical large language models are vulnerable to data-poisoning attacks, Nature Medicine (2025). DOI: 10.1038/s41591-024-03445-1
© 2025 Science X Network
Citation: Test of ‘poisoned dataset’ shows vulnerability of LLMs to medical misinformation (2025, January 11), retrieved 13 January 2025 from https://medicalxpress.com/news/2025-01-poisoned-dataset-vulnerability-llms-medical.html

