Large Language Models (LLMs), such as those used in chatbots, have an alarming tendency to hallucinate: they generate false content and present it as accurate. These AI hallucinations pose, among other risks, a direct threat to science and scientific truth, researchers at the Oxford Internet Institute warn.
According to their paper, published in Nature Human Behaviour, “LLMs are designed to produce helpful and convincing responses without any overriding guarantees regarding their accuracy or alignment with fact.”
LLMs are currently treated as knowledge sources and generate information in response to questions or prompts. But the data they're trained on isn't necessarily factually correct. One reason is that these models are often trained on online content, which can contain false statements, opinions, and other inaccurate information.
“People using LLMs often anthropomorphise the technology, where they trust it as a human-like information source,” explained Professor Brent Mittelstadt, co-author of the paper.
“This is, in part, due to the design of LLMs as helpful, human-sounding agents that converse with users and answer seemingly any question with confident sounding, well-written text. The result of this is that users can easily be convinced that responses are accurate even when they have no basis in fact or present a biased or partial version of the truth.”
When it comes to science and education, information accuracy is of vital importance, and the researchers urge the scientific community to use LLMs as “zero-shot translators.” In practice, this means users should provide the model with the appropriate data and ask it to transform that input into, for instance, a conclusion or code, instead of relying on the model itself as a source of knowledge.
This way, it becomes easier to check that the output is factually correct and consistent with the provided input.
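As a rough illustration of that workflow, the sketch below supplies the model with vetted data inside the prompt and asks it only to summarise that data. It is a minimal sketch, not the researchers' own method: `call_llm` is a hypothetical placeholder for whichever chat-completion API is available, and the measurement values are invented for the example.

```python
# "Zero-shot translator" pattern: the trusted facts live in the prompt, and the
# model is only asked to transform them, not to recall knowledge of its own.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real chat-completion API call.

    Replace with the provider of your choice; here it returns a canned string
    so the sketch runs end to end.
    """
    return (
        "Sample A showed a mean uptake of 4.2 (sd 0.3, n=12), while sample B "
        "showed a mean uptake of 6.8 (sd 0.5, n=12)."
    )

# Vetted input data the researcher already trusts (values invented for illustration).
measurements = {
    "sample_A": {"mean_uptake": 4.2, "std": 0.3, "n": 12},
    "sample_B": {"mean_uptake": 6.8, "std": 0.5, "n": 12},
}

# The model is asked to rephrase only the supplied data.
prompt = (
    "Using ONLY the data below, write a two-sentence summary of the results. "
    "Do not add any information that is not present in the data.\n\n"
    f"{measurements}"
)

summary = call_llm(prompt)

# Because the source data is known, the output can be checked against it,
# e.g. confirming each reported mean appears in the summary.
for sample, stats in measurements.items():
    if str(stats["mean_uptake"]) not in summary:
        print(f"Warning: mean uptake for {sample} not found verbatim in the summary")

print(summary)
```

Because the input is known and trusted, the response can be checked mechanically against it, which is the kind of verification that is not possible when the model itself is treated as the source of knowledge.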
LLMs will “undoubtedly” assist with scientific workflows, according to the Oxford professors. But it’s crucial for the community to use them responsibly and to maintain clear expectations of how they can actually contribute.