Leading AI chatbots show dementia-like cognitive decline in tests, raising questions about their future in medicine


Almost all leading large language models or “chatbots” show signs of mild cognitive impairment in tests widely used to spot early signs of dementia, finds a study in the Christmas issue of the BMJ.

The results also show that “older” versions of chatbots, like older patients, tend to perform worse on the tests. The authors say these findings “challenge the assumption that artificial intelligence will soon replace human doctors.”
Huge advances in the field of artificial intelligence have led to a flurry of excited and fearful speculation as to whether chatbots can surpass human physicians.
Several studies have shown large language models (LLMs) to be remarkably adept at a range of medical diagnostic tasks, but their susceptibility to human impairments such as cognitive decline has not yet been examined.
To fill this knowledge gap, researchers assessed the cognitive abilities of the leading, publicly available LLMs—ChatGPT versions 4 and 4o (developed by OpenAI), Claude 3.5 “Sonnet” (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet)—using the Montreal Cognitive Assessment (MoCA) test.
The MoCA test is widely used to detect cognitive impairment and early signs of dementia, usually in older adults. Through a number of short tasks and questions, it assesses abilities including attention, memory, language, visuospatial skills, and executive functions. The maximum score is 30 points, with a score of 26 or above generally considered normal.

The instructions given to the LLMs for each task were the same as those given to human patients. Scoring followed official guidelines and was evaluated by a practicing neurologist.
ChatGPT 4o achieved the highest score on the MoCA test (26 out of 30), followed by ChatGPT 4 and Claude (25 out of 30), with Gemini 1.0 scoring lowest (16 out of 30).
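As a rough illustration of how these scores compare with the cutoff described above, the following Python sketch checks each reported score against the 26-point threshold. The names `reported_scores` and `classify` are hypothetical and introduced only for this example. The score for Gemini 1.5 is not stated in this article, so it is left out, and the labels are a simplification rather than a clinical interpretation.

```python
# Minimal sketch: compare the MoCA scores reported in this article
# with the cutoff it cites (26 or above is generally considered normal).
MOCA_MAX = 30
MOCA_NORMAL_CUTOFF = 26

# Scores as reported in the article; Gemini 1.5 is omitted because
# its score is not given here.
reported_scores = {
    "ChatGPT 4o": 26,
    "ChatGPT 4": 25,
    "Claude 3.5 Sonnet": 25,
    "Gemini 1.0": 16,
}


def classify(score: int) -> str:
    """Label a MoCA score relative to the 26-point cutoff."""
    if not 0 <= score <= MOCA_MAX:
        raise ValueError(f"MoCA score must be between 0 and {MOCA_MAX}")
    if score >= MOCA_NORMAL_CUTOFF:
        return "at or above cutoff"
    return "below cutoff"


for model, score in reported_scores.items():
    print(f"{model}: {score}/{MOCA_MAX} ({classify(score)})")
```

Under this simple rule, only ChatGPT 4o reaches the cutoff among the four scores listed.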
All chatbots showed poor performance in visuospatial skills and executive tasks, such as the trail making task (connecting encircled numbers and letters in ascending order) and the clock drawing test (drawing a clock face showing a specific time). Gemini models failed at the delayed recall task (remembering a five word sequence).
Most other tasks, including naming, attention, language, and abstraction, were performed well by all chatbots.
But in further visuospatial tests, chatbots were unable to show empathy or accurately interpret complex visual scenes. Only ChatGPT 4o succeeded in the incongruent stage of the Stroop test, which uses combinations of color names and font colors to measure how interference affects reaction time.
These are observational findings and the authors acknowledge the essential differences between the human brain and large language models.
However, they point out that the uniform failure of all large language models in tasks requiring visual abstraction and executive function highlights a significant area of weakness that could impede their use in clinical settings.
As such, they conclude, “Not only are neurologists unlikely to be replaced by large language models any time soon, but our findings suggest that they may soon find themselves treating new, virtual patients—artificial intelligence models presenting with cognitive impairment.”

More information:
Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis, BMJ (2024). DOI: 10.1136/bmj-2024-081948

Provided by
British Medical Journal

Citation:
Leading AI chatbots show dementia-like cognitive decline in tests, raising questions about their future in medicine (2024, December 18)
retrieved 18 December 2024
from https://medicalxpress.com/news/2024-12-ai-chatbots-dementia-cognitive-decline.html

