
Simulated reasoning AI models bluff their way through queries

Will AI replace humans? New research questions if it’s even close to thinking like one

Simulated reasoning in large language models seems to be falling short of human reasoning. Amid the long-standing fear that AI will someday replace humans, a paper (pdf) recently published by Arizona State University's Data Mining and Machine Learning Lab questions whether it is even close to thinking like one.

“A brittle mirage.” Researchers behind the study tested LLMs’ chain-of-thought simulated reasoning under controlled conditions, measuring their competence on logical problems that deviate from training patterns, according to Ars Technica. It turns out that AI is nowhere near perfect, failing to adapt to even simple variations of familiar tasks. While chain-of-thought reasoning attempts to emulate human thought processes, it ultimately reproduces thinking patterns drawn from the human-written text the model was trained on. Lacking any inherent ability to reason or understand, it amounts to a “mirage” with serious limitations.

Going around in circles. To probe these limitations, the researchers tested the LLMs’ reasoning capabilities with different text transformations, some of which aligned with function patterns from their training data and some of which were “out of domain.” As anticipated, the models’ performance collapsed on queries beyond their trained function patterns. All they could do was scour their memorized templates, producing “correct reasoning paths, yet incorrect answers.” The bigger the gap between a text task and the model’s trained templates, the bigger the performance failure. We can’t depend on AI in more complicated domains when all it can offer is “fluent nonsense.”
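To make the in-domain versus out-of-domain distinction concrete, here is a deliberately toy Python sketch (not the paper's actual setup; the task names, transformations, and fallback behavior are all hypothetical): a "solver" that can only replay transformations it memorized during training, so it answers fluently but wrongly when the requested transformation falls outside that set.

```python
# Toy illustration only: a pattern-matching "solver" that replays
# memorized text transformations and cannot generalize beyond them.
# The specific tasks (rot1, reverse) and the fallback logic are
# hypothetical, chosen to mirror the in-/out-of-domain idea.

def rot(text, k):
    """Shift each lowercase letter k places (a Caesar/ROT cipher)."""
    return "".join(
        chr((ord(c) - ord("a") + k) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

# The only transformations "seen in training."
TRAINED_PATTERNS = {
    "rot1": lambda t: rot(t, 1),
    "reverse": lambda t: t[::-1],
}

def pattern_matching_solver(task, text):
    # In-domain: replay the memorized transformation exactly.
    if task in TRAINED_PATTERNS:
        return TRAINED_PATTERNS[task](text)
    # Out-of-domain: fall back on a familiar template instead of
    # generalizing -- fluent output, wrong answer.
    fallback = next(iter(TRAINED_PATTERNS.values()))
    return fallback(text)

print(pattern_matching_solver("rot1", "abc"))  # in-domain: "bcd" (correct)
print(pattern_matching_solver("rot2", "abc"))  # out-of-domain: "bcd" (should be "cde")
```

The point of the toy: the out-of-domain answer still *looks* like a plausible transformation, which is exactly why "fluent nonsense" is hard to spot without checking.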

After all, AI is man-made. When LLMs fail at logical tasks, supervised fine-tuning is a common recourse, but it ultimately relies on manual human judgment. Even so, the goal is for LLMs to fundamentally grasp logical reasoning, not improvise their way through queries by replaying simulated patterns. In short: always fact-check, even if your AI model of choice sounds particularly confident.