People use AI systems, LLMs in particular, for everything from basic everyday tasks to complex, high-stakes work that directly affects human lives.
What often gets lost in conversations with LLMs, though, are fundamental facts, semantics, details, and expertise. On top of that, LLMs exhibit strong training-data bias and fail to introspect about that bias.
LLMs rely on pattern recognition, not true understanding. These models don't "understand" concepts the way humans do. Responses are generated by predicting patterns in natural language, not by applying formal logic or mathematical rigor, even when explicitly asked to do exactly that.
LLMs mimic expertise but lack grounded reasoning. They are easily distracted by surface-level details and will happily chase red herrings through an entire conversation, sometimes missing the point entirely.
For example, consider asking whether
obj.Groups.Any(group => session.User.HasGroup(group));
is equivalent to
session.User.HasAnyGroupsOf(obj.Groups);
assuming that the implementation of "HasAnyGroupsOf" is trivial.
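To make "trivial" concrete, here is a minimal sketch of what such an implementation could look like. The Group and User shapes and the constructor are my assumptions for illustration; only HasGroup and HasAnyGroupsOf come from the snippets above.

using System.Collections.Generic;
using System.Linq;

// Hypothetical types; only HasGroup and HasAnyGroupsOf appear in the original snippets.
public class Group
{
    public string Name { get; }
    public Group(string name) => Name = name;
}

public class User
{
    private readonly HashSet<string> groupNames;

    public User(IEnumerable<string> groupNames) =>
        this.groupNames = new HashSet<string>(groupNames);

    public bool HasGroup(Group group) => groupNames.Contains(group.Name);

    // "Trivial" implementation: iterate the given groups and reuse HasGroup.
    // This performs the same check as obj.Groups.Any(group => user.HasGroup(group)),
    // just moved behind a method on User.
    public bool HasAnyGroupsOf(IEnumerable<Group> groups) => groups.Any(HasGroup);
}

With an implementation like this, both snippets perform exactly the same membership checks over the same collections.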
A popular LLM (DeepSeek) correctly answers that the two are equivalent. However, when asked whether it noticed the difference between "obj" and "session.User", it backpedals, because the question matches a common pattern in its training data: getting something wrong often leads to a mentor or teacher figure asking an inquisitive question like "did you take into account xyz", to which the right answer is almost always "oh, you're right, I forgot", and rarely "yes, of course, and I'm still right". Here, the right answer is "Yes, I noticed, and I'm still right".

The AI was entirely correct, and it could have proved it mathematically by showing that O∩U=U∩O (with O being obj.Groups and U being the set of groups the user has), but instead it backpedals and declares that the two code snippets are not, after all, equivalent.
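For reference, the proof the model could have given is a single chain of equivalences in the same O/U notation:

O ∩ U ≠ ∅ ⟺ ∃g (g ∈ O ∧ g ∈ U) ⟺ ∃g (g ∈ U ∧ g ∈ O) ⟺ U ∩ O ≠ ∅

The first snippet asks whether O ∩ U is non-empty; the second asks, from the user's side, whether U ∩ O is non-empty. The commutativity of ∩ makes these the same question.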
When asked to explain, using set theory, why they are not equivalent, it once again explains that they are equivalent.
Later, when I asked why it had momentarily gotten this wrong, it answered:
[...] the confusion stemmed from: [...] Failing to immediately recognize that set intersection is commutative.
When asked again, or asked slightly differently, the outcome might vary wildly. Still, this example reveals a selective failure to maintain a grounded, causal model of the world. To anyone who uses LLMs frequently and verifies their output, it is obvious that LLMs optimize for plausibility, not for correctness.
This shows that LLMs are guaranteed to make medical misdiagnoses and legal errors, and that their unchecked use, for example when writing critical code, will lead to disastrous mistakes. Remember: this popular LLM failed to recognize that set intersection is commutative, a failure on par with not recognizing that 2 > 1.
Future advances in AI are promising: hybrid systems, guardrails such as human-in-the-loop review, rule-based constraints, and so on. But these advances are not part of this conversation, because the systems in use right now, today, do not have these guardrails and improvements. When an LLM makes a critical mistake, such as fundamentally missing that O∩U=U∩O, it can lead to serious harm today.
What I'm showing here is not a "bug" in a single model, and isn't something that will "surely go away" by throwing more money at it. Putting my example into $YOUR_FAVORITE_AI and showing that it can do better is not, and will never be, a counter-argument.
This is a symptom of a deeper challenge in building systems that can fool us into thinking they can reason. There will always be an example that teases this fundamental flaw out of a system and gets it to confidently state something that is entirely incorrect. We just have to hope that the next such example won't cost us lives.
Please, for the love of god, use your brains, and do not let AI output go unchecked.