A Japanese puzzle has stumped artificial intelligence. Most AI models are unable to solve it.

Number puzzles have been a pastime for millennia: they first appeared in ancient China, and they began appearing in newspapers in the late 19th century. About 20 years ago, Sudoku, a puzzle first published in the mid-1980s by the Japanese magazine Nikoli, gained global popularity. Today, the game has millions of fans worldwide, and the various versions of its mobile app alone have been downloaded by approximately 200 million users.
Sudoku involves filling in the empty cells of a 9x9 grid with digits. The grid is divided into rows, columns, and 3x3 squares (the so-called blocks), and each of them must contain every digit from 1 to 9 exactly once, with no repeats. In 2005, mathematicians from the University of Sheffield (UK) calculated that there are approximately 6.7 sextillion possible valid Sudoku grids (6.67 x 10 to the 21st power). Other versions of the game also exist: for example, a 6x6 grid, usually divided into 2x3 blocks, must be filled with the digits 1 to 6.
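To make the rules concrete, here is a minimal Python sketch of a checker for a completed grid. It is only an illustration, not code from the study: it assumes a grid is stored as a list of rows of integers and that a 6x6 grid uses 2x3 blocks, and the helper names `box_shape` and `is_valid_solution` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def box_shape(n: int) -> Tuple[int, int]:
    """Return (block_rows, block_cols) for an n x n grid.

    Assumption: the common layouts, 3x3 blocks for 9x9 and 2x3 blocks for 6x6.
    """
    shapes = {9: (3, 3), 6: (2, 3), 4: (2, 2)}
    if n not in shapes:
        raise ValueError(f"unsupported grid size: {n}")
    return shapes[n]


def is_valid_solution(grid: Grid) -> bool:
    """Check that every row, column and block holds each digit 1..n exactly once."""
    n = len(grid)
    if any(len(row) != n for row in grid):
        return False
    digits = set(range(1, n + 1))
    br, bc = box_shape(n)

    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(n)} for c in range(n)]
    blocks = [
        {grid[r][c] for r in range(r0, r0 + br) for c in range(c0, c0 + bc)}
        for r0 in range(0, n, br)
        for c0 in range(0, n, bc)
    ]
    # n cells whose set equals {1..n}: nothing missing and nothing repeated.
    return all(group == digits for group in rows + cols + blocks)


# Usage: a solved 6x6 grid with 2x3 blocks.
solved = [
    [1, 2, 3, 4, 5, 6],
    [4, 5, 6, 1, 2, 3],
    [2, 3, 1, 5, 6, 4],
    [5, 6, 4, 2, 3, 1],
    [3, 1, 2, 6, 4, 5],
    [6, 4, 5, 3, 1, 2],
]
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]  # row still fine, columns now repeat

print(is_valid_solution(solved))  # True
print(is_valid_solution(broken))  # False
```

Swapping two cells in a row leaves that row valid but puts a repeated digit into two columns, which is why the second check fails.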
Now it turns out that Sudoku is a challenge for artificial intelligence. AI is making enormous progress in areas such as analyzing large data sets, generating text, images, and videos, and translation, but logical tasks remain its weak point. This was confirmed by researchers from the University of Colorado Boulder (USA), whose paper on the subject appeared in the "ACL Anthology," a collection of over 110,000 papers hosted by the Association for Computational Linguistics (ACL).
According to the paper's lead author, Anirudh Maiya, a computer science and machine learning expert, solving Sudoku involves several important elements.
"You have to proceed step by step, keep re-evaluating the cells, and consistently follow the rules. Puzzles like these are fun, but they also provide an ideal microcosm for studying decision-making in machine learning," the expert explained.
For the study, Maiya and his team created 2,300 Sudoku puzzles of varying difficulty on a 6x6 grid. The researchers then gave them to several large language models (LLMs), including o1, Llama-3.1, Gemma-2, and Mistral, to solve.
The experiment showed that the task was too difficult for most of the AI models, which managed to solve only 0.4% of the puzzles in total. The researchers attribute this to the fact that AI does not think logically but instead determines solutions based on probability, which makes rule-based and reasoning-based tasks difficult for it.
"Artificial intelligence models have difficulty taking all the constraints of the number grid into account at the same time," the authors of the paper explained.
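Keeping every constraint in view at once is exactly what a conventional constraint solver does mechanically. The sketch below is a plain backtracking search in Python, offered only for contrast; it is not the researchers' method and was not part of their experiment. It assumes a 6x6 grid with 2x3 blocks and 0 for an empty cell, and the names `allowed` and `solve` are hypothetical.

```python
from typing import List, Optional

Grid = List[List[int]]
BOX_ROWS, BOX_COLS = 2, 3  # assumed block layout for a 6x6 grid
N = BOX_ROWS * BOX_COLS


def allowed(grid: Grid, r: int, c: int, d: int) -> bool:
    """True if digit d can go at (r, c) without breaking the row, column or block rule."""
    if any(grid[r][j] == d for j in range(N)):
        return False
    if any(grid[i][c] == d for i in range(N)):
        return False
    r0, c0 = r - r % BOX_ROWS, c - c % BOX_COLS
    return all(
        grid[i][j] != d
        for i in range(r0, r0 + BOX_ROWS)
        for j in range(c0, c0 + BOX_COLS)
    )


def solve(grid: Grid) -> Optional[Grid]:
    """Fill empty cells (0) by depth-first search; return a solved copy or None."""
    grid = [row[:] for row in grid]  # work on a copy, leave the input untouched

    def backtrack() -> bool:
        for r in range(N):
            for c in range(N):
                if grid[r][c] == 0:
                    for d in range(1, N + 1):
                        if allowed(grid, r, c, d):
                            grid[r][c] = d
                            if backtrack():
                                return True
                            grid[r][c] = 0  # undo and try the next digit
                    return False  # no digit fits this cell: backtrack
        return True  # no empty cells left

    return grid if backtrack() else None


puzzle = [
    [1, 0, 3, 0, 5, 0],
    [0, 5, 0, 1, 0, 3],
    [2, 0, 1, 0, 6, 0],
    [0, 6, 0, 2, 0, 1],
    [3, 0, 2, 0, 4, 0],
    [0, 4, 0, 3, 0, 2],
]
impossible = [[1, 2, 3, 4, 5, 0], [0, 0, 0, 0, 0, 6]] + [[0] * 6 for _ in range(4)]

print(solve(puzzle))
print(solve(impossible))  # None: cell (0, 5) needs a 6, but column 5 already has one
```

Each candidate digit is tested against its row, column, and block before it is placed, and a dead end undoes the last guess. This is the exhaustive bookkeeping the authors say language models struggle with.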
Of the models tested, o1 performed best, solving approximately 65% of the Sudoku puzzles. However, its success rate dropped as the puzzles grew more difficult.
Even more problems arose when the researchers asked the AI to explain how it had arrived at a solution. The models tested were able to correctly justify entering specific numbers only 5% of the time; the answers were often incorrect or unclear.
"For example, the AI said: there can't be a two here because there is already a two in this row. That was not true," said the study's co-author, Dr. Ashutosh Trivedi.
He added that in some cases, the AI ignored the numbers already on the board or came up with absurd explanations. In one instance, in the middle of a conversation about Sudoku, one of the models suddenly produced a weather forecast.
"The AI was completely confused and reacted in a bizarre way," said Dr. Trivedi.
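A false justification like the one Trivedi describes can be caught mechanically. The minimal Python sketch below checks whether a claim of the form "digit d cannot go in this cell because its row (or column, or block) already contains d" is actually true. It is an illustration only, not the study's evaluation procedure; the 2x3 block layout, the 0-for-empty convention, and the name `claim_is_true` are assumptions.

```python
from typing import List, Literal

Grid = List[List[int]]
Reason = Literal["row", "column", "box"]
BOX_ROWS, BOX_COLS = 2, 3  # assumed block layout for a 6x6 grid


def claim_is_true(grid: Grid, r: int, c: int, digit: int, reason: Reason) -> bool:
    """Is `digit` really excluded at (r, c) because the named unit already contains it?"""
    n = len(grid)
    if reason == "row":
        cells = [grid[r][j] for j in range(n) if j != c]
    elif reason == "column":
        cells = [grid[i][c] for i in range(n) if i != r]
    elif reason == "box":
        r0, c0 = r - r % BOX_ROWS, c - c % BOX_COLS
        cells = [
            grid[i][j]
            for i in range(r0, r0 + BOX_ROWS)
            for j in range(c0, c0 + BOX_COLS)
            if (i, j) != (r, c)
        ]
    else:
        raise ValueError(f"unknown reason: {reason}")
    return digit in cells


grid = [
    [1, 0, 3, 0, 5, 0],
    [0, 5, 0, 1, 0, 3],
    [2, 0, 1, 0, 6, 0],
    [0, 6, 0, 2, 0, 1],
    [3, 0, 2, 0, 4, 0],
    [0, 4, 0, 3, 0, 2],
]

print(claim_is_true(grid, 0, 1, 2, "row"))     # False: row 0 has no 2, so the claim is wrong
print(claim_is_true(grid, 0, 1, 3, "row"))     # True
print(claim_is_true(grid, 0, 1, 5, "column"))  # True
```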
According to the authors, the study results show that despite the impressive achievements of artificial intelligence, it cannot be fully relied on, especially in tasks requiring precise reasoning.
"Many people are talking about AI models developing new abilities that you wouldn't expect them to have. However, it's not surprising that they still perform poorly on many tasks," concluded Anirudh Maiya.
well.pl