Checkmate? AI's pawn-pushing prowess proves partly pitiful, partly promising

  • 📰 TheRegister


GPT-4o is far better than other models, but still made illegal moves 13% of the time

A new benchmark for large language models shows that even the latest models aren't the best chess players. The benchmark tests LLMs by giving them 1,000 chess puzzles to complete. Unlike a normal game of chess, a puzzle is essentially a logic problem in which the board is set up in a specific position. The goal is to play the best move, or chain of moves, to achieve the quickest possible unstoppable checkmate.

The benchmark's GitHub repository shows performance data for many of today's most popular LLMs from OpenAI, Anthropic, and Mistral. Most models achieved dismal Elo ratings, a number that represents skill level: they landed in the 100 to 500 range, which is firmly the domain of players who have very little experience with chess. These included Claude 3 variants, GPT-3.5 Turbo, and Mistral models.
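For readers unfamiliar with Elo, the sketch below shows the standard Elo formulas for an expected score and a rating update. The article does not say how the benchmark derives its ratings, so treating each puzzle as a rated "opponent" and using a K-factor of 32 are assumptions made purely for illustration, as are the function names.

```python
def expected_score(player_rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score: the probability-like value (0 to 1)
    that the player beats an opponent with the given rating."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - player_rating) / 400.0))


def update_rating(player_rating: float, opponent_rating: float,
                  score: float, k: float = 32.0) -> float:
    """Return the player's new rating after one result.

    score is 1.0 for a win (puzzle solved) and 0.0 for a loss.
    k=32 is an assumed K-factor, not a value taken from the benchmark.
    """
    expected = expected_score(player_rating, opponent_rating)
    return player_rating + k * (score - expected)


if __name__ == "__main__":
    # Hypothetical example: a 300-rated solver facing a 1500-rated puzzle.
    print(expected_score(300, 1500))
    print(update_rating(300, 1500, score=1.0))
```

Under this formula a player rated 400 points below an opponent is expected to score about 9 percent, which gives a sense of how far a rating in the 100 to 500 range sits from that of even a casual club player.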

GPT-4o found the best move in 501 of the 1,000 puzzles. In one puzzle, for instance, white's best move is rook to c8, right next to black's queen. The queen can't simply take the rook for free, because the rook is protected by white's light-squared bishop. Nor can black move the queen out of the way, because then its king would be checkmated, so black must concede the loss of its queen.

"The only conclusion is that this failure is due to the lack of historical records of played games in the training data," Prelovac said. "This implies that it cannot be argued that these models are able to 'reason' in any sense of the word, but merely output a variation of what they have seen during training."

"Even chess moves are nothing but a series of tokens, like 'e' and '4', and have no grounding in reality," Prelovac said. "They are products of statistical analysis of the training data, upon which the next token is predicted."

 
