For those of us not quite ready to bow down before our new AI overlords (to say nothing of the Idiocracy we’ll all be saddled with for at least the next four years), here’s a somewhat encouraging story from Live Science about how even the most advanced AI models have completely flunked out when it comes to solving the most complex problems in the rarefied realm of higher mathematics:
Mathematicians have stumped the most advanced generative artificial intelligence (AI) models with a collection of mind-bending new math problems.
These problems typically take doctorate-level mathematicians hours to days to solve, according to the research institute Epoch AI. But in the new tests, the most advanced AI models on the market got correct answers on less than 2% of these problems.…
Most of those benchmarks are geared toward testing AI’s ability to do high-school and college-level math, Elliot Glazer, a mathematician at Epoch AI, and colleagues wrote in a new paper posted on the preprint database arXiv. (The paper has not yet been peer-reviewed or published in a scientific journal.)
…
The problems were also unique, a step taken to ensure that none of the problems were already in the AI models’ training data. When complex reasoning problems are included in the training data, the AI may appear to solve the problems, but in reality, it already has a “cheat sheet,” since it has been trained on the answers.
…
“[E]ven when a model obtained the correct answer, this does not mean that its reasoning was correct,” the paper authors wrote. “For instance, on one of these problems running a few simple simulations was sufficient to make accurate guesses without any deeper mathematical understanding. However, models’ low overall accuracy shows that such guessing strategies do not work on the overwhelming majority of FrontierMath problems.”
The findings show that right now, AI models do not possess research-level math reasoning, Epoch AI’s collaborators concluded. However, as AI models advance, these benchmark tests will provide a way to find out if their reasoning abilities are deepening.
Who would have thought that higher-reasoning math skills would still be the apparent Achilles’ heel of advanced AI models at this point? I wonder if Elon is paying any attention?