EXO Labs has published a detailed blog post about running Llama on Windows 98, and demonstrated a reasonably capable AI large language model (LLM) running on a 26-year-old Windows 98 Pentium II PC in a short video on social media. The video shows an ancient Elonex Pentium II @ 350 MHz booting into Windows 98; EXO then fires up its custom pure C inference engine based on Andrej Karpathy's llama2.c and asks the LLM to generate a story about Sleepy Joe. Amazingly, it works, with the story generated at a very respectable pace.
"LLM running on Windows 98. 26 year old hardware with Intel Pentium II CPU and 128MB RAM. Uses llama98.c, our custom pure C inference engine based on @karpathy llama2.c. Code and DIY guide 👇" pic.twitter.com/pktC8hhvva (December 28, 2024)
The above eye-opening feat is nowhere near the end game for EXO Labs. This somewhat mysterious organization came out of stealth in September with a mission "to democratize access to AI." The group was formed by a team of researchers and engineers from Oxford University. In short, EXO sees a handful of megacorps controlling AI as a very bad thing for culture, truth, and other fundamental aspects of our society. Thus EXO hopes to "build open infrastructure to train frontier models and enable any human to run them anywhere." In this way, ordinary people can hope to train and run AI models on almost any device, and this crazy Windows 98 AI feat is a totemic demo of what can be done with (severely) limited resources.
Since the tweeted video is rather brief, we were grateful to find EXO's blog post about running Llama on Windows 98. The post is published as Day 4 of the "12 days of EXO" series (so stay tuned).
As readers might expect, it was trivial for EXO to pick up an old Windows 98 PC from eBay as the foundation of this project, but there were many hurdles to overcome. EXO explains that getting data onto the old Elonex-branded Pentium II was a challenge in itself, forcing the team to resort to "good old FTP" for file transfers over the vintage machine's Ethernet port.
Compiling modern code for Windows 98 was probably a greater challenge. EXO was glad to find Andrej Karpathy's llama2.c, which can be summarized as "700 lines of pure C that can run inference on models with Llama 2 architecture." With this resource and the old Borland C++ 5.02 IDE and compiler (plus a few minor tweaks), the code could be turned into a Windows 98-compatible executable and run. Here is a GitHub link to the finished code.
"35.9 tok/sec on Windows 98 🤯 That's a 260K LLM with Llama architecture. We also tried out bigger models. Results in the blog post." https://t.co/QsViEQLqS9 pic.twitter.com/lRpIjERtSr (December 28, 2024)
One of the fine folks behind EXO, Alex Cheema, made a point of thanking Andrej Karpathy for his code, marveling at its performance: "35.9 tok/sec on Windows 98" with a 260K-parameter Llama-architecture LLM. It's probably worth highlighting that Karpathy was previously a director of AI at Tesla and was on the founding team at OpenAI.
Of course, a 260K-parameter LLM is on the small side, but it ran at a decent pace on an ancient 350 MHz single-core PC. According to the EXO blog, moving up to a 15M-parameter LLM resulted in a generation speed of a little over 1 tok/sec. Llama 3.2 1B, however, was glacially slow at 0.0093 tok/sec (roughly one token every 108 seconds).
BitNet is the bigger plan
By now, you may be well aware that this story isn't just about getting an LLM to run on a Windows 98 machine. EXO rounds out its blog post by talking about the future, which it hopes will be democratized thanks to BitNet.
"BitNet is a transformer architecture that uses ternary weights," it explains. Importantly, with this architecture a 7B-parameter model needs only 1.38GB of storage. That would still make a 26-year-old Pentium II creak, but it's featherweight for modern hardware, and even for decade-old devices.
EXO also highlights that BitNet is CPU-first, sidestepping expensive GPU requirements. Moreover, this type of model is claimed to be 50% more efficient than full-precision models and could run a 100B-parameter model on a single CPU at human reading speeds (about 5 to 7 tok/sec).
Before we go, please note that EXO is still looking for help. If you also want to avoid a future where AI is locked into huge data centers owned by billionaires and megacorps, and you think you can contribute in some way, you can reach out.
For a more casual way to connect with EXO Labs, the group hosts a Discord Retro channel for discussing running LLMs on old hardware, such as old Macs, Game Boys, Raspberry Pis, and more.