From IEEE Spectrum, May 30:
“Imprecise” language models are smaller, speedier—and nearly as accurate
Large language models, the AI systems that power chatbots like ChatGPT, are getting better and better—but they’re also getting bigger and bigger, demanding more energy and computational power. To become cheap, fast, and environmentally friendly, LLMs will need to shrink, ideally small enough to run directly on devices like cell phones. Researchers are finding ways to do just that by drastically rounding off the many high-precision numbers that store their memories so that each equals just 1 or -1.
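[Editorial aside, not from the Spectrum piece: a minimal NumPy sketch of what that rounding to 1 or -1 looks like. The single shared scale factor is an assumed detail borrowed from common 1-bit schemes such as BitNet, not something the article specifies.]

    import numpy as np

    def binarize(weights: np.ndarray):
        """Keep only the sign of each weight (1 bit of information apiece),
        plus one shared scale so the binary tensor still approximates
        the original full-precision tensor."""
        scale = np.abs(weights).mean()              # one float per tensor
        signs = np.where(weights >= 0, 1.0, -1.0)   # every weight becomes +1 or -1
        return signs, scale

    w = np.random.randn(4, 4).astype(np.float32)
    signs, scale = binarize(w)
    w_hat = signs * scale                           # dequantized approximation
    print("mean abs rounding error:", np.abs(w - w_hat).mean())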
LLMs, like all neural networks, are trained by altering the strengths of connections between their artificial neurons. These strengths are stored as mathematical parameters. Researchers have long compressed networks by reducing the precision of these parameters—a process called quantization—so that instead of taking up 16 bits each, they might take up 8 or 4. Now researchers are pushing the envelope to a single bit.
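[Another aside: to make "reducing the precision" concrete, here is a sketch of symmetric uniform quantization down to 8 or 4 bits. The per-tensor symmetric scheme and the function names are illustrative assumptions, not the method any particular LLM uses.]

    import numpy as np

    def quantize(weights: np.ndarray, bits: int):
        """Symmetric uniform quantization: map floats onto the integer
        grid [-(2**(bits-1) - 1), 2**(bits-1) - 1] with one shared scale."""
        qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits, 7 for 4
        scale = float(np.abs(weights).max()) / qmax
        q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(4096).astype(np.float32)
    for bits in (8, 4):
        q, scale = quantize(w, bits)
        err = np.abs(w - dequantize(q, scale)).mean()
        print(f"{bits}-bit: mean abs error {err:.4f}")

At a single bit the integer grid degenerates to just -1 and +1, which is why 1-bit schemes switch to the sign-based rounding sketched above.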
How to make a 1-bit LLM....
....MUCH MORE
Also at IEEE Spectrum, May 26: "
And recently on reducing the electricity demands of artificial intelligence, May 21's "As The Amount Of Electricity Required By Data Centers Heads Toward Half Of Current Generating Capacity...."