Friday, June 21, 2024

"Apple, Microsoft Shrink AI Models to Improve Them"

The smaller models are also far cheaper in terms of chip capex.

From IEEE Spectrum, June 20:

“Small language models” emerge as an alternative to gargantuan AI options

Tech companies have been caught up in a race to build the biggest large language models (LLMs). In April, for example, Meta announced the 400-billion-parameter Llama 3, which contains more than twice as many parameters (the variables that determine how a model responds to queries) as OpenAI's original ChatGPT model from 2022. Although not confirmed, GPT-4 is estimated to have about 1.8 trillion parameters.
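
To make those parameter counts concrete, here is a minimal back-of-the-envelope sketch, assuming the common approximation that a decoder-only transformer's weight count scales as roughly 12 × layers × hidden-width². The GPT-3 configuration used below (96 layers, hidden width 12,288) is from OpenAI's published paper; the function name and everything else is illustrative:

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough parameter count for a decoder-only transformer.

    Each layer holds about 4*d_model^2 weights in attention (the Q, K, V,
    and output projections) plus about 8*d_model^2 in the feed-forward
    block (two matrices with a 4x expansion), i.e. ~12*d_model^2 per
    layer. Embeddings and biases are ignored in this rough estimate.
    """
    return 12 * n_layers * d_model ** 2

# GPT-3's published configuration: 96 layers, hidden width 12,288.
print(f"{approx_transformer_params(96, 12288):,}")  # ~174 billion, near the 175B figure
```

Run the same arithmetic with a 3-billion-parameter budget and the layer count or hidden width shrinks dramatically, which is why on-device models like Apple's can fit in a phone's memory.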

In the last few months, however, some of the largest tech companies, including Apple and Microsoft, have introduced small language models (SLMs). These models are a fraction of the size of their LLM counterparts and yet, on many benchmarks, can match or even outperform them in text generation.

On 10 June, at Apple’s Worldwide Developers Conference, the company announced its “Apple Intelligence” models, which have around 3 billion parameters. And in late April, Microsoft released its Phi-3 family of SLMs, featuring models ranging from 3.8 billion to 14 billion parameters.

OpenAI’s CEO Sam Altman believes we’re at the end of the era of giant models.

In a series of tests, the smallest of Microsoft’s models, Phi-3-mini, rivaled OpenAI’s GPT-3.5 (175 billion parameters), which powers the free version of ChatGPT, and outperformed Google’s Gemma (7 billion parameters). The tests evaluated how well a model understands language by prompting it with questions about mathematics, philosophy, law, and more. More interesting still, Microsoft’s Phi-3-small, with 7 billion parameters, performed notably better than GPT-3.5 on many of these benchmarks.

Aaron Mueller, who researches language models at Northeastern University in Boston, isn’t surprised SLMs can go toe-to-toe with LLMs in select functions. He says that’s because scaling the number of parameters isn’t the only way to improve a model’s performance: Training it on higher-quality data can yield similar results too....

....MUCH MORE