Friday, September 13, 2024

"First impressions of ChatGPT o1: An AI designed to overthink it"

Not even to the second of five levels.*

From TechCrunch, September 13: 

OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to “think” before they answer. There’s been a lot of hype building up to these models, codenamed “strawberry” inside OpenAI. But does strawberry live up to the hype?

Sort of.

Compared to GPT-4o, the o1 models feel like one step forward and two steps back. ChatGPT o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI’s latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits that “GPT-4o is still the best option for most prompts,” on its help page, and notes elsewhere that GPT o1 struggles at simpler tasks.

“It’s impressive, but I think the improvement is not very significant,” said Ravid Shwartz Ziv, an NYU professor who studies AI models. “It’s better at certain problems, but you don’t have this across-the-board improvement.”

For all of these reasons, it’s important to use GPT o1 only for the questions it’s truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today’s AI models are not very good at it. However, o1 is a tentative step in that direction....

Following on September 8's "ChatBots Are Not The Be-All And End-All Of Artificial Intelligence":

Far from it.
And all the focus on ChatBots and LLMs is more than just a distraction; they are a perverse representation of what AI is doing and will do, and could potentially cost you money or opportunity or both....

For reference, here is the roadmap OpenAI is pitching, via Bloomberg, July 11:

OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving

....MUCH MORE

And from Decrypt, September 3:
What we know about the secretive AI projects pushing the limits of what OpenAI can do.....