Thursday, November 23, 2023

More On Q* and Q-Learning

We left the post immediately below, "OpenAI Q*—Credit Where Credit Is Due: The First Article We Saw Hinting That Sam Altman Thought He Was Building God" with a quick backgrounder on Q* and Q-Learning.

Here's much more from PC Guide, November 23:

What are Q-Learning and Q*? – OpenAI’s secret AI models
What the Mira Murati letter reveals

On Wednesday, November 22nd, OpenAI CTO Mira Murati sent a letter to employees. The letter detailed a project known internally as Q* (pronounced Q-Star) or Q-Learning. This project was purported to be “one factor among a longer list of grievances by the board leading to Altman’s firing”, and could help accelerate the learning rate of mathematical models towards AGI (Artificial General Intelligence). So, how does Q-Learning work, and what controversy (reportedly) led to the firing of OpenAI CEO Sam Altman?

OpenAI CTO Mira Murati and the internal letter to staff

Q* and Q-Learning are trending today due to references made by OpenAI’s Chief Technology Officer, Mira Murati, on Wednesday, November 22nd. This technology is thought to be a possible ingredient for achieving AGI, or Artificial General Intelligence. According to an internal letter sent by Murati to OpenAI employees, a “lack of consistently candid communication” about such a potentially world-changing development played a part in the board’s decision to fire OpenAI CEO Sam Altman.

✓ Steve says

Was the writing on the wall?

There are plenty of conspiracy theories still circulating as to why the OpenAI board fired Sam Altman as CEO. To date, this seems to be the most probable reason, but it is still unconfirmed. It appears to fit the reasoning initially given by the board, that Altman was “not consistently candid in his communications with the board”, if the issue was just how much progress had recently been made with the Q-Learning algorithm.

In fact, Altman said on stage, on November 17th, that “4 times now in the history of OpenAI – the most recent time was just in the last couple of weeks – I’ve gotten to be in the room when we push the veil of ignorance back and the frontier of discovery forward”. It’s possible that this fourth breakthrough was none other than project Q*. Then again, it’s also possible that he was referring to GPT-4 Turbo or ChatGPT’s new voice capabilities.

What are project Q* and its Q-Learning algorithm?

To date, Q* and Q-Learning are being used synonymously. With very little documentation and few official references to these terms, we’re unable to definitively differentiate them. However, it’s possible that Q* is an internal project name, in reference to the optimal solution of a Bellman equation (which we’ll return to later). Q* may also be the name of a corresponding AI model yet to be announced by OpenAI, or at least a working title thereof. By contrast, Q-Learning is an established mathematical concept from reinforcement learning. The Q-Learning algorithm would presumably be a formula used in this project and AI model.
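
For readers unfamiliar with the notation, here is the standard textbook form of the Bellman optimality equation, in which Q* denotes the optimal action-value function. This is general reinforcement learning notation and says nothing confirmed about OpenAI's project. The symbols are the usual conventions rather than anything taken from the article: s is the current state, a the action taken, r the immediate reward, s' the next state, a' a candidate next action, and γ a discount factor between 0 and 1.

```latex
Q^*(s, a) = \mathbb{E}\big[\, r + \gamma \max_{a'} Q^*(s', a') \;\big|\; s, a \,\big]
```

In words: the value of taking action a in state s, when acting optimally afterwards, equals the expected immediate reward plus the discounted value of the best action available in the next state.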

Names aside, Q-Learning here refers to a formula used in a machine learning algorithm reportedly capable of “grade-school” level mathematics, which it is hoped will surpass OpenAI’s GPT-4 model in that field. It approaches math problems using a machine learning technique called reinforcement learning, wherein rewards are given for correct or optimal actions, and penalties are given for incorrect or suboptimal actions. By exploring the available paths through trial and error, a machine can learn the shortest route to an expected reward, making better decisions each time and approaching an optimal strategy over time.

But how does this all relate to Q*? Q-values, also known as action values, allow us to put a number on the effectiveness of a given action in a given state. By storing this value in a Q-table, alongside all other Q-values, a machine can objectively compare the actions available to it: the action with the highest Q-value is the best solution found (so far) by that algorithm....
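
To make the Q-table idea concrete, here is a minimal sketch of textbook tabular Q-learning in Python. It is an illustration only and is not based on anything known about OpenAI's Q* project. The toy environment (a five-cell corridor with a reward at the far end), the function names `step`, `train` and `greedy_policy`, and the values of `ALPHA`, `GAMMA` and `EPSILON` are all assumptions chosen for the example.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

def step(state, action):
    """Move along the corridor; the only reward (1.0) is for reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    """Fill a Q-table (one row per state, one column per action) by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(2)                      # explore: random action
            else:
                best = max(q[state])                      # exploit: best known action,
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])  # ties broken at random
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge Q(s, a) toward reward + discounted best next value
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """For each non-goal state, pick the action with the highest Q-value."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]

if __name__ == "__main__":
    table = train()
    for s, row in enumerate(table):
        print(s, [round(v, 3) for v in row])
    print("greedy policy:", greedy_policy(table))
```

After training, reading the largest value in each row of the table gives the preferred action for that state; in this toy corridor the learned policy is to step right from every cell, which is the shortest route to the reward.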

....MUCH MORE