“OK Facebook”—Why stop at assistants? Facebook has grander ambitions for modern AI
Facebook’s machine learning pipeline—from research to production—is aimed at an AI future.
Facebook wants one day to have a conversational agent with human-like intelligence. Siri, Google Now, and Cortana all currently attempt this, but go off script and they fail. That's just one reason why Mark Zuckerberg famously built his own AI for home use in 2016; the existing landscape didn't quite meet his needs.
Of course, his company has started to build its AI platform, too—it's called Project M. M will not have human-like intelligence, but it will have intelligence in narrow domains and will learn by observing humans. And M is just one of many research projects and production AI systems being engineered to make AI the next big Facebook platform.
On the road to this human-like intelligence, Facebook will use machine learning (ML), a branch of artificial intelligence (AI), to understand all the content users feed into the company’s infrastructure. Facebook wants to use AI to teach its platform the meaning of posts, stories, comments, images, and videos. ML then stores that understanding as metadata, which improves ad targeting and increases the relevance of newsfeed content. The metadata also acts as raw material for creating an advanced conversational agent.
These efforts are not some far-off goal: AI is the next platform for Facebook right now. The company is quietly approaching this initiative with the same urgency as its previous Web-to-mobile pivot. (For perspective, mobile currently accounts for 84 percent of Facebook's revenue.) While you can't currently shout "OK Facebook" or "Hey Facebook" to interact with your favorite social media platform, today plenty of AI powers the way Facebook engages us—whether through images, video, the newsfeed, or its budding chatbots. And if the company's engineering collective has its way, that automation will only increase.
Building an intelligent assistant, in theory
In its early stage, Project M exists as a text-based digital assistant inside Facebook Messenger. The bot, trained using ML, works alongside human trainers to resolve user intent (what the user wants, such as calling an Uber) as it surfaces during a conversation. When a human trainer intervenes to resolve an intent, the bot listens and learns, improving its accuracy the next time it predicts a user’s intent.
When met with a question, if the bot calculates a low probability that its response will be accurate, it requests the trainer's help. If it estimates its accuracy as high, it responds to the user directly, and the trainer never sees the exchange.
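To make that routing concrete, here's a minimal sketch in Python. Facebook hasn't published M's internals, so the 0.8 threshold, the toy intent classifier, and the trainer hand-off below are illustrative assumptions, not the production system:

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; M's real value is not public

@dataclass
class IntentPrediction:
    intent: str        # e.g. "call_uber"
    confidence: float  # the model's own estimate that the intent is correct

training_log = []  # corrected examples the model can retrain on later

def predict_intent(message: str) -> IntentPrediction:
    """Stand-in for the trained intent classifier (hypothetical)."""
    if "ride" in message.lower():
        return IntentPrediction("call_uber", 0.92)
    return IntentPrediction("unknown", 0.30)

def ask_human_trainer(message: str, guess: IntentPrediction) -> str:
    """Stand-in for the human-in-the-loop queue (hypothetical)."""
    print(f"Trainer asked about: {message!r} (bot guessed {guess.intent})")
    return "book_restaurant"  # the trainer's resolution, hard-coded here

def handle_message(message: str) -> str:
    prediction = predict_intent(message)
    if prediction.confidence >= CONFIDENCE_THRESHOLD:
        # High confidence: the bot answers on its own,
        # and the trainer never sees the exchange.
        return f"Doing: {prediction.intent}"
    # Low confidence: escalate to a trainer, then record the corrected
    # (message, intent) pair so the bot predicts better next time.
    corrected = ask_human_trainer(message, prediction)
    training_log.append((message, corrected))
    return f"Doing: {corrected}"

print(handle_message("I need a ride to the airport"))   # handled by the bot
print(handle_message("Get me a table for two at 8pm"))  # escalated
```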
This interaction is possible because of Memory Networks, an architecture created by FAIR, the Facebook Artificial Intelligence Research group founded in December 2014. A Memory Network is a neural net with an associated memory on the side. Though not inspired by the human brain, the analogy is useful: the neural net is like the cortex, and the associated memory is like the hippocampus, which consolidates short-term and spatial-navigation memory for transfer into long-term storage. When information moves to the cortex (the neural network, in this analogy), it is transformed into thought and action.
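For the technically inclined, the read step of a memory network can be sketched in a few lines of Python with NumPy. The scheme follows the general idea of FAIR's published end-to-end Memory Networks (score a query against stored memory slots, then read out an attention-weighted sum); the tiny dimensions and random vectors here are stand-ins, not the real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy setup: 5 memory slots (e.g. embedded sentences the net has stored)
# and a query, all in a 16-dimensional embedding space.
embed_dim, n_slots = 16, 5
memory = rng.normal(size=(n_slots, embed_dim))  # the "hippocampus"
query = rng.normal(size=embed_dim)              # the embedded question

# Read step: attention over memory.
# 1. Score each slot against the query (dot product).
scores = memory @ query
# 2. Turn scores into a probability distribution over slots.
attention = softmax(scores)
# 3. Read out a weighted sum of the slots; this vector feeds the rest
#    of the network (the "cortex"), which produces the answer.
readout = attention @ memory

print("attention over slots:", np.round(attention, 3))
print("readout vector shape:", readout.shape)
```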
Facebook effectively open-sourced Memory Networks by publishing the research openly for the wider AI community. Facebook's artificial intelligence research director, Yann LeCun, describes the company’s intelligent conversational agent of the future as a very advanced version of the Project M that exists today.
“It's basically M, but completely automated and personalized," he said. "So M is your friend, and it's not everybody's M, it's your M, you interacted with it, it's personalized, it knows you, you know it, and the dialogues you can have with it are informative, useful… The personalized assistant that you take everywhere basically helps you with everything. That requires human-level of intelligence, essentially.”
LeCun is a pioneer in AI and ML research. He was recruited to Facebook to build and lead FAIR, which forms the first stage in the supply chain between blue-sky research and the artificially intelligent systems that everyone on Facebook uses today.
As the advanced research indicates, the current Project M bots are not LeCun’s end goal. They are a milestone, one of many on the way to an intelligent conversational agent. LeCun cannot predict when that goal will be reached, and it may not even happen during his professional career. But each interim milestone defines the hardware and software that need to be built so that a future machine can reason more like a human. Functionality becomes better defined with each iteration.
The obstacles to teaching computers to reason like humans are significant. And with his 30 years of research experience in the field, LeCun believes Facebook can focus on 10 scientific questions to better emulate human-like intelligence. He shared a few of these during our visit.
For instance, at ages three to five months, babies learn the notion of object permanence, a fancy way of saying that a baby knows an object hidden behind another is still there. Around the same age, they pick up basic intuitive physics, such as the fact that an unsupported object will fall. AI researchers have not built an ML model that understands either concept.
As another example, a sentence like "the trophy didn't fit in the suitcase because it was too small" still poses too much ambiguity for AI systems to resolve reliably. Humans easily work out that the pronoun “it” refers to the suitcase, but computers struggle with the inference. This class of problem is called a Winograd Schema. Last summer, in the first annual Winograd Schema Challenge, the best-trained computer scored 58 percent when interpreting 60 such sentences. To contextualize that score, humans scored 90 percent and completely random guessing scored 44 percent; on these problems, computers are currently closer to guessing than to human performance.
“It turns out this ability to predict what's going to happen next is one essential piece of an AI system that we don't know how to build," LeCun says, explaining the general problem of a machine predicting that “it” refers to the suitcase. "How do you train a machine to predict something that is essentially unpredictable? That poses a very concrete mathematical problem, which is, how do you do ML when the thing to predict is not a single thing, but an ensemble of possibilities?”
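One standard ML answer to LeCun's question is to stop forcing a single hard prediction and instead output a probability for every candidate in the ensemble of possibilities. Here's a minimal Python sketch for the trophy/suitcase sentence; the scoring function is a hypothetical stand-in for a trained model, not anything Facebook has published:

```python
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

sentence = "The trophy didn't fit in the suitcase because it was too small."
candidates = ["the trophy", "the suitcase"]

def score_antecedent(sentence: str, candidate: str) -> float:
    """Hypothetical stand-in for a trained scorer. A real model would
    have to learn that 'too small' makes the container the likelier
    referent, which is exactly the hard part."""
    return 2.0 if candidate == "the suitcase" else 0.5

# Instead of committing to one answer, the model produces a
# probability for each member of the ensemble of possibilities.
probs = softmax([score_antecedent(sentence, c) for c in candidates])
for candidate, p in zip(candidates, probs):
    print(f"P('it' -> {candidate}) = {p:.2f}")
```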
Hardware as the catalyst
If these problems can be solved and the 10 scientific questions can be answered, then ML models can be built that reason like a human. But new hardware will be needed to run them: very, very large neural networks spread across a yet-to-be-conceived distributed computational architecture, connected by very high-speed networks and running highly optimized algorithms. On top of that, training these models will require new specialized supercomputers that excel at numerical computation.
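To see why the network fabric matters, consider data parallelism, the simplest distributed training scheme: every worker computes gradients on its own shard of the data, and those gradients must cross the network to be averaged at every step. The toy simulation below (plain Python and NumPy, not Facebook's infrastructure) shows the synchronization point that a slow interconnect would throttle:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: linear regression, y = x @ w, trained with plain SGD.
n_workers, n_features, lr = 4, 8, 0.1
true_w = rng.normal(size=n_features)
w = np.zeros(n_features)  # the model, replicated on every worker

# Each worker holds its own shard of the training data.
shards = [rng.normal(size=(32, n_features)) for _ in range(n_workers)]
shards = [(x, x @ true_w) for x in shards]

def local_gradient(w, x, y):
    """Gradient of mean squared error on one worker's shard."""
    return 2 * x.T @ (x @ w - y) / len(x)

for step in range(100):
    # Every step, each worker computes a gradient locally...
    grads = [local_gradient(w, x, y) for x, y in shards]
    # ...then all gradients cross the network to be averaged (the
    # all-reduce). This is the step a slow interconnect bottlenecks.
    avg_grad = np.mean(grads, axis=0)
    w -= lr * avg_grad

print("error after training:", np.linalg.norm(w - true_w))
```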
The ML developments of the last decade give credence to the idea of new, specialized hardware as a catalyst. Although the research behind ML was sound, few researchers pursued it; the field was widely considered a dead end because generic hardware powerful enough to support the work did not exist. In 2011, the 16,000 CPUs in Google’s giant data center that Google Brain used to recognize cats and people by watching YouTube videos proved ML worked, but the setup also proved that few research teams outside of Google had the hardware resources to pursue the field.
The breakthrough came in 2011 when Nvidia researcher Bryan Catanzaro teamed with Andrew Ng’s team at Stanford. Together, these researchers proved that 12 Nvidia GPUs could deliver the deep-learning performance of 2,000 CPUs. Commodity GPU hardware accelerated research at NYU, the University of Toronto, the University of Montreal, and the Swiss AI Lab, proving ML’s usefulness and renewing broad interest in the field of research.