Sunday, May 13, 2018

AI VC: "We Are Here To Create"

Sometimes the competition is just plain intimidating/scary/resistance-is-futile smart.
"KAI-FU LEE, the founder of the Beijing-based Sinovation Ventures, is ranked #1 in technology in China by Forbes. Educated as a computer scientist at Columbia and Carnegie Mellon, his distinguished career includes working as a research scientist at Apple; Vice President of the Web Products Division at Silicon Graphics; Corporate Vice President at Microsoft and founder of Microsoft Research Asia in Beijing, one of the world’s top research labs; and then Google Corporate President and President of Google Greater China. As an Internet celebrity, he has fifty million+ followers on the Chinese micro-blogging website Weibo. As an author, among his seven bestsellers in the Chinese language, two have sold more than one million copies each. His first book in English is AI Superpowers: China, Silicon Valley, and the New World Order (forthcoming, September)." Kai-Fu Lee's Edge Bio page
My original dream of finding who we are and why we exist ended up in a failure. 
Even though we invented all these wonderful tools that will be great for our future, for our kids, for our society, we have not figured out why humans exist. What is interesting for me is that in understanding that these AI tools are doing repetitive tasks, it certainly comes back to tell us that doing repetitive tasks can’t be what makes us human. The arrival of AI will at least remove what cannot be our reason for existence on this earth. If that’s half of our job tasks, then that’s half of our time back to thinking about why we exist. One very valid reason for existing is that we are here to create. What AI cannot do is perhaps a potential reason for why we exist. One such direction is that we create. We invent things. We celebrate creation. We’re very creative about the scientific process, about curing diseases, about writing books, writing movies, creative about telling stories, doing a brilliant job in marketing. This is our creativity that we should celebrate, and that’s perhaps what makes us human.


The question I always ask myself, just like any human being, is who am I and why do I exist? Who are we as humans and why do we exist? When I was in college, I had a much more naïve view. I was very much into computers and artificial intelligence, and I thought it must be the case that I’m destined to work on some computer algorithms and, along with my colleagues, figure out how the brain works and how the computer can be as smart as the brain, perhaps even become a substitute of the brain, and that’s what artificial intelligence is about.

That was the simplistic view that I had. I pursued that through college and into my graduate years. I went to Carnegie Mellon and got a PhD in speech recognition, then went to Apple, then SGI, then Microsoft, and then to Google. In each of the companies, I continued to work on artificial intelligence, thinking that that was the pursuit of how intelligence worked, and that our elucidation of artificial intelligence would then come back and tell us, "Ah, that’s how the brain works." We replicated it, so that’s what intelligence is about. That must be the most important thing in our lives: our IQ, our ability to think, analyze, predict, understand—all that stuff should be explicable by replicating it in the computer.

I’ve had the good fortune to have met Marvin Minsky, Allen Newell, Herb Simon, and my mentor, Raj Reddy. All of these people had a profound influence on the way I thought. It’s consistent that they too were pursuing the understanding of intelligence. The belief at one point was that we would take the human intelligence and implement it as rules that would have a way to act as people if we provided the steps in which we go through our thoughts.

For example, if I’m hungry, then I want to go out and eat. If I have used a lot of money this month, I will go to a cheaper place. A cheaper place implies McDonald’s. At McDonald’s I avoid fried foods, so I just get a hamburger. That "if, then, else" is the way we think we reason, and that’s how the first generation of so-called expert systems, or symbolic AI, proceeded. I found that it was very limiting because when we wrote down the rules, there were just too many.
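The restaurant example above can be sketched as a chain of hand-written rules. This is a minimal illustration of the first-wave "if, then, else" style, not any real expert system; the predicates and thresholds are invented for the example.

```python
# A toy rule-based decision, in the style of first-wave expert systems.
# Every rule is a hand-coded "if, then, else" -- which is exactly why
# this approach drowned once the number of rules grew large.

def choose_restaurant(hungry, spent_this_month, budget):
    """Chain the four rules from the example into a dining decision."""
    if not hungry:
        return None                       # rule 1: not hungry -> stay home
    if spent_this_month > budget:         # rule 2: over budget -> cheap place
        cheap_place = "McDonald's"        # rule 3: cheap place implies McDonald's
        return cheap_place + ": hamburger, no fried food"  # rule 4: avoid fried food
    return "any restaurant"

print(choose_restaurant(True, 900.0, 500.0))
```

Every new food, budget, or preference needs another hand-written rule, which is the scaling problem the text describes.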

There was a professor at MCC (the Microelectronics and Computer Technology Corporation), named Doug Lenat, who is one of the smartest people I know. He hired hundreds of people to write down all the rules that we could think of, thinking that one day we’d be done and that would be the brain. Apple and Microsoft funded his research. I remember visiting him, and he was showing me all these varieties of flowers and sharing his understanding of what type of a flower this was, and which flowers had how many petals and what colors. It just turns out that the knowledge in the world was just too much to possibly enter, and the interactions were too complex. We simply didn’t know how to build that rule-based engine.
That was the first wave. People got excited, thinking we could write rules, but that completely failed, resulting in only maybe a handful of somewhat useful applications. That led everybody to believe AI was doomed and not worth pursuing.

I was fortunate to have been with the second wave, and that coincided with my PhD work at Carnegie Mellon. In that work, I wondered if we could use some kind of statistics or machine learning. What if we collected samples of things and trained the system? These could be samples of speech to train the different sounds of English, samples of dogs and cats to train recognition of animals, etc. Those yielded pretty good results at the time. The technology I developed and used in my PhD thesis was called "Hidden Markov Models." It was the first example of a speaker-independent speech recognition system, which was, and still is, used in many products. For example, hints of my work carried over by people who licensed the work or who worked on the team are evident in Siri, in the Microsoft speech recognizer, and other technologies used in computer vision and computer speech. I did that work at Carnegie Mellon in the ‘80s, finished my thesis in ’88, and I continued the work at Apple from ’90 to ’96, then at Microsoft Research, around the year 2000.
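The core computation in a Hidden Markov Model recognizer is scoring how likely an observation sequence is under the model, which the forward algorithm does in one pass. The sketch below is a toy two-state HMM with made-up probabilities, purely to illustrate the model family named above, not the thesis system itself.

```python
# Toy forward-algorithm pass for a two-state Hidden Markov Model.
# All probabilities here are invented for illustration; a real speech
# recognizer would have states per phoneme and learned parameters.
import numpy as np

start = np.array([0.6, 0.4])            # P(state at t=0)
trans = np.array([[0.7, 0.3],           # P(next state | current state)
                  [0.4, 0.6]])
emit  = np.array([[0.5, 0.4, 0.1],      # P(observation symbol | state)
                  [0.1, 0.3, 0.6]])

def forward(observations):
    """Return P(observation sequence) under the toy HMM."""
    alpha = start * emit[:, observations[0]]      # initialize with first symbol
    for obs in observations[1:]:
        alpha = (alpha @ trans) * emit[:, obs]    # propagate, then weight by emission
    return alpha.sum()

probability = forward([0, 1, 2])
```

In a recognizer, each candidate word (or phoneme sequence) has its own HMM, and the one assigning the highest probability to the acoustic observations wins.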

We were optimistic that extrapolation of this work should succeed because we saw results improving. But after a decade of work, we saw that the improvements were reaching an asymptote. Accuracy wasn’t going up any higher, so we were frustrated. Again, a number of people said, "You can recognize 1,000 words, you can recognize 100 objects, but this is not extensible. Humans can understand an infinite vocabulary, even new words that are made up. This is not smart. This is not AI." Then came the second crash of artificial intelligence, because it didn’t demonstrate that machines were able to do what humans can do.

In the first wave, I had the good luck of getting to know the psychologist and computer scientist Roger Schank. In fact, one of his students was an advisor of mine in my undergrad years. Those were the experiments that led me to believe that expert systems could not scale, and that our brains probably didn’t work the way we thought they did. I realized that in order to simplify our articulation of our decision process, we used "if, then, else" as a language that people understood, but our brains were much more complex than that.

During the second wave, in my thesis and PhD, I read about Judea Pearl’s work on Bayesian networks. I was very much influenced by a number of top scientists at IBM including Dr. Fred Jelinek, Peter Brown, and Bob Mercer. They made the mark in making statistical approaches become the mainstream, not only for speech but also for machine translation. I owe them a lot of gratitude. We still got stuck, but not because the technologies were wrong; in fact, the statistical approaches were exactly right.

When I worked on Hidden Markov Models at Carnegie Mellon in the late '80s, Geoff Hinton was right across the corridor working on neural networks, which he called "Time Delay Neural Networks." Arguably, that was the first version of convolutional neural networks, which is now the talk of the town as deep learning becomes a dominant technology.

But why did that wave of statistical and neural net-based machine learning not take off? In retrospect, it had nothing to do with technology—most of the technology was already invented. The problem was just that we didn't have enough training data. Our brains work completely differently from the way these deep-learning machines work. In order for deep-learning machines to work, you have to give them many orders of magnitude more training data than humans need. Humans can see maybe hundreds of faces and start to recognize people, but these deep-learning neural networks would love to see billions of faces in order to become proficient.

Of course, once they're proficient, they’re better than people. That is the caveat. But at that time, we simply didn’t have enough training data, nor did we have enough computing power to push these almost-discovered technologies to the extreme. Google was the company that began to realize that in order to do search you need a lot of machines, and you need them to be parallel. And then Jeff Dean and others at Google found that once you had those parallel machines you could do more than search—you could build AI on top of that. Then they found that to do AI well, you needed specialized chips. Then came NVIDIA’s GPUs, and then Google did its own TPUs. It's been an interesting progression. It was a fortuitous incident that Google chose to do search, that search needed servers, and that they had Jeff Dean; all of that evolved into today’s architecture of massively parallel GPU- or TPU-based learning that can learn from a lot more data in a single domain.

New technologies developed based on this massively parallel machine-learning architecture built on GPUs and new accelerators. More and more people were able to train face recognizers, speech recognizers, image recognizers, and also apply AI to search and prediction. Lots of Internet data came about. Amazon uses it to help predict what you might want to buy, Google uses it to predict what ad you might want to click on and potentially spend money, and Microsoft uses it. In China we have Tencent and Alibaba. Many applications are coming about based on the huge amounts of Internet data.

At the same time technologies were progressing, Geoff Hinton, Yann LeCun, and Yoshua Bengio were the three people who continued to work on neural networks, even though in the early 2000s they were no longer in the mainstream. In the ‘80s, this work was a novelty, and the success of statistical approaches suggested that these networks didn’t scale. Funding agencies then abandoned them, conferences stopped accepting their papers, but these three researchers kept at it with small amounts of funding to refine and develop better algorithms, and then more data came along. A breakthrough came with the creation of new algorithms, sometimes called "convolutional neural networks," and now known as "deep learning." Some variants of the work are also related to reinforcement learning and transfer learning.
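The operation that gives convolutional neural networks their name is a small filter slid across an image. The sketch below is a minimal, illustrative 2-D convolution (valid mode, single channel); in a real network the kernel values are learned from data rather than written by hand.

```python
# Minimal 2-D convolution (cross-correlation, "valid" mode), the core
# building block of convolutional neural networks. The kernel here is
# a hand-picked edge detector purely for illustration; CNNs learn
# their kernels from training data.
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum elementwise products."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1           # output height ("valid" mode)
    w = image.shape[1] - kw + 1           # output width
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

image  = np.arange(16, dtype=float).reshape(4, 4)   # tiny 4x4 "image"
edge   = np.array([[1.0, -1.0]])                    # horizontal-edge filter
result = conv2d(image, edge)                        # shape (4, 3)
```

Stacking many such filters, interleaved with nonlinearities and pooling, is what turns this simple operation into the deep networks described above.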

This set of technologies that emanated from these three professors began to blossom in the industry. Speech recognition systems built by top companies are beating human performance, and it's the same with face recognition and image recognition. There are e-commerce implications and speaker/user identification; applied to Internet data, it means better prediction for Amazon, which makes more money in the process; better predictions for Facebook in terms of how to rank your news feed; better search results from Google. Deep neural networks started to get used at Google in the late 2000s, and in the last seven or eight years their use has blossomed to reach almost everywhere. New architectures were coming out and more intelligent systems were being developed....MUCH MORE