Here's our boilerplate introduction to the proprietor of the website we are visiting today:
Andrew Gelman is a professor of statistics and political science at Columbia University, the guy who tells the other social scientists how to get their numbers right so they can at least give the appearance of being a science. He has a very tart tongue which, combined with a high-level intellect, is fun to watch taking on sacred cows and shibboleths. As long as you aren't the target of said intellect and/or sharp tongue....
From Professor Gelman's Statistical Modeling, Causal Inference, and Social Science blog, August 16:
....Franco’s second item is about the performance of chatbots on standardized tests. He writes:
The latest iteration of GPT4 was presented together with the results of it going through a series of standardized exams (see here and here for the paper), something that got a lot of attention due to the impressive results it achieved. When I first saw this, I was equally impressed (and still am), but I think there’s a question to be had here that I haven’t seen so far (caveat, I’m not a researcher in the area, so I could easily have missed it) about a couple of things related to this.
There’s a smaller issue that I’ll go through first, which is with the way they test for contamination in their sample, that being when they check if the questions they use to test Chat’s capacity may have been a part of their training data. They basically test to see if 3 substrings of 50 characters of each question they use in each test was part of the training data and consider it a contaminated question if any of the substrings is present; it’s unclear if they manually check once the method detects a positive.....
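To make the check concrete, here is a minimal Python sketch of the kind of substring test Franco is describing (our illustration of the idea only, with names of our own choosing; OpenAI's actual pipeline presumably searches an indexed corpus rather than scanning one giant string):

```python
import random

def is_contaminated(question: str, training_corpus: str,
                    n_samples: int = 3, sub_len: int = 50) -> bool:
    """Flag a test question as contaminated if any of a few randomly
    sampled fixed-length substrings appears verbatim in the training data."""
    if len(question) <= sub_len:
        # Question shorter than the window: check the whole string.
        return question in training_corpus
    for _ in range(n_samples):
        start = random.randrange(len(question) - sub_len + 1)
        chunk = question[start:start + sub_len]
        if chunk in training_corpus:
            return True
    return False

# Hypothetical usage: keep only the questions the check doesn't flag.
# clean_questions = [q for q in exam_questions
#                    if not is_contaminated(q, corpus_text)]
```

The sketch covers only the matching step; what happens after a positive hit, including any manual review, is exactly the part Franco says is unclear.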
Professor Gelman's thinking:
....My reply: I’d separate this into two questions. First, what can the chatbot do; second, what are we asking from humans. For the first question: Yes, the chatbot seems to be able to construct strings of words that correspond closely to correct answers on the test. For the second question: This sort of pattern-matching is often what students learn how to do! We can look at this in a couple ways:
(a) For most students, the best way to learn how to give correct answers on this sort of test is to understand the material—in practice, actually learning the underlying topic is a more effective strategy than trying to pattern-match the answers without understanding.....
....MUCH MORE
Until you know the material the AI was trained on, it is very difficult to spot the exact reason for any given response. We've been looking at this issue for a while now. Here's a 2017 post whose introduction describes part of the problem that quants face:
"Cracking Open the Black Box of Deep Learning"
One of the spookiest features of black box artificial intelligence is that, when it is working correctly, the AI is making connections and casting probabilities that are difficult-to-impossible for human beings to intuit.
Try explaining that to your outside investors.
You start to sound, to their ears anyway, like a loony who is saying "Etaoin shrdlu, give me your money, gizzlefab, blythfornik, trust me."
See also the famous Gary Larson cartoons on how various animals hear and comprehend:...
And a couple of months later, another post, this one linking to an MIT article:
We Might Be Getting Closer To Understanding How True 'Black Box' AI Makes Decisions
And many more, including an example of a phenomenon found throughout the history of science: Bloomberg's Matt Levine posted on the same esoteric topic within a couple of days of one of our posts. From September 28, 2017:
Let Me Be Clear: I Have No Inside Information On Who Will Win The Man-Booker Prize Next Month (hedge funds, AI and simultaneous discovery)
Over the years we've mentioned one of the oddest phenomena in science, the simultaneous discovery or invention of something or other; the discovery/invention of the calculus by Newton and Leibniz is one famous example (although both may actually have been preceded themselves), but there are dozens if not hundreds of cases. Here's a related phenomenon....
Today Bloomberg View's Matt Levine commends to our attention a story about one of the world's biggest hedge funds, the prize-putter-upper of what's probably the most prestigious honor in literature short of the Nobel, the Man Booker Prize.
On Tuesday, September 26, 2017, at 11:00 PM CDT, Bloomberg posted:
The Massive Hedge Fund Betting on AI
The second paragraph of the story:
...Man Group, which has about $96 billion under management, typically takes its most promising ideas from testing to trading real money within weeks. In the fast-moving world of modern finance, an edge today can be gone tomorrow. The catch here was that, even as the new software produced encouraging returns in simulations, the engineers couldn’t explain why the AI was executing the trades it was making. The creation was such a black box that even its creators didn’t fully understand how it worked. That gave Ellis pause. He’s not an engineer and wasn’t intimately involved in the technology’s creation, but he instinctively knew that one explanation—“I can’t tell you why …”—would never fly with big clients looking for answers when Man inevitably lost some of their money...

Now that is just, to reuse the phrase, spooky. Do read both the Bloomberg Markets and the Bloomberg View pieces, but I'll note right now that it's only with Levine you get:
"I imagine a leather-clad dominatrix standing over the computer,
ready to administer punishment as necessary."