Sunday, August 13, 2023

"ChatGPT Is Cutting Non-English Languages Out of the AI Revolution"

Out of the LLM revolution at any rate. Cat face is universal.

From Wired, May 31:

AI chatbots are less fluent in languages other than English, threatening to amplify existing bias in global commerce and innovation.

Computer scientist Pascale Fung can imagine a rosy future in which polyglot AI helpers like ChatGPT bridge language barriers. In that world, Indonesian store owners fluent only in local dialects might reach new shoppers by listing their products online in English. “It can open opportunities,” Fung says—then pauses. She’s spotted the bias in her vision of a more interconnected future: The AI-aided shopping would be one-sided, because few Americans would bother to use AI translation to help research products advertised in Indonesian. “Americans are not incentivized to learn another language,” she says.

Not every American fits that description—about one in five speak another language at home—but the dominance of English in global commerce is real. Fung, director of the Center for AI Research at the Hong Kong University of Science and Technology, who herself speaks seven languages, sees this bias in her own field. “If you don’t publish papers in English, you’re not relevant,” she says. “Non-English speakers tend to be punished professionally.”

Fung would like to see AI change that, not further reinforce the primacy of English. She’s part of a global community of AI researchers testing the language skills of ChatGPT and its rival chatbots and sounding the alarm about evidence that they are significantly less capable in languages other than English.

Although researchers have identified some potential fixes, the mostly English-spewing chatbots spread. “One of my biggest concerns is we’re going to exacerbate the bias for English and English speakers,” says Thien Huu Nguyen, a University of Oregon computer scientist who’s also been on the case against skewed chatbots. “People are going to follow the norm and not think about their own identities or culture. It kills diversity. It kills innovation.”

At least 15 research papers posted this year on the preprint server arXiv.org, including studies co-authored by Nguyen and Fung, have probed the multilingualism of large language models, the breed of AI software powering experiences such as ChatGPT. The methodologies vary, but their findings fall in line: The AI systems are good at translating other languages into English, but they struggle with rewriting English into other languages—especially those, like Korean, with non-Latin scripts.

Despite much recent talk of AI becoming superhuman, ChatGPT-like systems also struggle to fluently mix languages in the same utterance—say English and Tamil—as billions of people in the world casually do each day. Nguyen’s study reports that tests on ChatGPT in March showed it performed substantially worse at answering factual questions or summarizing complex text in non-English languages and was more likely to fabricate information. “This is an English sentence, so there is no way to translate it to Vietnamese,” the bot responded inaccurately to one query.

Despite the technology’s limitations, workers around the world are turning to chatbots for help crafting business ideas, drafting corporate emails, and perfecting software code. If the tools continue to work the best in English, they could increase the pressure to learn the language on people hoping to earn a spot in the global economy. That could further a spiral of imposition and influence of English that began with the British Empire.

Not only AI scholars are worried. At a US congressional hearing this month, Senator Alex Padilla of California asked Sam Altman, CEO of ChatGPT’s creator, OpenAI, which is based in the state, what his company is doing to close the language gap. About 44 percent of Californians speak a language other than English. Altman said he hoped to partner with governments and other organizations to acquire data sets that would bolster ChatGPT’s language skills and broaden its benefits to “as wide of a group as possible.”

Padilla, who also speaks Spanish, is skeptical about the systems delivering equitable linguistic outcomes without big shifts in strategies by their developers. “These new technologies hold great promise for access to information, education, and enhanced communication, and we must ensure that language doesn’t become a barrier to these benefits,” he says.

OpenAI hasn’t hidden the fact that its systems are biased. The company’s report card on GPT-4, its most advanced language model, which is available to paying users of ChatGPT, states that the majority of the underlying data came from English and that the company’s efforts to fine-tune and study the performance of the model primarily focused on English “with a US-centric point of view.” Or as a staff member wrote last December on the company’s support forum, after a user asked if OpenAI would add Spanish support to ChatGPT, “Any good Spanish results are a bonus.” OpenAI declined to comment for this story....

....MUCH MORE

Okay, cat face isn't universal. You have to query the LLM.  

Cats on the other hand...