News You Can Use: "Google’s AI to detect toxic comments can be easily fooled with ‘love’"
From The Next Web:
A group of researchers has found that simple changes to sentences and their structure can fool Google’s Perspective AI,
made for detecting toxic comments and hate speech. These methods
involve inserting typos, adding spaces between words, or adding innocuous words to
the original sentence.
The AI project, which was started in 2016 by a Google offshoot called Jigsaw, assigns a toxicity score to a piece of text. Google
defines a toxic comment as a rude, disrespectful, or unreasonable
comment that is likely to make you leave a discussion. The researchers
suggest that even a slight change in the sentence can change the
toxicity score dramatically. They saw that changing “You are great” to
“You are fucking great” made the score jump from a totally safe 0.03 to
a fairly toxic 0.82.
This suggests that the toxicity score is probably not the best
measure for identifying hate speech. Last year, another study found that
inserting spaces and making typos reduced the toxicity score drastically.
Google has improved its AI since then to detect these changes. But it’s
not perfect: the researchers presenting the latest study said that if
someone introduced a word like ‘love’ into these sentences, the score took a
plunge....MORE
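
For readers who want a concrete picture of the three tricks described in the excerpt (typos, spaces between letters, and an added innocuous word), here is a minimal Python sketch. The function names and the example sentence are hypothetical, chosen only for illustration. The code does not call Perspective or any other scoring service, so it shows only how the text is altered, not how any score responds.

```python
def insert_spaces(text, word):
    """Spread one word out into space-separated letters."""
    return text.replace(word, " ".join(word))


def swap_adjacent(text, word, i=0):
    """Introduce a typo by swapping the letters at positions i and i+1 of one word."""
    if i < 0 or i + 1 >= len(word):
        return text  # nothing to swap at that position
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return text.replace(word, "".join(chars))


def append_innocuous(text, filler="love"):
    """Tack a harmless word onto the end of the sentence."""
    return f"{text} {filler}"


if __name__ == "__main__":
    sentence = "you are an idiot"  # hypothetical example input
    print(insert_spaces(sentence, "idiot"))
    print(swap_adjacent(sentence, "idiot"))
    print(append_innocuous(sentence))
```

Each function returns a new string that a human still reads the same way, which is why perturbations of this kind are cheap to try against any text classifier.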