Wednesday, February 14, 2018

"Talk down to Siri like she's a mere servant – your safety demands it"

The "mere" is troubling for some reason, but it's CPI day, so no time to reflect on why.

From The Register:

Voice assistants get samples of our voice that can be remixed and faked
In the middle of the night, the 83-year-old woman received a call. A caller identifying himself as a policeman angrily reported that her grandson – identified by name – had landed in jail. He'd hit a policeman while driving and TXTing.

The policeman said they needed $4,000 in bail – immediately.

The old woman hung up, but the phone rang again, and the policeman said she could speak to her grandson: he came on the line, pleading with his grandmother for bail money. She said she wouldn't do it – not because she was hard-hearted, but because something didn't "feel right".

At that point, a man identifying himself as her grandson's defence attorney came on the line, exclaiming, "I don't need this case – I have 10 others!" But the grandmother remained adamant – no bail money – and so the call ended.

It didn't take her long to contact her grandson and learn the whole thing had been a setup, a scam from masters of the craft. Yet the level of detail possessed by scammers – that felt weirdly new. They'd tracked down this woman, obtained her phone number, then somehow worked out both the name of her grandson, and that he lived in another town, some miles away.

Could they get all of that personal information from Facebook? Probably not. But it wouldn't be terribly hard to find enough personal details on the social sharing site that it became a relatively straightforward process to trawl through other public databases, assembling a more-or-less complete family picture. Names, addresses, phone numbers: Everything you need to defraud an old lady in the middle of the night.

That it wasn't quite good enough to pass a sniff test says less about the current state of the art than the capacity of the scammers. The last few months have seen a wealth of reports about "deepfake" videos – mapping famous celebrity faces into pornographic films. The technology behind these deepfakes – computer vision and machine learning algorithms – has been publicly available for long enough, and mastery of them has grown widespread enough, that the kind of forgeries that would have required painstaking, highly expert labour can now be handed to a piped-together set of command-line tools.

Adobe VoCo – deepfake for speech
What deepfake does for video, Adobe VoCo – its "Photoshop for audio" – does for speech. Feed it a sufficiently long sample of any speaker – such as Barack Obama, who provides plenty of source material – and arbitrary speech can be endlessly generated. Obama can be made to say anything at all.

Imagine if those scammers had gotten a voice sample of that grandson: When his grandmother spoke to his vocal simulacrum, it would have responded in the right tones to make her believe – and pay.

...MORE