When researchers experiment with machine learning, it’s often to create helpful tools for humans. Computers, by design, complement our natural flaws: they don’t get tired or stressed out, and they certainly don’t exaggerate. Typically, they’re all about hard facts and numbers. But Norwegian developer Lars Eidnes took the opposite approach, creating a learning machine designed to trick humans by preying on their curiosity and gullibility—much as some human writers do these days. Eidnes built a clickbait generator.
Clickbait—a term for articles with sensationalist headlines that fail to deliver on their premise—runs rampant on the internet today, as a growing number of media outlets inflate relatively insignificant events to grab readers’ attention. Large, bold headlines promise lists of items that will shock, delight, inspire, or amaze you (“You won’t believe…”).
Eidnes built a neural network that read roughly 2 million of these headlines from online media outlets including BuzzFeed, Gawker, Jezebel, The Huffington Post, and Upworthy (all of which have been accused of clickbait).
Neural networks are clusters of mathematical formulas that decode relationships in their inputs. So if you make a neural network read 2 million clickbait headlines (a process called training), it learns which words tend to follow which. By capturing those relationships between words, it can predict what word should come next with a reasonable degree of certainty.
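To make that concrete, here is a minimal sketch of the idea in Python with PyTorch. It is not Eidnes’ actual code; the two-headline corpus, the HeadlineRNN class, and every hyperparameter are hypothetical, but the training loop shows what “learning which words follow which” means in practice: at each position, the network is graded on whether it predicted the next word in the headline.

# A minimal sketch, not Eidnes' actual code: a word-level recurrent
# language model that learns to predict the next word in a headline.
# The tiny corpus and all hyperparameters here are hypothetical.
import torch
import torch.nn as nn

headlines = [
    "you won't believe what happened next",
    "this one trick will change your life",
]
words = sorted({w for h in headlines for w in h.split()})
word_to_id = {w: i for i, w in enumerate(words)}

class HeadlineRNN(nn.Module):
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, ids, state=None):
        x = self.embed(ids)
        x, state = self.rnn(x, state)
        return self.out(x), state

model = HeadlineRNN(len(words))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Training: at every position the target is simply the next word, so the
# network gradually learns which words tend to follow which.
for epoch in range(100):
    for h in headlines:
        ids = torch.tensor([[word_to_id[w] for w in h.split()]])
        logits, _ = model(ids[:, :-1])  # predict from all but the last word
        loss = loss_fn(logits.reshape(-1, len(words)), ids[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()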
After it’s trained, when the network is asked to produce a sentence, it outputs a word, then loops back and repeats the whole thinking process, taking that first word as context for the next. This architecture is called a recurrent neural network (because it recurs), and machine learning researchers have found that it works well for tasks that unfold in sequence or over time.
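Continuing the hypothetical sketch above, that recurrence is easiest to see in the generation loop: the network predicts a distribution over words, one word is sampled from it, and that word is fed straight back in, along with the hidden state the network has been carrying, to predict the word after it.

# Continuing the sketch above: each sampled word is fed back into the
# network, together with its hidden state, to predict the next word.
def generate(model, seed_word, max_words=10):
    ids = torch.tensor([[word_to_id[seed_word]]])
    state = None
    out_words = [seed_word]
    for _ in range(max_words):
        logits, state = model(ids, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        next_id = torch.multinomial(probs, 1)  # sample rather than take the top word
        out_words.append(words[next_id.item()])
        ids = next_id.view(1, 1)
    return " ".join(out_words)

print(generate(model, "you"))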
Eidnes’ neural network didn’t understand much about the world after its first pass through the data. It would generate headlines like “Real Walk Join Their Back For Plane To French Sarah York” or “Economic Lessons To Actress To Ex – Takes A App.” They don’t make much sense. After a few more passes over the data, however, the network produced “John McCain Warns Supreme Court To Stand Up For Birth Control Reform.”
Eidnes highlights a few other examples from the generated output, like “Romney Camp: ‘I Think You Are A Bad President.’”
“It’s suspiciously good – it wouldn’t surprise me if this was a real headline that some website had published,” Eidnes writes in a blog post detailing the system. “But it’s not in the dataset, not even close.”
Apparently, in the 17 times “Romney Camp” appears in the training data, it is never associated with the presidency. And the one time the phrase “Bad President” appears, it is attributed to Marco Rubio. From this, Eidnes concludes that the network has developed some form of semantic understanding, including an ability to capture political relationships.
To put the network to further use, he created Clickotron.com, which publishes a new, artificially generated story every 20 minutes. The site automatically searches Wikimedia Commons for a relevant picture and generates some body text. To separate the wheat from the chaff, Clickotron.com has a voting mechanism much like Reddit’s.
“This gives us an infinite source of useless journalism, available at no cost,” Eidnes writes. “If I remember correctly from economics class, this should drive the market value of useless journalism down to zero, forcing other producers of useless journalism to produce something else.”