Technology site endorses automated troll filters

In the wake of the Leslie Jones fiasco, which resulted in Twitter banning Milo Yiannopoulos, technology news site Mashable posted an op-ed, “Twitter, it’s time to take down the trolls with tech” by Lance Ulanoff, in which he advocated automated troll filters for Twitter.

Given Ulanoff’s long history in tech journalism, I’m moderately surprised that he apparently fails to understand the technical implications of what he’s asking for, never mind the political and social ramifications of the idea.

Ulanoff’s article doesn’t seem to have gotten much of an airing, so here is a slightly edited version of the critique that I posted in the comments. The subject is particularly relevant now that, in the last month, we have seen Milo ousted from Twitter and AVfM’s own Facebook page removed (apparently deleted, only to reappear mysteriously a week later).

Ulanoff asks some important questions.  I’m a computer scientist and am also active in politics, so let me share my perspectives.

Why Twitter probably can’t do what Ulanoff asks

In short, I think automated anti-abuse is a terrible idea, however well-intentioned its motivation.  I trust computers to do repetitive, same-y things automatically, but I don’t trust them to make executive decisions (in the cognitive sense) or take control away from humans (which is why I think driverless cars and pilotless planes are a terrible idea also).

There’s no particular technological reason I can think of why an automatic filter couldn’t be implemented, although, depending on Twitter’s architecture, it could be expensive and, given Twitter’s commercial disposition, they may not be in a position to wear such costs.  That may be why Twitter offers its “quality filter” only to its top-tier, verified users (of which Leslie is one, so I guess she didn’t have it turned on).

There are a variety of techniques, including relatively simple Bayesian statistical methods, machine learning (which is not precisely AI, but rather a way of training a machine to recognise certain patterns) and natural language processing, that can, to some extent, infer the semantic meaning of tweets (though all might be frustrated by the abbreviations and general mangling necessitated by the 140-character limit).  All are likely to be computationally expensive relative to the volume of tweets because they must consider not only each individual tweet but also hold a great deal of state about past tweets.
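
To make the Bayesian idea concrete, here is a minimal naive-Bayes sketch in Python.  The training data is entirely invented for illustration, and a real system would need vastly more data and engineering; the point is only to show the shape of the technique, not anything Twitter actually runs.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label) pairs. Returns per-label word counts
    and per-label document totals (used as priors)."""
    counts = {"abusive": Counter(), "ok": Counter()}
    totals = Counter()
    for text, label in docs:
        for word in text.lower().split():
            counts[label][word] += 1
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the label maximising log P(label) + sum of log P(word|label),
    with add-one (Laplace) smoothing for unseen words."""
    vocab = set(counts["abusive"]) | set(counts["ok"])
    best, best_score = None, -math.inf
    for label in counts:
        n = sum(counts[label].values())
        score = math.log(totals[label] / sum(totals.values()))
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Toy, invented training set -- far too small for real use.
docs = [
    ("you are awful and stupid", "abusive"),
    ("get lost you idiot", "abusive"),
    ("great talk thanks for sharing", "ok"),
    ("lovely photo have a nice day", "ok"),
]
counts, totals = train(docs)
print(classify("you stupid idiot", counts, totals))   # abusive
print(classify("thanks lovely day", counts, totals))  # ok
```

Note that even this toy keeps state (the counts) that must be consulted for every single tweet, which is where the computational cost creeps in at scale.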

[Just to put some numbers on this: if Twitter sees a peak volume of 350k tweets/minute (as claimed by Ulanoff), the algorithm has roughly 170 microseconds (millionths of a second) to process each tweet.   For comparison, it takes about twenty times longer to retrieve a single bit of information from even the fastest hard disc.  Note that the quoted figure is more likely to be an average volume; any automatic system would have to cope with peak volumes that may reach millions of tweets per minute (or more) during significant world events.  With enough parallelism, it’s still possible, but it gets very complex very quickly.]
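
The arithmetic behind that per-tweet budget is simple enough to check:

```python
# Peak volume quoted by Ulanoff: 350,000 tweets per minute.
tweets_per_minute = 350_000
budget_s = 60 / tweets_per_minute             # seconds available per tweet
print(f"{budget_s * 1e6:.0f} microseconds")   # ~171 microseconds
# With N workers running in parallel the per-tweet budget scales to
# roughly N * 171 microseconds, at the cost of coordination overhead.
```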

Keyword searches aren’t likely to be effective, and are trivially circumvented anyway in a sort of arms race with trolls; cf. the “Scunthorpe Problem” (named after a town in North Lincolnshire, England) for a trivial example of why.
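
A few lines of Python show both failure modes at once.  I’ve used the mild word “ass” rather than anything stronger; the blocklist and messages are invented for illustration.

```python
def naive_filter(text, blocklist):
    """Flags a message if any blocked substring appears anywhere in it."""
    lowered = text.lower()
    return any(bad in lowered for bad in blocklist)

blocklist = ["ass"]  # a deliberately mild example of a blocked word

print(naive_filter("You are an ass", blocklist))    # True: the intended catch
print(naive_filter("See you in class", blocklist))  # True: a false positive
print(naive_filter("You are an a s s", blocklist))  # False: trivially evaded
```

The second call is the Scunthorpe Problem in miniature; the third is the arms race: the troll simply respells the word and sails straight past the filter.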

Ironically, Mashable’s comment system itself suffers from the Scunthorpe Problem: I had to bowdlerise the spelling of the name with spaces in order to get it past the ‘naughty word’ filter, which nicely exemplifies why automatic systems aren’t the silver bullet that Ulanoff seems to think.

The double irony is that this takes place in a discussion about whether false positives matter.

Counting tweets probably wouldn’t do anything useful.  If you count tweets received (regardless of where from), then all you can do is alert the recipient and ask them what they want to do.  I don’t see how that would stop abuse at its source. If you count tweets sent, you’ll (potentially) catch spammers and individuals targeting an individual, but that won’t help dogpiling in general (because the tweets are coming from multiple sources) nor in the Milo/Leslie case specifically.  Dogpiling is analogous to distributed denial of service attacks, a scourge to which the best minds still have not found a comprehensive, robust and automatic solution.
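
To see why per-sender counting misses dogpiling, here is a minimal sliding-window rate counter.  The thresholds and names are invented for illustration; this is a sketch of the general idea, not Twitter’s actual mechanism.

```python
from collections import defaultdict, deque

class SenderRateLimiter:
    """Flags any single sender who exceeds `limit` tweets at one target
    within `window` seconds. Thresholds are invented for illustration."""
    def __init__(self, limit=5, window=60):
        self.limit, self.window = limit, window
        self.history = defaultdict(deque)  # (sender, target) -> timestamps

    def record(self, sender, target, now):
        q = self.history[(sender, target)]
        q.append(now)
        while q and now - q[0] > self.window:
            q.popleft()  # forget tweets outside the window
        return len(q) > self.limit  # True = flag this sender

rl = SenderRateLimiter(limit=3, window=60)

# One sender spamming one target is caught...
flags = [rl.record("troll", "victim", t) for t in range(5)]
print(flags)  # [False, False, False, True, True]

# ...but 100 senders sending one tweet each sail through untouched.
pile = [rl.record(f"user{i}", "victim", i) for i in range(100)]
print(any(pile))  # False: the dogpile is invisible to per-sender counting
```

Each participant in the dogpile stays comfortably under any sane per-sender threshold, which is exactly the property that makes it analogous to a distributed denial of service attack.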

But the biggest problem is the subjectivity of what constitutes abuse: if there can be significant disagreement between any given pair of humans as to whether a given instance is abusive, what chance has a computer to figure it out automatically?  Of course, whether a given pair of humans agree or disagree is likely to follow political persuasion, but even controlling for that there will still be a lot of disagreement.  Just look at reactions within conservative circles to Milo’s ban for an example.

So the reason Twitter doesn’t automate identification of hateful and abusive speech is probably because it can’t, not with much reliability.  The best they can do is rely on reporting mechanisms.

But reporting mechanisms can be abused, too.  Reddit is plagued with brigading attacks (where a bunch of people collude to influence voting one way or the other), and both Twitter and Facebook’s reporting mechanisms are regularly abused in much the same way, which is probably why Facebook, in particular, are under fire for ‘censorship’.

Preventing impersonation by theft of names and/or avatars

A lot of the time, Unicode characters that look similar to characters in the target’s name are used to mimic it.  To the human eye, the two names look nearly identical; to a computer, they are quite different.  If Twitter dropped Unicode support (its developers may celebrate, because Unicode is horrible to work with!), say goodbye to languages that use non-Latin alphabets, and to many diacritics in those that do.  So preventing a bad-faith user from using their target’s name is rather harder than Ulanoff might think.
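
The standard library is enough to demonstrate the trick.  Here a single Latin ‘e’ is swapped for the visually identical Cyrillic letter; even aggressive Unicode normalisation won’t fold look-alikes from different scripts into one another.

```python
import unicodedata

real = "Leslie"
fake = "L\u0435slie"  # Cyrillic SMALL LETTER IE in place of Latin 'e'

print(real, fake)    # visually indistinguishable in most fonts
print(real == fake)  # False: to the computer they are different strings

# NFKC normalisation folds many compatibility variants, but it does NOT
# map confusable characters across scripts, so the fake still differs:
print(unicodedata.normalize("NFKC", fake) == real)  # False
print(unicodedata.name(fake[1]))  # CYRILLIC SMALL LETTER IE
```

Detecting this properly means maintaining tables of “confusable” characters across every script Unicode supports, which is a far bigger job than a simple string comparison.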

Image comparison is harder still because there can be a lot of binary differences between two images that otherwise look identical (or very nearly so).  Comparison algorithms can be frustrated by various techniques including sharpening, blurring or adding noise.  There is even a branch of cryptography called steganography that exploits this fact: a message is encoded in noise that will probably look just like grain (à la film) or texture.  Example: YouTube use an algorithm to detect pirated content, but it can’t intercept pirated videos in realtime, and the algorithm can be confounded by various distortions which, while they reduce the quality of the pirated video, leave it perfectly watchable.
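
One simple perceptual-hashing technique (an “average hash”, not what YouTube actually uses) illustrates both halves of the problem.  The 8×8 “image” below is a made-up list of 64 greyscale values: byte-for-byte comparison fails on the slightest change, the hash shrugs off mild noise, and a heavy distortion defeats the hash entirely.

```python
def average_hash(pixels):
    """64-bit 'average hash': each bit records whether a pixel is above
    the image's mean brightness. pixels: flat list of 64 greyscale values."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

# A toy 8x8 greyscale "image": a dark half and a bright half.
original = [40] * 32 + [200] * 32
noisy    = [p + 5 for p in original]    # slight uniform brightening
inverted = [255 - p for p in original]  # a heavy distortion

print(original == noisy)  # False: exact comparison fails on any change
print(hamming(average_hash(original), average_hash(noisy)))     # 0
print(hamming(average_hash(original), average_hash(inverted)))  # 64
```

Real perceptual hashes are more sophisticated, but the trade-off is the same: make the hash tolerant enough to survive noise and you also make it possible to engineer distortions that slip a visually watchable copy past it.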

Twitter’s [allegedly] slow handling of abuse reports

Twitter’s reporting mechanism is almost certainly automated, so the slow response to reported tweets probably reflects the way their algorithm works: it likely needs to receive sufficiently many complaints within a certain timeframe before it will act.

Twitter’s definition of “hate speech”

Ulanoff’s million dollar question is this:

What’s the real benchmark for abuse that runs afoul of your rules. How does Twitter define hate speech?

Lots of other people would like an answer to this, too, because whatever it is, Twitter doesn’t seem to apply it very even-handedly.  More cynically, there is some anecdotal evidence to suggest that they are, in fact, quite partisan in the way they apply their ‘hate speech’ standards.  A case in point combining the questions of both consistency and technology: on the face of it, it would appear that Leslie has tweeted some moderately racist stuff herself.

It also appears that there are some fake tweets running around purportedly by Leslie, and there is absolutely nothing Twitter can do about that because anybody can screen-shot a doctored web page or else modify a screenshot of a real tweet with Photoshop.

Both are very Not Cool.  The fake tweets aren’t Leslie’s fault, but the fact that she (apparently) sent similarly hateful tweets in the past makes it very difficult to discern between fake and real tweets, which means she doubtless is getting a lot more hate based on things she didn’t say from those taken in by the fake tweets — including Milo.  If he tweeted fake tweets, did he know they were fake?  If not, why does he deserve to be punished for tweeting them?

How long can they hide behind free speech?

This is the rub.  Either offensive speech is tolerated by Twitter, or it is not.  (And note that freedom of speech is meaningless where the speech is inoffensive and uncontroversial.  The real test is what happens when you have to deal with points of view with which you disagree.)  As has been pointed out elsewhere, Twitter is a privately-owned platform (and no, being a PLC does not make it any less privately owned) and can, therefore, set whatever policies it likes.

However, it must a) be open, transparent and explicit about what its policies are, and b) enforce those policies consistently and without favouring any demographic or philosophy over another unless required to by its openly-stated policy.

I think Jack Dorsey (CEO) is a bit confused about this.  On February 9th, he tweeted in reply to Twitter’s rather Orwellian-sounding “Trust and Safety Council”:

Twitter stands for freedom of expression, speaking truth to power, and empowering dialogue. That starts with safety.

Platitudes like ‘speaking truth to power’ and ‘empowering dialogue’ are all very well, but dialogue means a two-way conversation, which means the “power” (if that it be) has to have the opportunity to answer back, whether or not Ulanoff (or Twitter) likes what it has to say.

And there is a fundamental contradiction between “safety” (however that is defined) and freedom of expression: some ideas and opinions aren’t “safe”.  Some individuals will be upset by a tweet that doesn’t bother others (even controlling for political affiliation).  Case in point from the OP:

Yiannopoulos’ actions revealed Twitter’s darkest, most disturbing and racist impulses.

That’s a perfectly legitimate point of view, one with which many will agree, but it should be pointed out that calling somebody a racist is a pretty serious accusation (assuming Ulanoff meant it with the seriousness that the term used to/ought to have).  What, objectively speaking, is the difference between calling Milo unpleasant names and calling Leslie unpleasant names?  (“Because I’m right”?  Careful, now…)

Others interpret Milo’s attitudes (towards Muslims, for example) in a different light and can see his point even if they don’t agree with him because they understand where he’s coming from.  And yet others still may disagree with them, whether or not they see where they’re coming from.

Humans are messy and complicated creatures, and nothing involving them is so simple that it can be reduced down to an algorithm.

The choice that Twitter must make is whether it is truly committed to “freedom of expression” (as opposed to “expression that Twitter agrees with”) and take the bad with the good, or whether it wishes to become a “safe space” in which there is no place for any sort of freedom of expression.  There is very little, if any, middle ground.

Will this result in false positives? Absolutely. Is it worth it to clean Twitter the hell up while still allowing for free speech?

Is it?  It depends on what you value and on your political disposition, I guess.

Stereotypes, which frequently lead to discrimination, are a kind of heuristic: humans use them to reach conclusions about a specific individual from generalisations.  That is pretty much exactly what the heuristics and algorithms Ulanoff proposes would do, and, like those algorithms, stereotypes throw up many false positives.

Ask the victims of stereotyped discrimination how they feel about being a false positive.
