How do you prevent the behavior of an AI from becoming predictable?

Many neural networks are black boxes. We know they can categorize things successfully — cat images, cancer X-rays, and so on — but for many of them, we don’t understand what they’re using to come to that conclusion. But that doesn’t mean people can’t deduce the rules they use to fit things into different categories. And that poses a problem for companies like Facebook, which hopes to use AI to delete accounts that violate its terms of service.

Most spammers and scammers create accounts in bulk, and they can easily spot the differences between the accounts that get banned and those that slip under the radar. Those differences let them evade automated detection by structuring new accounts to avoid the features that trigger a ban. The end result is an arms race between the algorithms and the spammers and scammers trying to guess their rules.

Facebook believes it has found a way to avoid getting involved in this arms race while still using automated tools to monitor its users, and this week decided to tell the press about it. The result was an interesting insight into how AI-based moderation can remain useful in the face of hostile behavior, an approach that can be applied far beyond Facebook.

The problem

Facebook sees billions of active users in a month, and only a small fraction of those fall into the category the company calls abusive: fake and compromised accounts, spammers, and those who use the social network to run scams. So while the company can (and does) use human moderators, the problem is simply too big for them to catch everything. That means some sort of automated system is needed if the service doesn’t want to be inundated with content it doesn’t want.

Facebook (or any other social network operator) obviously has access to a lot of data that could be fed to an automated system: an account’s message history, details provided when signing up, friend networks, and so on. And plenty of algorithms could use that data to identify problematic accounts, including neural networks trained on the data along with a human-curated list of problematic and acceptable behaviors.

The problem, as mentioned above, is that the people running abusive accounts can access much of this data too and may be able to figure out which features cause accounts to be banned. They can then change their behavior just enough to avoid suspicion. This sets up an arms race in which the scammers are constantly one step ahead of the algorithms intended to catch them.

To avoid this, Facebook’s researchers have moved from using account data to what might be called account metadata. Instead of using the number of posts a particular account makes, the system looks at the number of posts made by a typical friend of that account. Similar values can be generated for the average number of friends the account’s friends are connected to, how often friend requests are sent, and so on. A set of values like this combines into a profile that the company’s researchers call a “deep entity.”
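
To make the idea concrete, here is a minimal Python sketch of how such friend-level aggregates might be computed. The friend graph, post counts, and function name here are all invented for illustration; none of this is Facebook’s actual code.

```python
from statistics import mean

# Hypothetical in-memory data: per-account post counts and friend lists.
post_counts = {"alice": 40, "bob": 12, "carol": 7, "dave": 3}
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol", "dave"],
    "carol": ["alice", "bob"],
    "dave": ["bob"],
}

def deep_entity_features(account):
    """Aggregate statistics about an account's neighbors, not the account itself."""
    neighbors = friends[account]
    return {
        # Average number of posts made by the account's friends.
        "mean_friend_posts": mean(post_counts[f] for f in neighbors),
        # Average number of friends that the account's friends have.
        "mean_friend_degree": mean(len(friends[f]) for f in neighbors),
        # Number of connections the account itself has.
        "degree": len(neighbors),
    }

print(deep_entity_features("alice"))
# {'mean_friend_posts': 9.5, 'mean_friend_degree': 2.5, 'degree': 2}
```

The key property is that every feature describes the account’s neighborhood rather than the account itself, which is far harder for an individual account owner to manipulate.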

The assumption here is that the typical account will establish relationships with accounts that are themselves fairly typical. A spammer, meanwhile, is likely to have fewer connections to real accounts and more to things like bot accounts, which also exhibit unusual behavior patterns and connections. The deep-entity profile captures these differences in aggregate and provides two main benefits: it is much more difficult for abusive account owners to work out which aspects of a deep entity an algorithm is using, and it is much more difficult for them to change those aspects even if they could work them out.

Deep entity classification

Generating a deep-entity profile is relatively simple, albeit a bit computationally intensive: it involves crawling a particular user’s network graph and collecting data from all of its connections. Where things enter the realm of computer science is in how these profiles are then used to actually identify problematic accounts.

Facebook engineers decided to use a neural network to perform the classification. That requires the network to have training data: deep-entity profiles tagged with indications of whether the account is problematic or not. Here the engineers had two sources. Earlier work with other classification algorithms had yielded a large amount of relatively low-confidence data flagging various accounts as problematic or not. Meanwhile, human moderators had gone through a much smaller set of accounts but had made much higher-quality determinations as to whether each account was abusive.

The folks at Facebook, of course, decided to use both, producing a two-tier system. In the first tier, a multi-layer neural network used the low-quality training data to identify accounts with deep-entity profiles typically associated with suspicious behavior. While this neural network would normally process the data all the way to a binary decision — abusive or not — the researchers instead stopped the analysis at the layer just below that final decision.

At this point, the network had processed the original deep-entity information into a limited set of values that it would otherwise use to determine whether an account’s connections are unusual or not. These values can be extracted as a vector of 32 numbers that captures the characteristics typically associated with unusual accounts.
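
As a rough illustration of that “stop one layer early” idea, here is a small PyTorch sketch. The layer sizes, feature count, and class name are assumptions for the sake of the example, not Facebook’s actual architecture.

```python
import torch
import torch.nn as nn

# A sketch: a small multi-layer network is trained on the weak binary labels,
# but at scoring time we read out its 32-unit penultimate layer rather than
# the final abusive-or-not decision.
class DeepEntityNet(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),   # the layer whose output gets extracted
        )
        self.head = nn.Linear(32, 1)        # binary output, used only during training

    def forward(self, x):
        return self.head(self.body(x))      # trained with a binary-classification loss

    def embed(self, x):
        return self.body(x)                 # the 32-value vector per account

model = DeepEntityNet(n_features=20)        # 20 is an arbitrary feature count
fake_batch = torch.randn(4, 20)             # 4 hypothetical deep-entity profiles
print(model.embed(fake_batch).shape)        # torch.Size([4, 32])
```

After training the full network on the large, weakly labeled dataset (for instance with nn.BCEWithLogitsLoss on the forward output), only embed() would be used to produce the per-account vectors.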

These vectors were then passed to a second stage of processing that uses a machine-learning approach called a decision tree. The decision trees were trained on the human-tagged account data, and crucially, the Facebook engineers trained several of them: one for spammers, one for hijacked accounts, and so on. It is these decision trees that determine whether an account is a problem and should be deactivated.
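
A hedged sketch of that second stage, using scikit-learn decision trees on synthetic stand-in data (the category names, tree depth, and labels are all invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-ins: 32-value embeddings from the neural network above, plus
# human-reviewed labels for each abuse category.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 32))          # one row per human-reviewed account
human_labels = {
    "spam": rng.integers(0, 2, size=500),
    "compromised": rng.integers(0, 2, size=500),
}

# One classifier per abuse category, as the article describes.
classifiers = {
    category: DecisionTreeClassifier(max_depth=5).fit(embeddings, labels)
    for category, labels in human_labels.items()
}

# Score a new account's embedding against every category.
new_account = rng.normal(size=(1, 32))
verdicts = {cat: int(clf.predict(new_account)[0]) for cat, clf in classifiers.items()}
print(verdicts)   # e.g. {'spam': 0, 'compromised': 1}
```

Splitting the problem by category also means each tree can be retrained or tuned independently as one kind of abuse evolves.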

Computer science meets policy

The system has been in production for a while now and has proved quite successful, blocking at least half a billion accounts per quarter, peaking at over 2 billion blocks in the first quarter of last year. Blocked accounts can also be used to continuously retrain the system in the background, and it can evaluate its own statistics to determine when the retraining has progressed to the point where the in-production system can be productively replaced.
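
The promotion logic could be as simple as comparing a retrained candidate’s held-out metrics against the in-production model’s. The sketch below is an assumption about how such a gate might look; the metric names and margin are invented, not Facebook’s actual criteria.

```python
# Hypothetical "promote the retrained model only when it's ready" gate.
def should_replace_production(candidate, production, margin=0.005):
    """Swap in the background-retrained model only if it improves held-out
    precision by at least `margin` without losing recall."""
    return (candidate["precision"] >= production["precision"] + margin
            and candidate["recall"] >= production["recall"])

print(should_replace_production(
    {"precision": 0.97, "recall": 0.91},
    {"precision": 0.95, "recall": 0.90},
))  # True
```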

While the system can be effective, deciding how to deploy it (and how to integrate it into a larger strategy for acceptable content) is a matter of policy rather than computer science. Human moderators offer a higher level of accuracy in their determinations of whether content is abusive, and a Facebook communications manager told Ars that the company is greatly expanding its use of human moderators. But people can only act on reported content, while the algorithms can act preemptively. So finding the right balance of investment between the two aspects of moderation will ultimately remain a judgment call.

The other issue raised by this technology is whether it can be leveraged against accounts spreading misinformation on topics like climate change and health; the latter problem threatens to get worse as the coronavirus continues to spread unabated. Here, the company has tried to walk a fine line, seeking to avoid becoming, in the words of its communications manager, “the arbiter of truth,” most notably by refusing to police the factual content of political advertisements. Its approach of outsourcing fact-checking has also drawn fire, since sites with questionable histories regarding facts can serve as fact-checkers.

Facebook’s communications manager told Ars that specific health claims debunked by the WHO or CDC can be removed. But there’s no indication that groups that repeatedly make such claims will ever see their accounts suspended, although tools like the one described here should make identifying them much easier. Put another way, while Facebook’s engineers may have done a masterful job developing a system that can identify problematic accounts, how to apply that technology remains a policy decision.
