Thursday, 20 June 2019

Historical insight into the development of Mobile TEEs

Today, Trusted Execution Environments (TEEs) are ubiquitous in mobile devices from all major smartphone vendors. They first appeared in Nokia smartphones fifteen years ago. Around the turn of the century, Nokia engineers in Finland played a crucial role in developing mobile TEE technology and paving the way for its widespread deployment. But this important story is not widely known as there is limited public documentation. To address this gap of “missing history”, we carried out an oral history project during the spring of 2019. In this post, we summarize our findings.

Historical insight into TEEs

Trust in mobile devices is a prerequisite for the modern mobile industry, from e-commerce to social media. Customers, companies, and regulators need to trust the mobile phone with sensitive information such as credit card numbers and fingerprints. A trusted execution environment (TEE) — a secure area that runs a computing system in parallel with, but isolated from, the main operating system — is central to modern mobile security. By creating the technical conditions for different stakeholders to rely on the mobile system, it has been an essential component in the development of the modern mobile business as a whole.

During the spring of 2019, the Secure Systems Group at Aalto University hosted an oral history project on the development of mobile TEEs. The project focused on the role played by Nokia experts in the emergence and establishment of mobile TEEs. We conducted a series of interviews with fifteen key actors: senior directors and managers, researchers, and security professionals. The aim was to increase the understanding of the mobile platform security systems today and to recognize the human actors behind their development and widespread deployment.

The starting point of any historical inquiry is that technological development is never self-evident or pre-determined, nor does it take place in isolation. Instead, technological systems are developed by individuals at a certain place and time, constrained by economic, technical, regulatory, and other factors. Understanding the development of the mobile secure execution system helps us to master it today and to improve it tomorrow.

Emergence of mobile security

Communication is imperative for modern societies. The history of telecommunication goes back centuries, from visual signalling (e.g., the semaphore system) to the electric telegraph and radio transmission. A common theme in this history is that security follows communication technology with a delay. In Finland, the first public mobile network, the Autoradiopuhelin (ARP), and the first-generation Nordisk MobilTelefon (NMT) system originally transmitted an analog voice signal without any cryptographic protection. Yet the security of the device itself was hardly recognized as a critical problem before the 1990s. The primary reason was the strictly regulated operation: both ARP (launched in 1971) and NMT (1981) were operated by governmental organisations. Mobile phones themselves were closed systems with few extra capabilities compared with traditional landline phones: the bulky and heavy equipment was best protected by doors and locks.
 
The 1990s revolutionized mobile telecommunication. First, the GSM standard addressed the communication security problem by encrypting the signal and protecting the air interface. Second, the deregulation of telecommunication opened the market to private companies, and the number of operators skyrocketed within a short time. The physical dimensions of mobile handsets shrank, making them attractive targets for thieves. Finally, after the mid-1990s, support for third-party applications written in the Java programming language, downloadable from the Internet, rapidly transformed phones from closed devices into open systems that increasingly resembled small, general-purpose computers. With more users, less governmental control over the industry, and more sources of potential vulnerabilities, device security emerged as a new problem in the design of mobile phones.

The security of the phone under these new conditions referred mainly to the integrity of the device: regulators and mobile network operators articulated novel needs to protect certain pieces of information inside the phone from unauthorized changes after the phone left the assembly line. In particular, regulators wanted secure storage for the device identity (International Mobile Equipment Identity, IMEI) and for certain parameters, such as those for radio-frequency transmission, that could affect the safety of the phone and the functionality of the mobile network. Mobile operators, the principal customers for Nokia, needed a strong subsidy lock mechanism (colloquially known as a "SIM lock") that would tie the phone to a certain operator for a predetermined time.

The security of Nokia’s Digital Core Technology (DCT) generation phones was enforced mainly with software solutions and protected by secrecy within the organization: even security professionals had little more than educated guesses about the structure and requirements of the DCT security architecture. The essential weakness of this kind of “security through obscurity” design is that once the secret designs are revealed, the protection is lost. Nokia’s high market share made it an attractive target for hackers. The DCT4 generation brought in a hardware component in the form of one-time-programmable memory, but particularly in the case of SIM locks, the economic motives to break the security system outstripped the technological capabilities to protect it. The profit losses of important customers increased the pressure to design a better security architecture.

Towards mobile platform security

The interest towards a coherent, hardware-enforced platform security stemmed from a team of engineers working with mobile payments and security. The initial idea was to introduce a separate security chip to implement physical isolation of security-critical processing. Yet, an additional hardware chip was deemed too expensive in the strictly cost-sensitive organization. At the turn of the millennium, a newly graduated engineer came up with an idea to implement a logically isolated security mode using just one chip. A new status bit was introduced in the processor, which determined the status of data stored in memory and whether or not the processor was in a secure mode. This “secure processor environment” design was adopted as the fundamental cornerstone of the next baseband generation: Baseband 5 (BB5). 

In the history of technology, no innovation really emerges out of nowhere; each is influenced by preceding ideas and inventions. Certain features of the secure environment were deliberately invented around already existing patents. What was novel about Nokia’s solution was the combination of software and hardware features into an architecture for mobile platform security that was deployed on a large scale.

The launch of the first BB5 phone in 2004 was a major landmark in the development of Nokia’s mobile phones, as it effectively ushered in the era of 3G phones. Less visible to the public, but equally important, were the changes introduced to the platform security model as a whole.

First, it marked a switch from “security through obscurity” towards “security through transparency”. Open communication throughout the process, a considerable level of transparency in the design, and a public key infrastructure were required to develop a strong, usable, and cost-efficient security architecture.

Second, security transformed from an add-on feature into an integral part of the platform design. At a time when the early smartphones increasingly resembled general-purpose computers in their capabilities and function, the security engineers opted against following the PC world in security design: instead of reactive firewalls and anti-virus tools, mobile security was better addressed proactively in the platform design.

Alliance of hardware and software

At the time when the secure processor environment was initiated, Nokia produced its own Radio Application Processor (RAP) chipsets. This simplified the effort of making the hardware components accommodate security requirements. Yet the first BB5 phones were already equipped with Texas Instruments OMAP processors. Nokia and Texas Instruments had a close partnership which involved intense cooperation in chipset design. Later, Texas Instruments branded the technology as M-Shield. M-Shield stemmed from the same origin as Nokia’s secure processor environment but was subsequently developed in a different direction.

Around 2003, ARM proposed to Nokia to develop system-wide hardware isolation for secure execution. The cooperation between ARM and Texas Instruments had its own business goals separate from Nokia’s needs, but it was also in Nokia’s interests: it gave Nokia the possibility to implement a secure environment on any chip implementing ARM’s security architecture, which would later become known as ARM TrustZone. (A 2004 article is possibly the first public technical paper describing ARM TrustZone. At present there is no official website hosting this article, but it appears to be stashed at this unofficial site.)

Security as an enabler

A deep paradox in the development of security technology is that security is important to have but difficult to sell. The importance of security becomes apparent only when it fails, and its benefits for the business are rarely manifested in increased sales. In corporate management, security remained overshadowed by competition for customer satisfaction and the optimization of global supply chains. Instead of a strategic R&D project, the secure processor environment proceeded as a technology-driven skunkworks effort of a handful of engineers and researchers. The development of security technology outside the strategic spotlight was facilitated by Nokia’s organizational culture, which granted technological experts considerable room to maneuver. Also critical was that the security engineers successfully translated security from a problem into an enabler. While e-commerce still remained a marginal use case at the beginning of the 2000s, SIM locks, IMEI protection, and later digital rights management (DRM) became the main business cases that justified the adjustment of the hardware and software architectures.

Once the platform security architecture had been accepted for product programs, hardware suppliers had adopted the secure processor environment in their designs, and complementary adjustments to the manufacturing process and key distribution services had been implemented, the security technology constituted an infrastructure for other applications and functions to take advantage of. These novel uses of the security infrastructure ranged from the rather trivial case of protecting the audio compression attributes of Nokia headphones to the widely influential use of security certificates for distinguishing among model variants during the manufacturing process.

Standardised trust

After adopting the hardware-enforced secure execution environment as a de facto internal standard, Nokia turned its attention to the state of the art of mobile security standards in formal standards development organizations. Two prime rationales motivated the company’s representatives to take an active role in international standardization forums. First, an open standard that was revised by an international cooperation community, required no maintenance from Nokia, and was available to potential suppliers would facilitate competitive bidding in chipset production. Second, since mobile standards were going to take form in any case, Nokia wanted to make sure they would be compatible with the solution it had already adopted.

At first, Nokia’s representatives chaired a mobile working group within the Trusted Computing Group (TCG). Although founded by PC companies, TCG was the only industrial forum working on hardware security standards for global use in the early 2000s. In 2007, TCG announced the first hardware security standard for mobile devices, the Mobile Trusted Module (MTM), which became an ISO standard. It was different from Nokia’s secure processor environment but, more importantly, compatible with it.

The concept of TEEs was first described publicly by the Open Mobile Terminal Platform (OMTP) in its Advanced Trusted Environment specification in 2009. A while later, the center of TEE standardization shifted to another industry forum, GlobalPlatform. With two industrial forums striving to standardize hardware-enforced mobile security, there was a risk that they would end up with mutually incompatible specifications. It was in Nokia’s interest to turn the forums from competitors into cooperators. In 2010, GlobalPlatform published its first TEE standard, TEE Client API 1.0, which defines the communication between trusted applications executed in the TEE and applications executed by the main operating system. In 2012, GlobalPlatform and TCG announced the founding of a joint working group focusing on security topics.

After 2010, Nokia had fewer resources for extensive mobile device R&D projects or participation in international forums. The development of TEE technology continued even as Nokia's role in it diminished over time. Today, TEE technology is widely deployed in mobile devices and is extensively used in both iOS and Android smartphones.

Some concluding remarks

History does not repeat itself, and lessons of past failures and successes are not readily applicable to the future. Yet historical insight into the development of mobile TEEs helps us comprehend comparable systems today. In particular, despite the convergence between the PC and mobile worlds, the different approaches to platform security architecture still manifest the legacy of the past: the technological paths once taken create dependencies over time and continue to shape the framework in which security engineers operate today.

In addition, the constitutive role of only a few dedicated security professionals in just one company in the development and establishment of an international standard demonstrates the malleability of technological systems when they are still under construction. 

Finally, the concept of “security” resonates with the complex needs of authorities, customers, and users, but translates into very different meanings for different stakeholders. Mobile security technology may protect the privacy of the user from corporations, or the business from its customers; it may also ensure the safety of the device or enable law enforcement to access personal data in a lawful manner. Security technology is bound to interaction with its political, cultural, and economic environment and is always shaped by it.

The TEE-history project team:

Saara Matala, project researcher
Thomas Nyman, doctoral candidate
N. Asokan, professor



We interviewed fifteen former or current Nokia employees for this project: two senior executives, three managers, four researchers, and six engineers. We thank them for being generous with their time and insights. Among them are:
  • Timo Ali-Vehmas, Nokia
  • Jan-Erik Ekberg, Huawei
  • Janne Hirvimies, Darkmatter
  • Antti Jauhiainen, ZoomIN Oy
  • Antti Kiiveri, Boogie Software Oy
  • Markku Kylänpää, VTT
  • Meri Löfman, Brighthouse Intelligence Oy
  • Janne Mäntylä, Huawei
  • Yrjö Neuvo, Aalto University
  • Valtteri Niemi, University of Helsinki
  • Lauri Paatero, F-Secure
  • Jukka Parkkinen, OP Financial Group
  • Janne Takala, Profit Software Oy
  • Janne Uusilehto, Google



Wednesday, 5 June 2019

Protecting against run-time attacks with Pointer Authentication

Since the Morris Worm of 1988, buffer overflows and similar memory errors have been the source of many remote-code-execution vulnerabilities. They can allow attackers to overwrite pointers in memory and make a vulnerable program jump to unexpected locations. The ARMv8.3-A architecture includes Pointer Authentication, a set of instructions that can be used to cryptographically authenticate pointers and data before they are used. We show several ways that Pointer Authentication can be used to improve security and prevent attackers from turning programmer errors into remote-code-execution vulnerabilities.

Pointer Authentication: the What, the How and the Why

The fundamental purpose of Pointer Authentication (PA) is to allow software to verify that values read from memory—whether data or pointers—were generated by the same process in the right context. It does this by allowing software to generate a pointer authentication code (PAC), a tweakable MAC that can be squeezed into the unused high-order bits of a pointer, and whose key is stored in a register accessible only by software running at a higher privilege level, such as the operating system kernel.

PA provides three main types of instructions:
  • Generate an authenticated pointer (e.g., pacia, pacibsp): generate a short PAC over a pointer and store the PAC into the high-order bits of the pointer.
  • Verify an authenticated pointer (e.g., autia, autibsp): if the pointer contains a valid PAC for its address, turn it back into a "normal" pointer; otherwise, make the pointer invalid so that the program will crash if the pointer is used.
  • Generate a "generic" PAC (pacga): generate a 32-bit PAC over the contents of a whole register.

These instructions combine three values:
  1. The value to be authenticated. For all the instructions except pacga, the PAC is computed over the low-order bits that contain the actual pointer data (the high-order bits being reserved for the PAC, a "sign" bit used to determine whether the reserved bits of a verified pointer are set to all zeros or all ones, and an optional address tag).
  2. A modifier value. This is used to determine the "context" of a pointer so that an authenticated pointer can't be taken from one place and reused in another (more on this later). There are some special-case instructions that are hard-coded to use e.g. the stack pointer or zero as the modifier. The modifier is used as the "tweak" for the tweakable MAC computation.
  3. A key. There are five of these, and which one is used depends on the choice of instruction. This is stored in a register that (on Linux) cannot be accessed from user-space and is set to a random value when the process is started so that authenticated pointers aren't interchangeable between processes.

We are primarily interested in the first two types of instructions, which store and verify PACs in the high-order bits of a pointer, as illustrated below. The actual number of bits depends on the configured virtual address size (39 bits on Linux by default) and whether an address tag is present. By default, these PACs are 16 bits long on Linux.


A PAC instruction is used to generate a PAC over an address and store it in the high-order bits of the pointer. The instruction takes an address and a modifier as operands, with the PAC key stored in a separate system register that on Linux is configured to be accessible only by the kernel. There are several families of PAC instructions, each of which uses a different key: PACIA and PACIB for code pointers, and PACDA and PACDB for data pointers.
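Conceptually, the generate/verify pair behaves like the following sketch. This is a simplified Python model, not the hardware behaviour: the real MAC is a dedicated cipher rather than HMAC, the key lives in a system register, and the helper names are our own. We assume a 39-bit address space with an 8-bit top-byte tag, which leaves 16 PAC bits as described above.

```python
import hashlib
import hmac

VA_BITS = 39                              # virtual address size (Linux default)
TAG_BITS = 8                              # optional top-byte address tag
PAC_BITS = 64 - TAG_BITS - VA_BITS - 1    # 16 bits left for the PAC
ADDR_MASK = (1 << VA_BITS) - 1
PAC_SHIFT = VA_BITS + 1                   # PAC sits above the "sign" bit

def compute_pac(pointer: int, modifier: int, key: bytes) -> int:
    """Stand-in for the hardware tweakable MAC; the modifier is the tweak."""
    msg = (pointer & ADDR_MASK).to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") & ((1 << PAC_BITS) - 1)

def pac(pointer: int, modifier: int, key: bytes) -> int:
    """Model of a PAC* instruction: embed the PAC in the unused high bits."""
    return (pointer & ADDR_MASK) | (compute_pac(pointer, modifier, key) << PAC_SHIFT)

def aut(signed_ptr: int, modifier: int, key: bytes) -> int:
    """Model of an AUT* instruction: strip a valid PAC, poison an invalid one."""
    plain = signed_ptr & ADDR_MASK
    if signed_ptr == pac(plain, modifier, key):
        return plain                            # valid: back to a "normal" pointer
    return signed_ptr ^ (1 << (VA_BITS + 2))    # invalid: garbage high bits, use faults
```

A pointer signed with one modifier fails verification under another, which is what prevents authenticated pointers from being reused across contexts.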

By using these instructions to verify the authenticity of pointers before they are used, we can prevent an attacker from e.g. overwriting return addresses on the stack using a buffer overflow, or overwriting other program values as part of a data-oriented programming attack. When the program returns, it verifies the PAC bits of the return address, causing the program to crash if the return address has been changed. This has been implemented in both GCC and LLVM as the -msign-return-address option.

Much of the difference in security of PA-based protection schemes comes from the choice of modifier. A modifier should be outside the attacker's control, as well as quick and easy to compute from the available data, but if modifiers coincide too often, then this gives an attacker too many opportunities to reuse pointers outside the context that they are meant to be used in.

PARTS: Protecting data pointers with PA

Modern operating systems have many protections against buffer-overflow-type attacks—e.g. W^X and ASLR. W^X prevents an attacker from injecting their own code, and is defeated by return-oriented programming, in which return addresses are overwritten to make the program return to a series of "gadgets", small pieces of code already present in the program that can be assembled into the attacker's desired functionality. This has encouraged the use of control-flow integrity mechanisms such as shadow stacks, but even perfect control flow integrity is not enough. Data-oriented programming attacks piggy-back on the program's correct control flow, performing arbitrary computation by manipulating the program's data.

We have introduced a PA-based scheme, PARTS (USENIX Security 2019), which protects against data-oriented programming attacks that depend on pointer manipulation, as well as many control-flow attacks. PARTS prevents a pointer that ostensibly points to one type from being used as a pointer to an object of a different type. Since the compiler knows all of the types at compile time, it can statically select the modifiers used to generate and verify PACs when the address of an object is stored to, and loaded from, memory. This protects against the misuse of both data pointers and function pointers. Verifying the PAC of a function pointer before it is used in an indirect call also protects the program's forward control flow.
To protect the backward control flow (i.e., to prevent the program from jumping to return addresses overwritten by the attacker), return addresses are also authenticated; we discuss this in the following sections.

PA-based return address protection

PA is not only useful for this kind of "static" protection, where modifiers can be chosen at compile time. Dynamically-selected modifiers can be particularly powerful.

One of the first uses of Pointer Authentication was to protect return addresses on the stack, since overwriting a return address makes the program jump to a memory address of the attacker's choice. Including a PAC in the return address will make the jump fail, unless the authenticated return address was previously generated by the program. But this has a problem: if an attacker can use a memory vulnerability to read from the program's memory, then they can obtain authenticated return addresses that will validate correctly, and overwrite the return address with one of these. This is where the modifier comes into play: if the modifier depends on the path that the program has taken through its call-graph, then the authenticated return pointers from different paths cannot be swapped.
One way (and the first proposed use of ARM PA) to make this modifier path-dependent is to use the stack pointer as the modifier. Each time a function is called, the stack pointer is reduced in order to make space for stack variables, saved registers, and the return address. Since the value subtracted from the stack pointer depends on the function that has been called, this results in a modifier that depends on the stack layout of a particular path through the program. This approach is illustrated below.


PA-based return-address protection. At the beginning of each function, a PAC is generated for the current return address, which can then be saved on the stack. At the end of the function, the PAC is verified to ensure that the address has not been tampered with.
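This prologue/epilogue pairing can be modelled in a few lines. The sketch below is a toy Python model: mac16 and the 48-bit address split are our illustrative stand-ins for the hardware PAC computation and pointer layout, and the real compiler emits paciasp/autiasp instructions rather than function calls.

```python
import hashlib
import hmac

KEY = b"\x00" * 16    # per-process instruction key, kernel-managed on real hardware

def mac16(value: int, modifier: int) -> int:
    """16-bit tag standing in for the hardware PAC computation."""
    msg = value.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")

def prologue(lr: int, sp: int) -> int:
    """Like paciasp: sign LR with SP as the modifier before spilling it to the stack."""
    return (mac16(lr, sp) << 48) | lr

def epilogue(spilled: int, sp: int) -> int:
    """Like autiasp before ret: authenticate the reloaded return address."""
    lr = spilled & ((1 << 48) - 1)
    if (spilled >> 48) != mac16(lr, sp):
        raise RuntimeError("PAC failure: return address tampered with")
    return lr
```

Because the modifier is the stack pointer, a signed return address only verifies when the stack layout at return time matches the one at call time.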

However, the resulting modifier can be predicted from static analysis of the program, so an attacker can find distinct paths through the program that lead to identical stack pointers, allowing their corresponding return addresses to be exchanged.

PACStack: an authenticated call stack

To overcome this problem, we have developed an alternative technique called PACStack (poster at DAC 2019, full report on arXiv), which uses chained PACs to bind the current return address to the entire path taken through the call graph.

Apart from PA, the key feature of the ARM architecture that makes PACStack possible is that after a call and before a return, the return address of the current function is stored in a register, the Link Register (LR). By ensuring that the current return address is always stored in a register, we prevent an attacker exploiting a buffer overflow from ever overwriting the current (authenticated) return address; they can only overwrite the next one, which will be loaded into LR when the function returns. Verifying the PAC of the return address kept in a register (and therefore known to be good), using the previous authenticated return address as the modifier, implicitly verifies the authenticated return address being loaded from the stack.

Since the new value of LR also contains a PAC, authenticating the head of this chain of PACs recursively authenticates the entire call stack, providing perfect backward control-flow integrity.


The chain of PACs produced by PACStack. Each PAC is generated using the previous authenticated return address as modifier.
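The chaining can be sketched as follows. Again this is a toy Python model with our own helper names: on real hardware auth_lr is the value held in a register, the MAC is computed by the PA instructions, and only the spilled stack entries are reachable by a memory-corruption attacker.

```python
import hashlib
import hmac

KEY = b"\x01" * 16    # per-process PA key, kernel-managed on real hardware

def mac16(value: int, modifier: int) -> int:
    """16-bit tag standing in for the hardware PAC computation."""
    msg = value.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")

def sign(ret_addr: int, prev_auth: int) -> int:
    """Bind a return address to the previous authenticated return address."""
    return (mac16(ret_addr, prev_auth) << 48) | ret_addr

class CallStack:
    """Toy model of PACStack's chained return-address authentication."""

    def __init__(self):
        self.auth_lr = 0   # chain head; lives in a register on real hardware
        self.stack = []    # spilled authenticated return addresses (attackable)

    def call(self, ret_addr: int):
        self.stack.append(self.auth_lr)               # spill the current head
        self.auth_lr = sign(ret_addr, self.stack[-1])

    def ret(self) -> int:
        prev = self.stack.pop()                       # reloaded from memory
        lr = self.auth_lr & ((1 << 48) - 1)
        if self.auth_lr != sign(lr, prev):            # auth_lr itself is trusted
            raise RuntimeError("call-stack integrity violation")
        self.auth_lr = prev                           # verified: prev is now trusted
        return lr
```

Note that each ret verifies the value just popped from (attackable) memory against the trusted register value, so tampering anywhere in the chain is caught as soon as the program unwinds past it.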

For the attacker, this cryptographic protection means that returning to a different address is now far more difficult than just overwriting a return address, as seen below.

Anatomy of a control-flow violation with PACStack in use. In the correct control flow (left), after a call from A to C, C returns back to A. The goal of the attacker is to return from C back to some other function B. To do this, they must replace C's return value by overwriting it on the stack when C calls some other function ("loader" in the diagram). This new value must pass two PAC-checks before the program will return to the new pointer.

A major issue with this type of scheme is that because the PACs are short—16 bits in this case—and the attacker can use their ability to overwrite variables in memory to influence the path through the call graph, the attacker can take advantage of the birthday paradox, guiding the program along many different paths through the call-graph and obtaining colliding PACs after around 320 attempts on average. This allows the attacker to call down through the program's call graph and return up a different path, as illustrated in the figure above.
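The ~320 figure is simply the birthday bound for 16-bit tags, as a quick back-of-the-envelope check shows:

```python
import math

PAC_BITS = 16
n = 2 ** PAC_BITS   # 65536 possible PAC values

# The expected number of uniformly random tags drawn before the first
# collision is approximately sqrt(pi * n / 2) (the birthday bound).
expected_attempts = math.sqrt(math.pi * n / 2)
print(f"{expected_attempts:.0f}")   # about 320
```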

Not content with this, we have developed a technique that we refer to as PAC masking, which prevents the attacker from exploiting PAC collisions. PACStack makes a second use of the PA instructions, as a pseudo-random function that generates a modifier-dependent masking value which is XOR-ed with the PAC. This prevents the attacker from recognizing when two PACs generated with different return values collide, forcing them to risk a guess. The result is that no matter how many authenticated return pointers the attacker is able to obtain, they cannot successfully change a function's return address with probability better than one in 65536, making return-to-libc, return-oriented programming, and similar attacks extremely unlikely to succeed.
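The masking idea can be sketched like this (our simplified model: in PACStack the mask itself is generated with the PA instructions, whereas here an HMAC-based function stands in for both the PAC and the mask):

```python
import hashlib
import hmac

KEY = b"\x02" * 16   # per-process PA key, kernel-managed on real hardware

def mac16(data: bytes, modifier: int) -> int:
    """16-bit tag standing in for the hardware PAC computation."""
    msg = data + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")

def masked_pac(ret_addr: int, prev_auth: int) -> int:
    """Return the PAC XOR-ed with a modifier-dependent pseudo-random mask.

    Even if two raw PACs computed under different modifiers collide, their
    masked values look unrelated, so an attacker cannot detect the collision
    by comparison; they must guess, and a wrong guess crashes the process."""
    raw = mac16(ret_addr.to_bytes(8, "little"), prev_auth)
    mask = mac16(b"mask", prev_auth)    # pseudo-random function of the modifier only
    return raw ^ mask
```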

Next steps

The specification for Pointer Authentication leaves it to the implementer to decide how the PAC is actually implemented. One possibility here is for an implementer to use something like our PAC masking primitive for all of the non-generic PACs, but it is not yet clear whether further security requirements will become apparent in the future.

Together, these examples show the great flexibility of Pointer Authentication in ARMv8.3-A. The cost of this flexibility is the need for thorough cryptographic analysis. However, our experience with PACStack shows that this is viable for practical systems. This flexibility is what makes Pointer Authentication especially exciting as a run-time security feature, enabling compiler writers to integrate highly secure run-time protection mechanisms without waiting for hardware to catch up. As more of these enabling features — e.g., memory tagging and branch target identification — are deployed in the coming years, new defenses will become possible, and run-time protection schemes will continue to become faster and more secure.

Friday, 4 January 2019

How to evade hate speech filters with "love"

It has often been suggested that text classification and machine learning techniques can be used to detect hate speech. Such tools could then be used for automatic content filtering, and perhaps even law enforcement. However, we show that existing datasets are too domain-specific, and the resulting models are easy to circumvent by applying simple automatic changes to the text.

 

The problem of hate speech and its detection


Hate speech is a major problem online, and a number of machine learning solutions have been suggested to combat it.

The vast majority of online material consists of natural-language text. Along with its undeniable benefits, this has negative side-effects. Hate speech is rampant in discussion forums and blogs, and can have real-world consequences. While individual websites can have specific filters to suit their own needs, the general task of hate speech detection involves many unanswered questions. For example, it is unclear where the line should be drawn between hateful and merely offensive material.

Despite this, multiple studies have claimed success at detecting hate speech using state-of-the-art natural language processing (NLP) and machine learning (ML) methods. All these studies involve supervised learning, where a ML model is first trained with data labeled by humans, and then tested on new data which it has not seen during training. The more accurately the trained model manages to classify unseen data, the better it is considered.

While a number of labeled hate speech datasets exist, most of them are relatively small, containing only a few thousand hate speech examples. So far, the largest dataset has been drawn from Wikipedia edit comments, and contains around 13 000 hateful sentences. In the world of ML, these are still small numbers. Other datasets are typically taken from Twitter. They may also differ in the type of hate speech they focus on, such as racism, sexism, or personal attacks.

Google has also developed its own “toxic speech” detection system, called Perspective. While the training data and model architecture are unavailable to the public, Google provides black-box access via an online UI.

We wanted to compare existing solutions with one another, and test how resistant they are against possible attacks. Both issues are crucial for determining how well suggested models could actually fare in the real world.


Applying proposed classifiers to multiple datasets


Datasets are too small and specific for models to scale beyond their own training domain.

Most prior studies have not compared their approach with alternatives. In contrast, we wanted to test how all state-of-the-art models perform on all state-of-the-art datasets. We gathered five datasets and five model architectures, seven combinations of which had been presented in recent academic research.

The model architectures differed in their input features and ML algorithm. They looked either at characters and character sequences, or at entire words and/or word sequences. Some models used simple probabilistic ML algorithms (such as logistic regression or a multilayer perceptron), while others used state-of-the-art deep neural networks (DNNs). More details can be found in our paper.

To begin with, we were interested in understanding how ML algorithms differ in performance when trained and tested on the same type of data. This would give us some clue as to what kinds of properties might be most relevant for classifying something as hate speech. For example, if character frequencies were sufficient, simple character-based models would suffice. In contrast, if complex word relations are needed, deep neural network models (like LSTMs or CNNs) would fare better.
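As a concrete reference point, a character-trigram logistic regression of the kind compared here fits in a few dozen lines. The sketch below is a from-scratch toy, not the actual pipelines used in the studies, and the class name and hyperparameters are our own choices:

```python
import math
from collections import Counter

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams: the input features of the simplest models."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(max(0, len(text) - n + 1)))

class CharNgramLR:
    """Tiny logistic regression over character trigrams,
    trained with plain stochastic gradient descent on the log-loss."""

    def __init__(self, lr: float = 0.5, epochs: int = 200):
        self.w = {}          # feature weights
        self.b = 0.0         # bias
        self.lr, self.epochs = lr, epochs

    def _prob(self, feats: Counter) -> float:
        z = self.b + sum(self.w.get(f, 0.0) * c for f, c in feats.items())
        z = max(-30.0, min(30.0, z))          # clamp to avoid overflow
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, texts, labels):
        data = [(char_ngrams(t), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            for feats, y in data:
                err = y - self._prob(feats)   # gradient of the log-loss
                self.b += self.lr * err
                for f, c in feats.items():
                    self.w[f] = self.w.get(f, 0.0) + self.lr * err * c

    def predict(self, text: str) -> int:
        return int(self._prob(char_ngrams(text)) >= 0.5)
```

Such a model has no notion of word order or meaning; it learns only which short character sequences co-occur with each label, which is exactly why its competitiveness with DNNs in our experiments is telling.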

We took all four two-class model architectures, and trained them on all four two-class datasets, yielding sixteen models in total. Next, we took the test sets from each dataset, and applied the models to those. The test sets were always distinct from the training set, but derived from the same dataset. We show the results in the two figures below (datasets used in the original studies are written in bold.)

Performance of ML-algorithms on different datasets (F1-score)



Performance of models with different test sets (F1-score)

Our results were surprising on two fronts. First, all models trained on the same dataset performed similarly. In particular, there was no major difference between using a simple probabilistic classifier (logistic regression: LR) that looked at character sequences, or using a complex deep neural network (DNN) that looked at long word sequences.

Second, no model performed well outside of its training domain: models performed well only on the test set that was taken from the same dataset type that they were trained on.

Our results indicate that training data was more important in determining performance than model architecture. The main implication of this finding is that effort should focus on collecting and labeling better datasets, not only on refining the details of the learning algorithms. Without proper training data, even the most sophisticated algorithms can do very little.


Attacking the classifiers


Hate speech classifiers can be fooled by simple automatic text transformation methods.

Hate speech can be considered an adversarial setting, with the speakers attacking the people their text targets. However, if automatic measures are used for filtering, these classifiers might also be attack targets. In particular, these could be circumvented by text transformation. We wanted to find out whether this is feasible in practice.

Text transformation involves changing words and/or characters in the text with the intent of altering some property while retaining everything else. For evading hate speech detection, this property is the classification the detector makes.

We experimented with three transformation types, and two variants of each. The types were:
  1. word-internal changes:
    typos (I htae you)
    leetspeak (1 h4te y0u)
  2. word boundary changes:
    adding whitespace (I ha te you)
    deleting whitespace (Ihateyou)
  3. word appending:
    random words (I hate you dog cat...)
    "non-hateful" words (I hate you good nice...)
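All of these transformations are trivial to automate. A minimal sketch (the helper names are ours; the paper's actual implementation may differ):

```python
import random

# One common leetspeak scheme: map some vowels to similar-looking digits
LEET = str.maketrans("aeio", "4310")

def typo(word, rng=random):
    """Swap two adjacent characters: a simple typo model."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def leetspeak(text):
    return text.translate(LEET)

def delete_whitespace(text):
    return text.replace(" ", "")

def append_words(text, words):
    return text + " " + " ".join(words)

print(leetspeak("i hate you"))          # 1 h4t3 y0u
print(delete_whitespace("I hate you"))  # Ihateyou
print(append_words("I hate you", ["good", "nice"]))
```

None of these require any knowledge of the target model, which is what makes them practical black-box attacks.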
Details of all our experiments are in our paper. Here, we summarize three main results.
  • Character-based models were significantly more robust against word-internal and word boundary transformations.
  • Deleting whitespace completely broke all word-based models, regardless of their complexity.
  • Word appending systematically hindered the performance of all models. Further, while random words did not fare as well as “non-hateful” words from the training set, the difference was relatively minor. This indicates that the word appending attack is feasible even in a black-box setting.
The first two results are caused by a fundamental problem with all word-based NLP methods: the model must recognize the words on which it bases all further computation. If word recognition fails, so does everything else. Deleting all whitespace makes the entire sentence look like a single unknown word, and the model can do nothing useful with that. Character-models, in contrast, retain the majority of the original features, and thus suffer less.

Word appending attacks take advantage of the fact that all suggested approaches treat the detection task as classification. The classifier makes a probabilistic decision of whether the text is more dominantly hateful or non-hateful, and simply including irrelevant non-hateful material will bias this decision to the latter side. This is obviously not what we want from a hate speech classifier, as the status of hate speech should not be affected by such additions. Crucially, this problem will not go away simply by using more complex model architectures; it is built into the very nature of probabilistic classification. Avoiding it requires re-thinking the problem from a novel perspective, perhaps by re-conceptualizing hate speech detection as anomaly detection rather than classification.
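The bias is easy to see in a toy linear bag-of-words model. The per-word weights below are purely illustrative, not taken from any real classifier:

```python
# Hypothetical per-word log-odds of "hateful" in a linear
# bag-of-words classifier (e.g. logistic regression).
LOG_ODDS = {"hate": 2.0, "stupid": 1.5, "love": -1.8, "nice": -1.2, "good": -1.0}

def hate_score(text):
    """Sum of per-word log-odds; a score > 0 means 'hateful'.
    Unknown words contribute nothing."""
    return sum(LOG_ODDS.get(w, 0.0) for w in text.lower().split())

print(hate_score("i hate you"))            # 2.0 -> hateful
print(hate_score("i hate you love nice"))  # 2.0 - 1.8 - 1.2 = -1.0 -> non-hateful
```

Appending innocuous words leaves the hateful content untouched but drags the aggregate score across the decision boundary; any classifier that averages evidence over the whole text is exposed to this.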


The "love" attack 


Deleting whitespaces and adding "love" broke all word-based classifiers.

Based on our experimental results, we devised an attack that uses two of our most effective transformation techniques: whitespace deletion and word appending. Here, instead of adding multiple innocuous words, we only add one: “love”. This attack completely broke all word-based models we tested, and also hindered character-based models, although these were much more resistant to it, for the reasons discussed above. (Take a look at our paper for more details on models and datasets.)
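The attack itself fits in a few lines; a sketch with an optional camel-cased variant for readability (function names are ours):

```python
def love_attack(text):
    # Remove whitespace so word-based models see one unknown token,
    # then append a single innocuous word to bias the classification.
    return text.replace(" ", "") + " love"

def love_attack_readable(text):
    # Variant that camel-cases the original words: word boundaries
    # stay visible to humans but still defeat whitespace tokenization.
    return "".join(w.capitalize() for w in text.split()) + " love"

print(love_attack("I hate you"))           # Ihateyou love
print(love_attack_readable("I hate you"))  # IHateYou love
```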


The "love" attack applied to seven hate speech classifiers

We also applied the “love” attack to Google Perspective, with comparable results on all example sentences. This indicates that the model is word-based. To improve readability, the attacker can use alternative methods of indicating word boundaries, such as CamelCase.

Example of the "love" attack applied to Google Perspective


Conclusions: can the deficiencies be remedied?

Our study demonstrates that proposed solutions for hate speech detection are unsatisfactory in two ways. Here, we consider some possibilities for alleviating the situation with respect to both.

First, classifiers are too specific to particular text domains and hence do not scale across different datasets. The main reason behind this problem is the lack of sufficient training data. In contrast, model architecture had little to no effect on classification success. Alleviating this problem requires focusing further resources on the hard work of manually collecting and labeling more data.

Second, existing classifiers are vulnerable to text transformation attacks that can easily be applied automatically in a black box setting. Some of these attacks can be partially mitigated by pre-processing measures, such as automatic spell-checking. Others, however, are more difficult to detect. This is particularly true of word appending, as it is very hard to evaluate whether some word is "relevant" or simply added to the text for deceptive purposes. Like we mentioned above, avoiding the word appending attack ultimately requires re-thinking the detection process as something other than probabilistic classification.

Character-models are far more resistant to word boundary changes than word-models. This is as expected, since most character sequences are retained even if word identities are destroyed. As character-models performed as well as word-models in our comparative tests, we recommend using them to make hate speech detection more resistant to attacks.
