Wednesday 5 June 2019

Protecting against run-time attacks with Pointer Authentication

Since the Morris Worm of 1988, buffer overflows and similar memory errors have been the source of many remote-code-execution vulnerabilities. They can allow attackers to overwrite pointers in memory and make a vulnerable program jump to unexpected locations. The ARMv8.3-A architecture includes Pointer Authentication, a set of instructions that can be used to cryptographically authenticate pointers and data before they are used. We show several ways that Pointer Authentication can be used to improve security and prevent attackers from turning programmer errors into remote-code-execution vulnerabilities.

Pointer Authentication: the What, the How and the Why

The fundamental purpose of Pointer Authentication (PA) is to allow software to verify that values read from memory, whether data or pointers, were generated by the same process in the right context. It does this by allowing software to generate a pointer authentication code (PAC). A PAC is a short tag computed with a tweakable MAC, small enough to be squeezed into the unused high-order bits of a pointer. The MAC key is stored in a register accessible only to software running at a higher privilege level, such as the operating system kernel.

PA provides three main types of instructions:
  1. Generate an authenticated pointer (e.g. pacia, pacibsp): generate a short PAC over a pointer and store the PAC in the high-order bits of the pointer.
  2. Verify an authenticated pointer (e.g. autia, autibsp): if the pointer contains a valid PAC for its address, turn it back into a "normal" pointer; otherwise, make the pointer invalid so that the program will crash if the pointer is used.
  3. Generate a "generic" PAC (pacga): generate a 32-bit PAC over the contents of a whole register.

These instructions combine three values:
  1. The value to be authenticated. For all the instructions except pacga, the PAC is computed over the low-order bits that contain the actual pointer data (the high-order bits being reserved for the PAC, a "sign" bit used to determine whether the reserved bits of a verified pointer are set to all zeros or all ones, and an optional address tag).
  2. A modifier value. This determines the "context" of a pointer, so that an authenticated pointer can't be taken from one place and reused in another (more on this later). Some special-case instructions are hard-coded to use, for example, the stack pointer or zero as the modifier. The modifier is used as the "tweak" for the tweakable MAC computation.
  3. A key. There are five keys (IA and IB for instruction addresses, DA and DB for data addresses, and GA for pacga), and which one is used depends on the instruction. The keys are held in system registers that (on Linux) cannot be accessed from user space. They are set to random values when the process is started, so that authenticated pointers aren't interchangeable between processes.

We are primarily interested in the first two types of instructions, which store and verify PACs in the high-order bits of a pointer, as illustrated below. The number of PAC bits depends on the configured virtual address size (39 bits on Linux by default) and on whether an address tag is present. With the Linux defaults, the 39 address bits, the "sign" bit and the 8-bit address tag leave 64 - 39 - 1 - 8 = 16 bits, so PACs are 16 bits long.

The PAC instruction is used to generate a PAC over an address and store it in the high-order bits of the pointer. The instruction takes an address and a modifier as operands. The PAC key is stored in a separate system register that, on Linux, is configured to be accessible only by the kernel. There are several families of PAC instructions, each of which uses a different key: PACIA and PACIB for code pointers, and PACDA and PACDB for data pointers.
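To make this layout concrete, here is a minimal Python model of generating and verifying a PAC. It assumes the Linux defaults described above: 39-bit virtual addresses with top-byte tagging, leaving a 16-bit PAC field in bits 39 to 54. In place of the hardware PAC function it uses a truncated HMAC-SHA256 as a stand-in. The architecture's reference algorithm is the QARMA block cipher, but implementations may choose their own. The helpers `pac` and `aut` are hypothetical and only mimic the behaviour of instructions such as pacia and autia. For simplicity, the toy MAC covers only the address bits.

```python
import hashlib
import hmac

VA_BITS = 39                                   # assumed virtual address size
PAC_SHIFT, PAC_BITS = VA_BITS, 16              # PAC field: bits 39..54
ADDR_MASK = (1 << VA_BITS) - 1
PAC_MASK = ((1 << PAC_BITS) - 1) << PAC_SHIFT  # bit 55 and the tag byte are left alone


def _toy_mac(key: bytes, addr: int, modifier: int) -> int:
    """Stand-in for the hardware PAC function: HMAC-SHA256 truncated to 16 bits."""
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def pac(ptr: int, modifier: int, key: bytes) -> int:
    """Like pacia: compute a PAC over the address bits and store it in the PAC field."""
    code = _toy_mac(key, ptr & ADDR_MASK, modifier)
    return (ptr & ~PAC_MASK) | (code << PAC_SHIFT)


def aut(ptr: int, modifier: int, key: bytes) -> int:
    """Like autia: strip a valid PAC, or return a pointer that will fault if used."""
    stripped = ptr & ~PAC_MASK
    if (ptr & PAC_MASK) >> PAC_SHIFT == _toy_mac(key, ptr & ADDR_MASK, modifier):
        return stripped
    # Simplified failure: set a bit inside the PAC field, making the pointer
    # non-canonical so that dereferencing it faults.
    return stripped | (1 << (PAC_SHIFT + PAC_BITS - 2))


key = bytes(range(16))          # hypothetical key; on Linux it is random per process
ptr = 0x40_1234_5678            # a user-space address below 2**39
signed = pac(ptr, modifier=0x7FFF_F000, key=key)
assert (signed & ADDR_MASK) == ptr               # the address bits are untouched
assert aut(signed, 0x7FFF_F000, key) == ptr      # right modifier: original pointer back
```

Verifying `signed` with any other modifier yields a non-canonical pointer, except in the one-in-65536 case where the toy PACs happen to coincide. This is why the choice of modifier matters so much.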

By using these instructions to verify the authenticity of pointers before they are used, we can prevent an attacker from, e.g., overwriting return addresses on the stack using a buffer overflow. They also prevent overwriting other program values as part of a data-oriented programming attack. For return addresses, the PAC is generated when a function is entered and verified when it returns, so the program crashes if the return address has been changed. This has been implemented in both GCC and LLVM as the -msign-return-address option.

Much of the difference in security of PA-based protection schemes comes from the choice of modifier. A modifier should be outside the attacker's control and quick and easy to compute from the available data. It should also rarely coincide across contexts: every coincidence gives an attacker an opportunity to reuse an authenticated pointer outside the context it was meant to be used in.

PARTS: Protecting data pointers with PA

Modern operating systems have many protections against buffer-overflow-type attacks, e.g. W^X and ASLR. W^X prevents an attacker from injecting their own code, but it can be bypassed by return-oriented programming. In return-oriented programming, return addresses are overwritten to make the program return to a series of "gadgets": small pieces of code already present in the program that can be chained together into the attacker's desired functionality. This has encouraged the use of control-flow integrity mechanisms such as shadow stacks, but even perfect control-flow integrity is not enough. Data-oriented programming attacks piggyback on the program's correct control flow, performing arbitrary computation by manipulating the program's data.

We have introduced a PA-based scheme, PARTS (USENIX Security 2019), which protects against data-oriented programming attacks that depend on pointer manipulation, as well as many control-flow attacks. PARTS prevents a pointer to one type from being used to access an object of a different type. The compiler knows all of the types at compile time, so it can statically select the modifiers used to generate a PAC when a pointer is stored to memory and to verify it when the pointer is loaded again. This protects against the misuse of both data pointers and function pointers. Verifying the PAC of a function pointer before the pointer is used in an indirect call also protects the program's forward control flow.
To protect the backward control flow (i.e. to prevent the program from jumping to return addresses overwritten by the attacker), return addresses are also authenticated; we discuss this in the following sections.
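The static, type-based idea can be sketched in Python. The sketch assumes a hypothetical `type_id` helper that hashes a type's name into a 64-bit modifier. PARTS derives its modifiers from the compiler's type information, but the derivation here is only a stand-in, as is the truncated HMAC used in place of the hardware PAC. `store_ptr` and `load_ptr` are hypothetical names for the instructions a compiler would insert around pointer stores and loads.

```python
import hashlib
import hmac

PAC_SHIFT, PAC_BITS = 39, 16                   # 16-bit PAC above a 39-bit address
ADDR_MASK = (1 << PAC_SHIFT) - 1
PAC_MASK = ((1 << PAC_BITS) - 1) << PAC_SHIFT


def _toy_mac(key: bytes, addr: int, modifier: int) -> int:
    """Stand-in for pacda/autda: HMAC-SHA256 truncated to 16 bits."""
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def type_id(type_name: str) -> int:
    """Hypothetical compile-time modifier: a 64-bit hash of the pointer's static type."""
    return int.from_bytes(hashlib.sha256(type_name.encode()).digest()[:8], "little")


def store_ptr(ptr: int, type_name: str, key: bytes) -> int:
    """Sign a pointer before it is written to memory; the modifier is a constant."""
    code = _toy_mac(key, ptr & ADDR_MASK, type_id(type_name))
    return (ptr & ~PAC_MASK) | (code << PAC_SHIFT)


def load_ptr(stored: int, type_name: str, key: bytes) -> int:
    """Verify a pointer read from memory against the type it is being used as."""
    expected = _toy_mac(key, stored & ADDR_MASK, type_id(type_name))
    if (stored & PAC_MASK) >> PAC_SHIFT != expected:
        raise ValueError(f"PAC check failed: pointer is not a valid {type_name}")
    return stored & ~PAC_MASK


DATA_KEY = bytes(16)                           # hypothetical stand-in for the DA key
obj = 0x20_0000_1000
slot = store_ptr(obj, "struct account *", DATA_KEY)
assert load_ptr(slot, "struct account *", DATA_KEY) == obj
```

Loading `slot` as, say, a `struct file *` raises `ValueError`, except in the one-in-65536 case where the 16-bit PACs coincide. Because the modifier is a compile-time constant, it costs nothing to compute at run time. The price is that all pointers of the same type share a context.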

PA-based return address protection

PA is not only useful for this kind of "static" protection, where modifiers can be chosen at compile time. Dynamically-selected modifiers can be particularly powerful.

One of the first uses of Pointer Authentication was to protect return addresses on the stack, since overwriting a return address makes the program jump to a memory address of the attacker's choice. Including a PAC in the return address will make the jump fail, unless the authenticated return address was previously generated by the program. But this has a problem: if an attacker can use a memory vulnerability to read from the program's memory, then they can obtain authenticated return addresses that will validate correctly, and overwrite the return address with one of these. This is where the modifier comes into play: if the modifier depends on the path that the program has taken through its call-graph, then the authenticated return pointers from different paths cannot be swapped.
One way to make this modifier path-dependent, and the first proposed use of ARM PA, is to use the stack pointer as the modifier. Each time a function is called, the stack pointer is decremented to make space for stack variables, saved registers, and the return address. Since the amount subtracted from the stack pointer depends on the function that has been called, the resulting modifier depends on the stack layout along a particular path through the program. This approach is illustrated below.

PA-based return-address protection. At the beginning of each function, a PAC is generated for the current return address, which can then be saved on the stack. At the end of the function, the PAC is verified to ensure that the address has not been tampered with.

However, the resulting modifier can be predicted from static analysis of the program, so an attacker can find distinct paths through the program that lead to identical stack pointers, allowing their corresponding return addresses to be exchanged.
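The weakness can be reproduced in a small simulation. It assumes a toy 16-bit PAC, with truncated HMAC-SHA256 standing in for the hardware algorithm, and uses made-up return addresses and stack-pointer values. `sign_lr` and `auth_lr` are hypothetical helpers modelling paciasp and autiasp, which use SP as the modifier. Two different call paths that leave SP at the same value produce interchangeable signed return addresses.

```python
import hashlib
import hmac
from typing import Optional

PAC_SHIFT, PAC_BITS = 39, 16
ADDR_MASK = (1 << PAC_SHIFT) - 1
PAC_MASK = ((1 << PAC_BITS) - 1) << PAC_SHIFT
KEY = bytes(range(16))                      # hypothetical per-process instruction key


def _toy_mac(addr: int, modifier: int) -> int:
    """Stand-in for the hardware PAC function: HMAC-SHA256 truncated to 16 bits."""
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")


def sign_lr(lr: int, sp: int) -> int:
    """Function prologue (like paciasp): sign the return address using SP."""
    return (lr & ~PAC_MASK) | (_toy_mac(lr & ADDR_MASK, sp) << PAC_SHIFT)


def auth_lr(lr: int, sp: int) -> Optional[int]:
    """Function epilogue (like autiasp): None stands for a pointer that would fault."""
    if (lr & PAC_MASK) >> PAC_SHIFT != _toy_mac(lr & ADDR_MASK, sp):
        return None
    return lr & ~PAC_MASK


# Path 1: main -> f. Path 2: main -> g -> h. Suppose f's frame and h's frame
# happen to leave SP at the same value (all addresses here are made up).
SP = 0x7FF0_0100
leaked = sign_lr(0x40_1000, SP)             # f's signed return address, read by the attacker
genuine = sign_lr(0x40_2000, SP)            # what h saved in its own stack frame

assert auth_lr(genuine, SP) == 0x40_2000    # normal return from h
# The attacker overwrites h's saved return address with the leaked value:
assert auth_lr(leaked, SP) == 0x40_1000     # accepted, so h "returns" to f's caller
```

Had the two frames ended at different SP values, `auth_lr(leaked, ...)` would almost certainly return `None`. The substitution works precisely because the modifier is predictable and can coincide across paths.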

PACStack: an authenticated call stack

To overcome this problem, we have developed an alternative technique called PACStack (poster at DAC 2019, full report on arXiv), which uses chained PACs to bind the current return address to the entire path taken through the call graph.

Apart from PA, the key feature of the ARM architecture that makes PACStack possible is how return addresses are held. After a call and before a return, the return address of the current function is held in a register, the Link Register (LR). Because the current return address is always kept in a register, an attacker exploiting a buffer overflow can never overwrite the current (authenticated) return address. They can only overwrite the next one, i.e. the caller's authenticated return address saved on the stack, which will be loaded into LR when the function returns. The PAC of the return address kept in the register (and therefore known to be good) is verified using the previous authenticated return address as the modifier. This implicitly verifies the authenticated return address being loaded from the stack.

Since the new value of LR also contains a PAC, authenticating the head of this chain of PACs recursively authenticates the entire call stack, providing perfect backward control-flow integrity.

The chain of PACs produced by PACStack. Each PAC is generated using the previous authenticated return address as modifier.
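The chaining can be modelled in a few lines of Python. This sketch is an illustration, not the PACStack implementation. The class name `ToyPACStack` and its methods are hypothetical, and a truncated HMAC-SHA256 stands in for the hardware PAC. A field stands in for the register holding the head of the chain (LR in the description above), and a list stands in for the attacker-writable stack. Each call spills the current authenticated return address and signs the new return address with it as the modifier. Each return verifies the register value using the value reloaded from the stack.

```python
import hashlib
import hmac

PAC_SHIFT, PAC_BITS = 39, 16
ADDR_MASK = (1 << PAC_SHIFT) - 1
PAC_MASK = ((1 << PAC_BITS) - 1) << PAC_SHIFT


class ToyPACStack:
    """Hypothetical model of a PAC chain: aret_i = ret_i || PAC(ret_i, aret_{i-1})."""

    def __init__(self, key: bytes):
        self.key = key
        self.head = 0      # register holding the current authenticated return address
        self.stack = []    # memory: spilled previous heads, writable by the attacker

    def _mac(self, addr: int, modifier: int) -> int:
        msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
        return int.from_bytes(hmac.new(self.key, msg, hashlib.sha256).digest()[:2], "little")

    def call(self, ret_addr: int) -> None:
        """On entry: spill the old head, then sign the new return address with it.
        ret_addr is assumed to be a plain user-space address below 2**39."""
        self.stack.append(self.head)
        self.head = ret_addr | (self._mac(ret_addr, self.head) << PAC_SHIFT)

    def ret(self) -> int:
        """On return: verify the head using the reloaded value as the modifier."""
        prev = self.stack.pop()                  # may have been overwritten
        addr = self.head & ADDR_MASK
        if (self.head & PAC_MASK) >> PAC_SHIFT != self._mac(addr, prev):
            raise RuntimeError("PAC check failed: call stack was tampered with")
        self.head = prev   # prev is itself checked when the caller returns
        return addr


cs = ToyPACStack(bytes(range(16)))   # hypothetical key
cs.call(0x40_1000)                   # A calls C
cs.call(0x40_2000)                   # C calls "loader"
assert cs.ret() == 0x40_2000         # loader returns to C
assert cs.ret() == 0x40_1000         # C returns to A
```

Suppose the spilled value is changed between the second call and the first return, for instance with `cs.stack[-1] ^= 0x100`. Then the loader's return raises `RuntimeError`, except in the one-in-65536 case where the toy PACs coincide. The head is never written to memory, so the attacker can only tamper with values that will themselves serve as modifiers in a later check.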

For the attacker, this cryptographic protection means that returning to a different address is now far more difficult than just overwriting a return address, as seen below.

Anatomy of a control-flow violation with PACStack in use. In the correct control flow (left), after a call from A to C, C returns back to A. The goal of the attacker is to return from C back to some other function B. To do this, they must replace C's return address by overwriting it on the stack when C calls some other function ("loader" in the diagram). This new value must pass two PAC checks before the program will return to the new pointer.

A major issue with this type of scheme is that the PACs are short: 16 bits in this case. The attacker can use their ability to overwrite variables in memory to influence the path through the call graph. This lets them exploit the birthday paradox: by guiding the program along many different paths through the call graph, they obtain colliding PACs after around 320 attempts on average. This allows the attacker to call down through the program's call graph and return up a different path, as illustrated in the figure above.
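The figure of around 320 attempts follows from the birthday bound. The sketch below assumes that PACs behave like uniformly random 16-bit values, i.e. N = 2^16 = 65536 possibilities. It computes the exact expected number of PACs an attacker must observe before two coincide, together with the familiar approximation sqrt(pi * N / 2). The function names are illustrative.

```python
import math


def expected_draws_until_collision(n: int) -> float:
    """Exact E[T], where T is the index of the first repeated value when drawing
    uniformly from n possibilities: the sum over k of P(first k draws distinct)."""
    total = 0.0
    p_distinct = 1.0                    # P(first k draws are all distinct), from k = 0
    for k in range(n + 1):
        total += p_distinct
        p_distinct *= (n - k) / n       # draw k+1 must avoid the k values already seen
    return total


def birthday_approximation(n: int) -> float:
    """Leading-order approximation sqrt(pi * n / 2)."""
    return math.sqrt(math.pi * n / 2)


N = 2 ** 16                             # 16-bit PACs
print(f"exact:  {expected_draws_until_collision(N):.1f}")
print(f"approx: {birthday_approximation(N):.1f}")
```

Both values come out at roughly 321, in line with the estimate of around 320 attempts. Doubling the PAC length would square the number of possible values but only multiply the expected attempts by 256. This is why masking, rather than longer PACs alone, is attractive when pointer bits are scarce.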

To address this, we have developed a technique that we refer to as PAC masking, which prevents the attacker from exploiting PAC collisions. PACStack makes a second use of the PA instructions as a pseudo-random generator, producing a modifier-dependent masking value that is XOR-ed with the PAC. This prevents the attacker from recognizing when two PACs generated with different return addresses collide, forcing them to risk a guess. No matter how many authenticated return addresses the attacker obtains, they cannot change a function's return address with probability better than one in 65536 (2^-16). This makes return-to-libc, return-oriented programming, and similar attacks extremely unlikely to succeed.

Next steps

The specification for Pointer Authentication leaves the choice of PAC algorithm to the implementer. One possibility is for an implementer to use something like our PAC masking primitive for all of the non-generic PACs, but it is not yet clear whether further security requirements will emerge in the future.

Together, these examples show the great flexibility of Pointer Authentication in ARMv8.3-A. The cost of this flexibility is the need for thorough cryptographic analysis; however, our experience with PACStack shows that such analysis is viable for practical systems. This flexibility is what makes Pointer Authentication especially exciting as a run-time security feature. It enables compiler writers to integrate highly secure run-time protection mechanisms without waiting for hardware to catch up. As more enabling features, such as memory tagging and branch target identification, are deployed in the coming years, new defenses will become possible. Run-time protection schemes will continue to become faster and more secure.
