Addressing memory safety with software

In the previous post we looked at the underlying cause of memory unsafety: the use of unsafe programming languages. In particular, C and C++ are widely used for their performance, but they carry the risk of errors that cause unsafe memory accesses. Let’s now look at software approaches to mitigating memory unsafety.

Software mitigations

A variety of software mitigations have been proposed, such as those listed by Saar Amar. The methods vary in granularity, determinism and implementation overhead.

Stack canaries

A well-known example of mitigating memory unsafety with software is the stack canary. When a program calls a subroutine, the address of the next instruction in the program is pushed onto the call stack and used as a return pointer (RP). When the subroutine has finished execution, the return address is popped from the call stack and execution resumes there.

In the event of a buffer overflow, the RP can be overwritten. It could end up with corrupt content or, in the event of an attack, could point to malicious code. A stack canary, analogous to the canaries once used to detect gas in mines, is a known set of values placed between a subroutine's local buffers on the stack and the RP. These values are used to detect a buffer overflow.

Figure 1: Simplified stack canary

The compiler defines the values to be stored in the stack canary, and these are normally randomized. If, after the subroutine has run, the values in the stack canary do not match the expected ones, the canary has been corrupted. In that case the return is not executed and the program terminates. The canary comes with a memory overhead, but this overhead is relatively small.
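
The check described above can be sketched in C. This is a simplified model under stated assumptions, not what a compiler actually emits: the `frame_model` struct, the `CANARY_VALUE` constant and the `run_subroutine` function are hypothetical names, and a real frame is laid out by the compiler and the ABI rather than by a struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one stack frame. */
struct frame_model {
    unsigned char buffer[16];  /* local buffer that may overflow */
    uint64_t canary;           /* known value sitting before the RP */
    uint64_t return_pointer;   /* stands in for the saved return address */
};

/* Assumed fixed value for the demo; real canaries are normally randomized. */
#define CANARY_VALUE UINT64_C(0x5AFEC0DEDEADBEEF)

/* Copies len bytes into the frame without checking the buffer size (the bug
 * being modelled), then performs the canary check done before returning.
 * Returns 1 if the canary is intact, 0 if an overflow was detected. */
int run_subroutine(const unsigned char *input, size_t len, uint64_t return_to)
{
    struct frame_model frame;
    memset(frame.buffer, 0, sizeof frame.buffer);
    frame.canary = CANARY_VALUE;
    frame.return_pointer = return_to;

    /* Clamp to the model's size so the simulation itself stays well defined. */
    if (len > sizeof frame)
        len = sizeof frame;

    /* buffer is the first member, so bytes beyond 16 spill into the canary. */
    memcpy((unsigned char *)&frame, input, len);

    return frame.canary == CANARY_VALUE;
}
```

With 16 bytes of input the canary survives and the function reports success. With 24 bytes the extra 8 bytes overwrite the canary, and a real program would terminate at this point instead of using the return pointer.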

Stack canaries are statistical, meaning that a hacker could exhaustively go through possible values in order to attack a stack. At a recent Zero Day Initiative (ZDI) meeting in Toronto, there was a competition to investigate the security of commonly used devices. A team compromised the Netgear Nighthawk RAX30 by using a buffer overflow attack. Although the router used a stack canary, the team found a way to “logically” bypass the canary and compromise the router and, potentially, other connected devices. Stack canaries are an example of a coarse-grained software approach.
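
To make "statistical" concrete, here is a toy sketch of an exhaustive search. It assumes a deliberately tiny 16-bit canary and a hypothetical `guess_is_correct` oracle; in practice the only feedback an attacker gets is whether the target crashed, and real canaries are much wider.

```c
#include <stdint.h>

/* Hypothetical oracle: 1 when the guess matches the secret canary. */
static int guess_is_correct(uint16_t secret, uint16_t guess)
{
    return secret == guess;
}

/* Tries every 16-bit value in ascending order and returns how many guesses
 * were made up to and including the correct one. */
uint32_t guesses_needed(uint16_t secret)
{
    uint32_t attempts = 0;
    for (uint32_t guess = 0; guess <= UINT16_MAX; guess++) {
        attempts++;
        if (guess_is_correct(secret, (uint16_t)guess))
            return attempts;
    }
    return attempts; /* not reached: every 16-bit value has been tried */
}
```

The search never needs more than 2^16 = 65,536 guesses for this toy width. Each extra bit of canary doubles the search space, which is why the protection is a matter of probability rather than a guarantee.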

Compiler instrumentation

Other, fine-grained mitigations have been developed to ensure memory safety. For example, Apple uses Firebloom compiler instrumentation. Pointers carry metadata that gives the exact bounds of a buffer, and there are checks on the type of memory allocation. This approach is deterministic and should therefore catch all buffer overflows and over-reads.
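
The idea of a pointer that carries its own bounds can be sketched in C as follows. This is an illustration only and makes no claim about Firebloom's actual layout or checks: the `bounded_ptr` struct and the `bounded_from` and `bounded_load` helpers are hypothetical, and the type metadata is reduced to a plain string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-carrying pointer with four pointer-sized fields. */
struct bounded_ptr {
    uint8_t *ptr;           /* current position */
    uint8_t *lower;         /* first valid byte */
    uint8_t *upper;         /* one past the last valid byte */
    const char *type_name;  /* stand-in for allocation type metadata */
};

/* Wraps a raw buffer of len bytes together with its bounds. */
struct bounded_ptr bounded_from(uint8_t *base, size_t len, const char *type_name)
{
    struct bounded_ptr p = { base, base, base + len, type_name };
    return p;
}

/* Checked load: stores p.ptr[index] in *out and returns 1, or returns 0
 * without touching memory if the access would fall outside the bounds. */
int bounded_load(struct bounded_ptr p, size_t index, uint8_t *out)
{
    size_t offset = (size_t)(p.ptr - p.lower);
    size_t size = (size_t)(p.upper - p.lower);
    if (offset > size || index >= size - offset)
        return 0;
    *out = p.ptr[index];
    return 1;
}
```

Every access goes through a comparison against the stored bounds, so an out-of-range index is always rejected rather than caught with some probability. Instrumented code would typically trap at that point; the sketch returns 0 so the behaviour is easy to observe.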

The downside is that Firebloom comes with a heavy cost:

  • The pointers take four times more memory than without instrumentation.
  • Code size increases.
  • Performance decreases due to more instructions being executed.

These costs are significant and have meant that, so far, Apple has only used Firebloom in its second-stage bootloader, iBoot. Using it in a less contained environment such as an OS kernel would be prohibitively expensive.

Safe programming languages

We started this blog by saying that limitations in C, C++ and assembly code lead to memory unsafety. Not all programming languages provide the freedom, and the dangers, that C and C++ allow. For example, automatic memory management was supported in the Lisp programming language in 1959! It was the first implementation of garbage collection, which deals with some memory safety issues.

Garbage collection is available in more modern languages such as Java and C#. However, garbage collection carries a heavy overhead in computer resources. One study estimated that five times as much memory was needed for a program with garbage collection to match the performance of the same program running without it.

Recently, a number of memory-safe languages have emerged, such as Rust and Swift. Such languages are good candidates for new code because of their safety properties. However, does this mean that legacy C or C++ code can simply be abandoned and rewritten?

Most software development is not from scratch but involves integrating existing libraries with new code. Many of these libraries are written in C or C++, whether they come from in-house teams, third-party software providers or open source, including RTOSes and Linux. Therefore, a wholesale transition to memory-safe languages is unlikely to be cost-effective in the foreseeable future.

A second issue is that memory-safe languages often call routines, such as drivers, written in memory-unsafe languages. Unless every part of the code base is written in a memory-safe language, there will be local vulnerabilities. A bug allowing out-of-bounds access in just one routine in a C library, called by a program written in a memory-safe language, can undermine the memory safety of the entire program.
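
As a small illustration of that last point, consider a hypothetical C library routine with an off-by-one error. The names `legacy_fill` and `neighbour_was_clobbered` are made up for this sketch, and the caller is modelled in C with an extra guard byte so that the demonstration itself stays well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical legacy routine: meant to fill n bytes, but the <= comparison
 * makes it write n + 1 bytes (an off-by-one bug). */
void legacy_fill(uint8_t *dst, size_t n, uint8_t value)
{
    for (size_t i = 0; i <= n; i++)
        dst[i] = value;
}

/* Models a caller that believes it owns exactly 8 bytes. The ninth byte
 * stands for neighbouring data that belongs to someone else.
 * Returns 1 if that neighbouring byte was overwritten. */
int neighbour_was_clobbered(void)
{
    uint8_t memory[9] = {0};
    const uint8_t guard = 0x7E;

    memory[8] = guard;
    legacy_fill(memory, 8, 0xFF); /* the caller passes the correct length */

    return memory[8] != guard;
}
```

The caller does everything right and passes the correct length, yet the byte next to its buffer is still overwritten. A caller written in a memory-safe language would be in the same position, because its guarantees stop at the boundary of the C routine.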

In the next post we will consider the third approach: using hardware to enhance memory safety through fine-grained memory protection.
