Mix-testing: revealing a new class of compiler bugs

I’m delighted that Luke Geeson’s work on “mix testing” (a collaboration with James Brotherston, Wilco Dijkstra, Alastair Donaldson, Lee Smith, Tyler Sorensen, and myself) will appear at OOPSLA 2024 in November. 

There has been quite a lot of work on “litmus testing” of compilers in the last couple of decades. This mainly takes the form of (1) generating lots of little concurrent programs, (2) feeding them into a compiler, and then (3) seeing if the compiled code does the right thing when executed on the target machine. Luke’s idea is: what if we don’t give the whole litmus test to a single compiler, but instead we break it up into little fragments, and have each fragment compiled by a different compiler. Thus, we have the potential to find bugs that arise out of unexpected interactions between compilers.

Here’s a simple example of how a “mixing bug” might arise when targeting the x86 architecture.

Consider the following “store buffering” litmus test, written in C:

Two threads write and read to two shared locations, x and y. The C standard requires that after executing these two threads, the values in t and u must not both be zero (in other words: at least one of the threads must see the other thread’s write).

However, if this program is compiled naïvely to x86, that requirement will not be met, because when x86 processors encounter a write instruction followed by a read of a different location, they do not promise to keep them in order. Therefore, the compiler must insert a barrier between them, to prevent the potential reordering.

There are two plausible ways to accomplish this. One is for the compiler to insert a barrier after each write. That would yield something like this:

The other option is to insert a barrier before each read. That would yield something like this:

The first scheme is more popular because it tends to result in fewer barriers, given that writes are generally less common than reads. It is the scheme used by the major compilers, GCC and Clang. But either scheme is fine … as long as it is used consistently. Problems arise if the two schemes are mixed. For instance, if the first instructions in both threads are compiled using the barriers-before-reads scheme, and the second instructions in both threads are compiled using the barriers-after-writes scheme, then both compilers will wrongly assume that the other compiler is responsible for keeping the instructions in order, and no barrier will be inserted at all!

That example serves to illustrate what mixing bugs look like, but it’s not really a bug because it is already well understood that the barriers-before-reads and barriers-after-writes compilation schemes must not be mixed.

So, here is a real mixing bug. It’s a little more complicated, but the basic idea is the same: two different compilation schemes interacting in such a way that fails to enforce some necessary ordering.

Consider the same “store buffering” litmus test again, but this time compiled for Arm. Here’s how it is compiled by Clang for the Armv7 architecture:

I’ll skip the gory details of the assembly code. What’s important is how the compiler ensures that the stores and loads don’t get reordered. In this case, the compiler has inserted a barrier (DMB) at the end of all four instruction sequences. In particular, the barriers marked with the red rectangles are the ones that are important here.

Here’s how that same program is compiled by Clang for the Armv8 architecture:

This mapping is rather more efficient because it has avoided all those (expensive) barriers, and instead exploited some features that only became available in Armv8. Specifically, store-release instructions (STL) cannot be reordered with subsequent load-acquire instructions (LDA). In particular, this means that the stores and loads marked with the blue puzzle pieces cannot be reordered. I use a puzzle-piece metaphor here to indicate that these instructions only enforce ordering when they appear together. Neither instruction acts as a barrier on its own.

Now suppose we compile the first instruction in each thread for Armv8, but the second instruction for Armv7. This is perfectly legal, by the way – the resulting code should run fine on an Armv8 machine because Armv8 is fully backward-compatible with Armv7. We get something like this:

We see that neither the Armv7 nor the Armv8 mechanism for inserting a barrier is working. The Armv7 scheme relies on a barrier being inserted after the store, but the Armv8 scheme does not provide this barrier! Instead, the Armv8 scheme uses a store-release, but this is ineffective because the Armv7 scheme does not produce the corresponding load-acquire! In short, nothing is preventing the stores and loads being reordered, and we have a compiler bug.

Luke has built a tool that explores lots of possible compiler combinations for each litmus test, and he has used it to find several bugs like the one above. (That bug remains open at the time of writing, by the way.) Luke is now working with his colleagues at Arm to make their ABI more explicit about how and when various compilation schemes can be safely mixed. Hopefully, this effort will provide firmer foundations for mixed compilation in the future.

This work was supported by the Engineering and Physical Sciences Research Council (grant number EP/V519625/1). The views expressed in this blog post are not endorsed by Arm or any other company mentioned.

1 thought on “Mix-testing: revealing a new class of compiler bugs”

Leave a comment