cppreference.com gets acquire/release instructions wrong

It is quite well-known that the memory operations and fences provided by the C/C++11 languages are pretty complicated and confusing. For instance, my OOPSLA 2015 paper (joint with Batty, Beckmann, and Donaldson) explains that a proposed implementation of these instructions for next-generation AMD graphics cards is faulty because the designers misunderstood how they worked. And back in 2011, Batty et al. showed that (an early draft of) the C++11 language failed to enforce a desirable property called ‘sequential consistency for data-race-free programs’, as a result of a subtlety in the way some of these memory operations were being defined.

Today I found another example of the confusing arising from memory operations and fences in C/C++11. The cppreference.com website is a long-established, well-regarded, and widely-used reference for C and C++ programmers, but its description of how the “acquire/release” memory operations and fences work is quite wrong.

The website claims that acquire loads cannot be reordered with subsequent reads:

and that release stores cannot be reordered with preceding writes. It also claims that a store that follows a release fence cannot be reordered with any write that precedes the fence:

This is incorrect (or more precisely, it is not the whole truth). The truth, according to the official C/C++11 language specifications, is that

acquire loads cannot be reordered with any subsequent memory accesses (not just reads), and
release stores cannot be reordered with any preceding memory accesses (not just writes).

Similarly,

release fences prevent stores that follow the fence being reordered with any memory access that precedes the fence (not just writes), and
acquire fences prevent loads that precede the fence being reordered with any memory access that follows the fence.

Here is an example that demonstrates the different interpretations of acquire loads. Suppose we have two threads:

int main() {
  atomic_int x=0; atomic_int y=0;
  {{{ {
    // Thread 1:
    int r0 = atomic_load_explicit(&x, memory_order_relaxed); //a
    atomic_store_explicit(&y, 1, memory_order_release);      //b
  } ||| {
    // Thread 2:
    int r1 = atomic_load_explicit(&y, memory_order_acquire); //c
    atomic_store_explicit(&x, 1, memory_order_relaxed);      //d
  } }}}
}

Can we obtain r0 = 1 and r1 = 1 at the end? According to the official standard, the answer is no. This can be confirmed by copying the code above into the CppMem tool (which enumerates all the behaviours of a small C/C++ program that are allowed by the standard).

But if we follow the guidelines from cppreference.com and allow the acquire load (c) to be reordered with the subsequent relaxed store (d), then it is straightforward to obtain r0 = r1 = 1, simply by executing the instructions in the following order: d, a, b, c.

According to cppreference.com, acquire/release instructions are less powerful than they are in the official standard. This is problematic for programmers, because they might end up using stronger instructions than they really need, which may make their programs less efficient. It’s even more problematic for compiler-writers, because if they just follow the interpretation on cppreference.com, they might end up providing acquire/release instructions that are weaker than programmers expect, which means that correctly-written programs could give the wrong results when they are executed.

1 thought on “cppreference.com gets acquire/release instructions wrong”

[…] This excellent post goes to show how easy it is to get it wrong (or miss the subtleties) when dealing with memory ordering. (The omissions in cppreference.com has since been corrected.) If you think multi-threaded programming is hard, wait until you try to verify the correctness of a lock-free program, even a simple ring buffer. […]

Share this:

1 thought on “cppreference.com gets acquire/release instructions wrong”

Leave a comment Cancel reply