Volatile Variables in C and C++

References

  1. Volatiles Are Miscompiled, and What to Do about It, Eric Eide et al.
  2. Is A Global Implicitly Volatile In C, Stackoverflow.com.
  3. Compilers - What Every Programmer Should Know About Compiler Optimizations, Hadi Brais, Feb 2015.
  4. Compilers - What Every Programmer Should Know About Compiler Optimizations, Part 2, Hadi Brais, May 2015.
  5. A Guide to Undefined Behavior in C and C++, Part 3, John Regehr.
  6. When is a Volatile Object Accessed?, GCC manual.
  7. Memory Ordering at Compile Time, Jun 25 2012.
  8. The Joys of Compiler and Processor Reordering, Microsoft Blog, March 2008.
  9. Instruction scheduling.
  10. The Trouble With Volatile, May 2007, LWN.

Types

The C standard has this to say about the volatile keyword.

An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects.

-- C99 Standard

So what? Why does the compiler care? The reason is that the compiler is free to reorganise and change your code during optimisation as long as the observable behaviour of the program stays the same.

The Compiler Design Handbook by Y.N. Srikant et al has this to say about compiler optimisations.

Ever since the advent of reduced instruction set computers ... instruction scheduling techniques have gained importance as they rearrange instructions to "cover" the delay or latency that is required between an instruction and its dependent successor. Without such reordering, pipelines would stall, resulting in wasted processor cycles...

... Instruction scheduling methods for basic blocks may result in a moderate improvement (less than 5 to 10%) in performance, in terms of execution time of a schedule, for simple pipelined RISC architectures. However, the performance improvement achieved for multiple instruction issue processors could be significant...

Instruction scheduling is typically performed after machine-independent optimizations ... on the target machine's assembly code...

-- The Compiler Design Handbook by Y.N. Srikant et al

It is important to note that volatile only stops the compiler from optimising the accesses to volatile objects themselves. It does not imply that the object is in non-cacheable memory, or that caches are invalidated before it is read, or anything like that!

It should also be noted that whilst a volatile access will not be moved with respect to other volatile accesses, non-volatile accesses can be reordered around them.
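The reordering hazard can be sketched as follows. This is a minimal illustration rather than one of the measured examples on this page: the names payload, ready, publish_unsafe and publish_with_barrier are hypothetical, and the barrier uses the GCC/Clang inline-assembly syntax that appears later on this page, so it is not portable standard C.

```c
/* Hypothetical names, for illustration only. */
static int payload;         /* ordinary (non-volatile) data */
static volatile int ready;  /* volatile flag */

/* The store to `ready` must be emitted, but the compiler is allowed to
 * move the non-volatile store to `payload` after it. */
void publish_unsafe(int value)
{
    payload = value;
    ready = 1;
}

/* A compiler barrier (GCC/Clang syntax) stops the compiler from moving
 * memory accesses across this point. It does not constrain the CPU. */
void publish_with_barrier(int value)
{
    payload = value;
    __asm__ volatile("" : : : "memory");
    ready = 1;
}

/* Small accessors so the state can be inspected from outside. */
int read_payload(void) { return payload; }
int read_ready(void)   { return ready; }
```

Whether a given compiler actually reorders the two stores in publish_unsafe depends on the target and the optimisation level. The point is only that nothing in the language forbids it.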

When/How The Compiler Optimizes

Using the super amazing Godbolt Compiler Explorer, compiling with GCC for ARM (AArch64) at optimisation level 3, we can explore how volatile affects one of the simplest optimisations.

The Compiler Can Assume A Variable Stays Constant

In the following example the compiler can see that the variable a is not modified inside the while loop. It assumes a single flow of execution, so it can see that the expression a == 1 will always evaluate to the same boolean value within the loop. Thus, to save computational time, it does not need to re-evaluate this expression and, worse, branch on each iteration of the loop. It can just do this once before the loop runs and then either execute an empty loop or return.

int a = 1;

void f(void)
{
    while(1)
    {
        if (a == 1) break;
    }
}

f:
        adrp    x0, .LANCHOR0
        ldr     w0, [x0, #:lo12:.LANCHOR0]
        cmp     w0, 1
        beq     .L1
.L3:
        b       .L3
.L1:
        ret
a:
        .word   1

The assembly above is equivalent to this C:

int a = 1;

void f(void)
{
    if (a == 1) goto L1;
    while(1)
    {
    }
L1:
    return;
}

You can see the above code on Godbolt.

Using Volatile To Remove Compiler's Ability To Assume Constantness

Marking the global variable from the previous example as volatile means the compiler cannot assume anything about the state of the variable relative to the last statement executed. Thus, it is not free to hoist the load of the variable out of the loop, because on each evaluation it can no longer assume that the value is the same.

volatile int a = 1;

void f(void)
{
    while(1)
    {
        if (a == 1) break;
    }
}

f:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
.L2:
        ldr     w0, [x1]
        cmp     w0, 1
        bne     .L2
        ret
a:
        .word   1

You can see the above code on Godbolt.
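A typical place where this per-iteration reload matters is polling a location that something other than the program changes. The sketch below is an assumption-laden illustration: wait_for_bits and fake_status are hypothetical names, and an ordinary volatile variable stands in for a real device register so the code can run on a host machine.

```c
#include <stdint.h>

/* Hypothetical helper: poll a status word until any bit in `mask` is
 * set, giving up after `max_polls` reads.
 * Returns 1 if a bit was seen, 0 on timeout. */
int wait_for_bits(const volatile uint32_t *reg, uint32_t mask,
                  unsigned max_polls)
{
    while (max_polls-- > 0) {
        /* The volatile qualifier forces a fresh load on every iteration. */
        if (*reg & mask)
            return 1;
    }
    return 0;
}

/* Stand-in for a real device register (assumption for this sketch). */
volatile uint32_t fake_status = 0x4u;
```

The bounded poll count is a design choice for the sketch. It keeps the loop from spinning forever when the bit never appears.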

Using A Memory Barrier To Remove Compiler's Ability To Assume Constantness

This one was brought to my attention when I was reading up on the view that the Linux kernel community takes of the volatile keyword within the kernel and, in another context, on using volatile to share data between threads. The same effect can be produced, in the above example, by using a "memory barrier" that forces the compiler to assume that values held in registers are "dirty" and that objects must be reloaded from memory. Note that this is a compiler-level barrier: it emits no instructions and constrains only the compiler, not the processor. Because the object must be reloaded on each loop iteration, the compiler again cannot hoist the load out of the loop.

Why might we want to do this?

int a = 1;

void f(void)
{
    while(1)
    {
        asm volatile("": : :"memory");
        if (a == 1) break;
    }
}

f:
        adrp    x1, .LANCHOR0
        add     x1, x1, :lo12:.LANCHOR0
.L2:
        ldr     w0, [x1]
        cmp     w0, 1
        bne     .L2
        ret
a:
        .word   1

You can see the above code on Godbolt.
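One reason to prefer this approach is that the variable itself stays non-volatile, so accesses elsewhere in the program are still optimised normally and the reload is forced only where it is needed. The sketch below wraps the idea in two macros. COMPILER_BARRIER and LOAD_ONCE are hypothetical names, loosely modelled on the Linux kernel's barrier() and READ_ONCE(). Both rely on GCC/Clang extensions, and reading a non-volatile object through a volatile-qualified lvalue behaves as a volatile access on those compilers, although the C standard does not strictly guarantee this.

```c
/* Hypothetical macros (GCC/Clang extensions, not standard C). */
#define COMPILER_BARRIER() __asm__ volatile("" : : : "memory")
#define LOAD_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

int flag = 1;   /* deliberately not volatile */

/* The barrier makes the compiler reload all memory each iteration.
 * Returns the number of iterations in which flag was not 1. */
unsigned spin_with_barrier(unsigned max_spins)
{
    unsigned spins = 0;
    while (spins < max_spins) {
        COMPILER_BARRIER();
        if (flag == 1)
            break;
        spins++;
    }
    return spins;
}

/* Narrower alternative: force a reload of `flag` only. */
unsigned spin_with_load_once(unsigned max_spins)
{
    unsigned spins = 0;
    while (spins < max_spins) {
        if (LOAD_ONCE(flag) == 1)
            break;
        spins++;
    }
    return spins;
}
```

The loops are bounded by max_spins only so that the sketch terminates when nothing ever changes flag.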

Using A Function Call To Remove Compiler's Ability To Assume Constantness

Placing a call to a function in a different translation unit (one the compiler can't see into unless cross-module optimisations are being done) before the evaluation of the conditional means the compiler cannot assume that a has not been modified as a side effect of the function call.

int a = 1;

extern void something(void);

void f(void)
{
    while(1)
    {
        something();
        if (a == 1) break;
    }
}

f:
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     x19, [sp, 16]
        adrp    x19, .LANCHOR0
        add     x19, x19, :lo12:.LANCHOR0
.L2:
        bl      something
        ldr     w0, [x19]
        cmp     w0, 1
        bne     .L2
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 32
        ret
a:
        .word   1

You can see the above code on Godbolt.

Presumably, if global (cross-compilation-unit) optimisation is turned on, this wouldn't necessarily work, and either the caller or the callee would also have to use a memory barrier.
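A minimal sketch of that situation, assuming the callee is visible to the optimiser (here simply by defining it in the same translation unit) and carries the barrier itself. The calls counter and call_count are hypothetical additions for demonstration, and the barrier again uses GCC/Clang syntax.

```c
int a = 1;
static unsigned calls;  /* hypothetical counter, for demonstration */

/* Visible to the optimiser, so the call alone no longer hides anything.
 * The barrier inside makes the compiler assume memory may have changed,
 * even if this function is inlined into its caller. */
static void something(void)
{
    calls++;
    __asm__ volatile("" : : : "memory");
}

void f(void)
{
    while (1) {
        something();
        if (a == 1) break;  /* `a` is reloaded after each call */
    }
}

unsigned call_count(void) { return calls; }
```

With a initialised to 1, f returns after a single call to something.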