Bit Banding in the STM32

By | July 14, 2010

Wondrous though the STM32 (ARM Cortex M3) might be, it makes something of a meal of atomic access to individual bits in memory. The technique used is called bit-banding. Although it is simple enough in concept and pretty friendly to the assembly language programmer, it is easy enough to get lost in C. Or should that be at C?

So why would you want atomic access to individual bits? Consider a single bit used as a flag in your program. Perhaps you have a data buffer, and the interrupt service routine that looks after it sets a bit in a memory location to signal that it is full. A higher level function sees the bit set, does something about it, then resets the bit. The problem is that this is just one bit in a whole memory word of 32 bits and, to modify it, you need to read the word, change the bit and write back the word. What happens if some other interrupt function changes one of the other bits in that word after you read it but before you write back the modified version? When you write back your version, it will put all the other bits back to how they were before you started, thus destroying a piece of information.
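
The lost update described above can be reproduced on a desktop machine. What follows is only a minimal sketch: the flag names and the timer_isr function are invented for illustration, and the "interrupt" is simply a function call placed by hand between the read and the write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits sharing one 32-bit status word. */
#define BUFFER_FULL (1u << 0)   /* set by the buffer's interrupt handler */
#define TIMER_TICK  (1u << 3)   /* set by some other interrupt handler   */

static volatile uint32_t flags;

/* Stand-in for the other interrupt handler. */
static void timer_isr(void)
{
    flags |= TIMER_TICK;
}

/* Clear BUFFER_FULL with an ordinary read-modify-write while the
 * "interrupt" is called by hand between the read and the write.
 * Returns the final contents of the flag word. */
uint32_t clear_buffer_full_with_race(void)
{
    flags = BUFFER_FULL;            /* starting state: buffer is full   */

    uint32_t copy = flags;          /* read                             */
    timer_isr();                    /* interrupt arrives here           */
    assert(flags & TIMER_TICK);     /* memory now holds the tick flag   */
    copy &= ~BUFFER_FULL;           /* modify the stale copy            */
    flags = copy;                   /* write: the tick flag is wiped    */

    return flags;
}
```

Calling clear_buffer_full_with_race() returns 0: BUFFER_FULL was cleared as intended, but TIMER_TICK has vanished along with it.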

You can generally solve this problem by using an entire variable for each such flag. At best this will use up a whole byte for each flag and so wastes memory. That may not be a problem for you and if that is the case then that will work out just fine. Things are not so easy if you wanted to keep all these flags neatly together in a single location as a status word that might get sent to a host or recorded in a log.

This kind of thing happens all the time in the peripheral control registers. Take the USART for example. The CTS flag goes high and an interrupt handler, as part of its job, wants to reset the flag. Don’t ask me why, it just does. Meanwhile, you have just received a byte and the RXNE flag is set to indicate that there is data waiting. But the CTS handler is in the middle of a read-modify-write cycle. It has read the status register and is in the process of clearing the CTS flag. When it writes the result back, the RXNE flag will be cleared and the arrival of the character could go unnoticed.

OK, so I just invented all that. The point is, the peripheral registers may need care when setting and clearing bits, and the safest way to do that in both cases is through bit-banding. Why? Because the changes are atomic. That is, each one changes a single bit in one operation that cannot be interrupted. Thus, there will never be an occasion where a bit you have read can be modified by some other code before you get to write it back out.

Right, so, how is it done? 8051 programmers had a rich set of bit set/reset instructions that could do all this very neatly. Then again, the 8051 had a tiny address space and a suitably simple architecture.

On the STM32, some magic is worked internally so that each bit in a pre-defined memory range can be addressed as another location in a kind of virtual address space somewhere else. So, for example, the 32-bit value stored at address 0x20000000 also appears as 32 sequential word locations starting at 0x22000000. There are two regions of memory that have bit-band alias regions. First there is a 1 Mbyte SRAM region from 0x20000000 – 0x200FFFFF where each bit is aliased by a word address in the range 0x22000000 – 0x23FFFFFF. Then there is the 1 Mbyte peripheral space from 0x40000000 – 0x400FFFFF, which is aliased in the same way to the range 0x42000000 – 0x43FFFFFF.

Using this scheme, a read or write to memory location 0x22000000 is the same as a read or write to the least significant bit of SRAM location 0x20000000. I have no intention of going through the whole thing.
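
The mapping can be written down as plain arithmetic: each byte of the SRAM region takes up 32 bytes of alias space and each bit takes up one 4-byte word. The function below is a small sketch of that calculation for the SRAM region only. Its name and the range checks are my own additions for illustration; it only computes the address and never touches the memory, so it can be tried out on any machine.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BASE      0x20000000u  /* start of the bit-band SRAM region */
#define SRAM_BB_BASE   0x22000000u  /* start of its alias region         */
#define SRAM_BB_SIZE   0x00100000u  /* 1 Mbyte of bit-addressable SRAM   */

/* Return the alias address for bit number `bit` of the data at byte_addr.
 * For a word-aligned address, bit may be 0..31. */
uint32_t sram_bitband_alias(uint32_t byte_addr, uint32_t bit)
{
    assert(byte_addr >= SRAM_BASE && byte_addr - SRAM_BASE < SRAM_BB_SIZE);
    assert(bit < 32u);

    uint32_t byte_offset = byte_addr - SRAM_BASE;

    /* 32 alias bytes per SRAM byte, 4 alias bytes per bit. */
    return SRAM_BB_BASE + (byte_offset * 32u) + (bit * 4u);
}
```

For example, sram_bitband_alias(0x20000000, 0) gives 0x22000000, and bit 3 of the word at 0x20000004 lands at 0x2200008C.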

If you want to find out more about this and many other dark STM32 secrets, read the excellent book by Joseph Yiu, The Definitive Guide to the ARM Cortex-M3.

The Peripheral Library, among other sources, provides us C programmers with macros to do the address translation. They look like this for the SRAM memory space:

#define RAM_BASE 0x20000000
#define RAM_BB_BASE 0x22000000
#define Var_ResetBit_BB(VarAddr, BitNumber) \
(*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)) = 0)
#define Var_SetBit_BB(VarAddr, BitNumber) \
(*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)) = 1)
#define Var_GetBit_BB(VarAddr, BitNumber) \
(*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)))

These are all well and good but not too intuitive to use, even if you understand what they do. Rather than mess with these, I define a few additional versions that look like this:

#define varSetBit(var,bit) (Var_SetBit_BB((u32)&var,bit))
#define varResetBit(var,bit) (Var_ResetBit_BB((u32)&var,bit))
#define varGetBit(var,bit) (Var_GetBit_BB((u32)&var,bit))

Using these macros is quite simple. The following are all legitimate ways to use them:

uint32_t flags;
uint32_t status;
ready = varGetBit(flags,READY_BIT);

It is interesting to note that the varGetBit macro expands to an lvalue, so it can also be used on the left of an assignment like this:

varGetBit(flags,4) = y;
varGetBit(flags,ARRIVED) = varGetBit(status,READY);

These methods are not primarily about speed but convenience. The compiler cannot know where the variables will be stored when the code is generated, so you will see some of the calculations done at run time by your program. However, if you calculate the alias address once and keep it in a pointer, you will be able to get to the bit variables quite efficiently after that.

To access bits in the peripheral registers, you use the exact same technique but with different base addresses. There is, of course, no reason why you could not define a macro that refers to a specific bit in a peripheral register at a known address. Then the alias address is completely pre-calculated by the compiler and you get the most efficient code, since the peripheral register addresses are known at compile time.
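
As a sketch of that last idea, here is how a macro for one specific peripheral bit might look. The macro names are my own, and the USART1 status register address (0x40013800) and RXNE bit number (5) are assumptions taken from the STM32F10x parts; check them against the reference manual for your device. Everything here is a compile-time constant, which the static_assert demonstrates by checking the alias address before the program ever runs.

```c
#include <assert.h>
#include <stdint.h>

#define PERIPH_BASE_ADDR     0x40000000u  /* start of peripheral bit-band region */
#define PERIPH_BB_BASE_ADDR  0x42000000u  /* start of its alias region           */

/* Alias address of one bit, as an integer constant expression. */
#define PERIPH_BB_ADDR(reg_addr, bit) \
    (PERIPH_BB_BASE_ADDR + (((reg_addr) - PERIPH_BASE_ADDR) << 5) + ((bit) << 2))

/* The same thing as an lvalue that can be read or assigned on the target. */
#define PERIPH_BB(reg_addr, bit) \
    (*(volatile uint32_t *)PERIPH_BB_ADDR(reg_addr, bit))

/* Assumed values for an STM32F10x; not taken from the article. */
#define USART1_SR_ADDR   0x40013800u
#define USART1_RXNE_BIT  5u

/* One named bit, usable like a variable on the target: if (USART1_RXNE) ... */
#define USART1_RXNE      PERIPH_BB(USART1_SR_ADDR, USART1_RXNE_BIT)

/* The compiler does all the arithmetic; nothing is left for run time. */
static_assert(PERIPH_BB_ADDR(USART1_SR_ADDR, USART1_RXNE_BIT) == 0x42270014u,
              "unexpected alias address for USART1 RXNE");
```

Only the address calculation can be checked on a desktop machine. Actually reading or writing USART1_RXNE makes sense only on the microcontroller itself, where the alias region exists.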

14 thoughts on “Bit Banding in the STM32”

  1. Harjit Singh

    Pete, the fact that you can use the varGetBit on the left side is really cool. Because of this, you can use the following:

    volatile unsigned int flags; // doesn't have to be volatile...
    #define FLAGS_B0 varGetBit(flags,0)
    #define FLAGS_B1 varGetBit(flags,1)

    void someFunction (void)
    {
        FLAGS_B0 = 0;
        FLAGS_B1 = 1;
        if (1 == FLAGS_B0)
            FLAGS_B0 = 0;
    }

    I would actually rename varGetBit to something more meaningful like AccessBit or some such name.

  2. peteh

    It was a bit of a lucky discovery that the syntax allows use as an lvalue.

    That extra level of indirection with the macro makes for some nice, readable code which is still reasonably efficient.

    The name varGetBit was just a contraction of the original macro but you are right – looking at it in a listing gives little enough clue what it is for. I have not looked to compare the generated code for all three macros but the varGetBit() macro is good for all uses. Your name suggestion would do nicely.


  3. Harjit Singh

    The code generated is much better than not using bit banding. The only disappointing thing is that it doesn’t seem to want to use register immediate offset mode where it has the base address in a register and then uses immediate offsets to get to the desired address. It is quite possible that the mode is actually slower or more code – I don’t know.

    Here is the code generated by GCC for the code fragment

    --- BitBandTest.c -- 40 ------------------------------------
    int main(void)
    unsigned int temp = 0;
    FLAGS_B0 = 0;
    4B2A ldr r3, [pc, #0xA8]
    2000 movs r0, #0
    015A lsls r2, r3, #5
    F0425108 orr r1, r2, #0x22000000
    F0410204 orr r2, r1, #4
    F04F0C01 mov.w r12, #1
    6008 str r0, [r1, #0]
    --- BitBandTest.c -- 45 ------------------------------------
    FLAGS_B1 = 1;
    F8C2C000 str.w r12, [r2, #0]
    --- BitBandTest.c -- 47 ------------------------------------
    if (0 != FLAGS_B0)
    680A ldr r2, [r1, #0]
    B102 cbz r2 0x080002FA
    --- BitBandTest.c -- 49 ------------------------------------
    FLAGS_B0 = 0;
    6008 str r0, [r1, #0]
    --- BitBandTest.c -- 50 ------------------------------------
    015B lsls r3, r3, #5
    F0435108 orr r1, r3, #0x22000000
    2201 movs r2, #1
    --- BitBandTest.c -- 52 ------------------------------------
    F0435C08 orr r12, r3, #0x22000000
    --- BitBandTest.c -- 53 ------------------------------------
    F0410104 orr r1, r1, #4
    600A str r2, [r1, #0]
    --- BitBandTest.c -- 52 ------------------------------------
    2000 movs r0, #0
    F04C010C orr r1, r12, #12
    --- BitBandTest.c -- 54 ------------------------------------
    F8CC2000 str.w r2, [r12, #0]
    --- BitBandTest.c -- 53 ------------------------------------
    6008 str r0, [r1, #0]
    --- BitBandTest.c -- 54 ------------------------------------
    status = varGetBit(flags,0);
    F8DC1000 ldr.w r1, [r12, #0]
    F8DFC06C ldr.w r12, [pc, #+0x6C]
    F8CC1000 str.w r1, [r12, #0]
    --- BitBandTest.c -- 55 ------------------------------------
    varGetBit(flags,4) = 1;
    F0435108 orr r1, r3, #0x22000000
    F0410110 orr r1, r1, #16
    --- BitBandTest.c -- 57 ------------------------------------
    varGetBit(flags,2) = varGetBit(status,0);
    EA4F1C4C mov.w r12, r12, lsl #5
    --- BitBandTest.c -- 58 ------------------------------------
    varGetBit(flags,4) = 1;
    600A str r2, [r1, #0]

    Interestingly, ARM has essentially similar macros on their site, and the generated code they show is what I would have liked, but alas GCC can’t get to what the ARM tools can do. Here is the link to the ARM description and code:

    Compared to the ARM representation, I prefer your style of macros because they are generic i.e. can be used on any variable and bit.

  4. peteh

    I saw their stuff. Pretty well everyone who writes anything about bit-banding uses the same code, more or less. Partly because they go back to ARM to see how it should be done and partly because there are only so many ways to skin this cat.

    The reason they show better generated object code is that they are cheating by defining the absolute address of the variable in RAM as a macro. The compiler thus knows exactly what the pointer value should be.

    I expect that, if a generic variable were used, the code would look just like the GCC result. The reason being that the address of the variable is not known at compile time, only after the linker has done its stuff. Thus, the compiler is forced to calculate the pointer value.

    As far as I remember, using analogous macros for peripherals generates code as good as the ARM example because those addresses are fixed.

  5. Harjit Singh

    Would it help if a section of SRAM was set aside for bit band variables?

  6. peteh

    Not that I want to go dabbling in the linker scripts, but it should be possible to reserve, say, the first few bytes of RAM. These will always be at a fixed address.

  7. Harjit Singh

    I was going to start using this yesterday but discovered that the STM32 definitions call out a bit mask instead of a bit number, so to use bit banding, I’ll have to create new #defines.

    I’m wondering if there is a macro that will convert masks to bit numbers, although I haven’t found one.

  8. peteh

    I do not have any real understanding of the pre-processor and cannot see how it would be possible to do the required calculation. All the methods I can think of to turn bitmask into bit position require work by the processor at run-time. Some of the best ways are here:

    Look at the counting trailing zeros section. There does not seem to be a way to do this at compile time.

    If there are not too many occasions when you need to use bit banding, it might be better to go with special-purpose inline functions.

    The whole process is only really needed for atomic access to the bits. Do you really need that?


  9. Petteri Aimonen

    I’m using this for bitbanding. It works with the default header file and compiles to the minimum four instructions (load zero/one, load memory address (high & low), store to memory) on GCC 4.5.1:

    #define BITBAND_ACCESS(variable, bitnumber) \
    *(uint32_t*)(((uint32_t)&variable & 0xF0000000) \
    + 0x2000000 \
    + (((uint32_t)&variable & 0x000FFFFF) << 5) + (bitnumber << 2))
    #define set_bit(variable, bitmask) BITBAND_ACCESS(variable, __builtin_ctz(bitmask)) = 1
    #define clear_bit(variable, bitmask) BITBAND_ACCESS(variable, __builtin_ctz(bitmask)) = 0

    /* Usage example */
    set_bit(USART1->CR1, USART_CR1_TXEIE);

  10. peteh

    Thanks for the contribution. Your usage example will compile to small object code because the USART registers have predefined addresses.

    What will it compile to if you use a normal variable?

  11. Petteri Aimonen

    Yeah I guess it won’t work well with generic variables. I have no idea how that could be implemented without putting the variable at a fixed address.

  12. Abr

    While this is a very useful feature, be aware that a bit-banded write performs a read-modify-write access under the hood, so it is atomic with respect to the processor only. There are side effects if you use it on registers that have other bits which are, for instance, rc_w0 or rc_w1.

  13. Pingback: Improved Bit Banding
