An Improved Bit Banding Approach

The commonly published approach to using the bit banding feature of the Cortex Mx family of processors is to use macros – see Bit Banding in the STM32. This post describes an alternate implementation that uses a dedicated RAM section for bit banding.

While working on the diagonal solver, it became apparent that to make it run quickly on the Cortex M3, and to save RAM, I needed to use bit banding for boolean variables.

Bit banding works by providing two address ranges that access the same memory. Through the first range you access the memory “32 bits at a time”, as usual. Through the second range, the bit band “alias”, you access the same memory one bit at a time, with a 32 bit stride: the addresses of consecutive bits are 32 bits (4 bytes) apart, so each bit appears as its own 32 bit word.

This picture from the Cortex Mx manual graphically shows what is going on.
[Figure: BitBandMapping]
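The mapping can also be written as a small function. This is only a sketch: the helper name bb_alias_addr is hypothetical (it is not part of any vendor library), and the two base addresses are the Cortex M3 SRAM bit band region and alias used throughout this post.

```c
#include <assert.h>
#include <stdint.h>

/* SRAM bit band region and its alias, as used in this post. */
#define BB_SRAM_BASE   0x20000000u
#define BB_ALIAS_BASE  0x22000000u

/* Hypothetical helper: returns the alias word address for bit `bit`
 * (0..31, counted from the least significant bit at `addr`). */
static uint32_t bb_alias_addr(uint32_t addr, unsigned bit)
{
    /* The SRAM bit band region is 1 MB long. */
    assert(addr >= BB_SRAM_BASE && addr < BB_SRAM_BASE + 0x100000u);
    assert(bit < 32u);

    uint32_t byte_offset = addr - BB_SRAM_BASE;

    /* Each byte occupies 32 bytes of alias space (8 bits x 4 bytes),
     * and each bit occupies one 4 byte alias word. */
    return BB_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

For example, bit 0 of the byte at 0x20000000 lands at 0x22000000, bit 1 of the same byte at 0x22000004, and bit 0 of the next byte at 0x22000020.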

Typical C compilers and linkers only know how to access memory from the “32 bits at a time” address range. So, what we have to do is tell the C compiler about the second way to access the memory i.e. through the bit band alias.

A C compiler calls various portions of a program – code, uninitialized data, initialized data – sections. The linker is invoked with a specification file which it uses to place these sections at the specified locations. To get more details on this, do a search on GCC, linker and section.

I use “CrossWorks for ARM”, which does the section work using two files. The first file is processor specific and is called “<processor_name>_MemoryMap.xml”; it contains a list of memory segments and register descriptions for that processor. The second file is called a placement file. The placement file is generic to all parts and maps the C compiler section names to the memory segments called out in the processor specific files. For a compiler vendor, this is a nice way to split things because it simplifies support for a large number of devices.

So, to create the bit band alias, in the MemoryMap file, we reduce the RAM 32 bit access range by the amount we are going to use for the bit band alias.

The original RAM statement was:

<MemorySegment size="0x10000" name="RAM" start="0x20000000"/>

After adding the bit band alias, it looks like this:

<MemorySegment size="0x00100" name="BBRAM" start="0x20000000" access="Read/Write"/> 
<MemorySegment size="0x02000" name="BBALIAS" start="0x22000000" access="Read/Write"/>
<MemorySegment size="0x0ff00" name="RAM" start="0x20000100"/>

This reserves a 256 byte segment (BBRAM) at the start of physical RAM to hold the bit band variables. Because each bit in the BBRAM segment maps to a 32 bit word in the BBALIAS segment, each BBRAM byte takes up 32 bytes of alias space, so the 256 byte segment corresponds to a 256 * 32 = 8,192 byte (0x2000) BBALIAS segment.
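If you want a different number of flags, the two segment sizes have to be changed together. The arithmetic is sketched below; the function names are made up for illustration, and only the 0x100 and 0x2000 figures come from the memory map above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: size in bytes of the BBALIAS segment needed to
 * cover a BBRAM segment of `bbram_bytes` bytes. */
static uint32_t bb_alias_segment_size(uint32_t bbram_bytes)
{
    /* The whole SRAM bit band region is only 1 MB. */
    assert(bbram_bytes <= 0x100000u);

    /* 8 bits per byte, one 4 byte alias word per bit. */
    return bbram_bytes * 8u * 4u;
}

/* Hypothetical helper: number of one-bit flags that fit in BBRAM. */
static uint32_t bb_flag_capacity(uint32_t bbram_bytes)
{
    return bbram_bytes * 8u;
}
```

With the 0x100 byte BBRAM segment used here this gives a 0x2000 byte BBALIAS segment and room for 2,048 flags.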

NOTE: One thing to keep in mind with putting the bit band alias at the start of RAM is that if you relocate the vector table to RAM, the vector table address has very specific alignment requirements that must be met – don’t ask how I know.
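A quick way to check a candidate vector table address is sketched below. It assumes the usual Cortex M3 rule that the table must be aligned to its own size in words (16 system entries plus one per interrupt) rounded up to a power of two, with a minimum of 32 words. The function names and the interrupt counts in the usage note are illustrative assumptions; check the reference manual for your part.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: required vector table alignment in bytes for a
 * device with `num_irqs` external interrupts. */
static uint32_t vtor_required_alignment(unsigned num_irqs)
{
    /* Cortex M3 supports at most 240 external interrupts. */
    assert(num_irqs <= 240u);

    uint32_t table_words = 16u + num_irqs;  /* system entries + IRQs */
    uint32_t align_words = 32u;             /* minimum alignment */

    while (align_words < table_words) {
        align_words *= 2u;                  /* round up to a power of two */
    }
    return align_words * 4u;                /* words to bytes */
}

/* Hypothetical helper: nonzero if `addr` is a valid table address. */
static int vtor_address_ok(uint32_t addr, unsigned num_irqs)
{
    return (addr % vtor_required_alignment(num_irqs)) == 0u;
}
```

With 21 interrupts the table needs 256 byte alignment, so 0x20000100 (the new start of the RAM segment above) would be acceptable. With, say, 60 interrupts the requirement grows to 512 bytes and 0x20000100 no longer qualifies.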

Next, we add the following statements to the placement file:

<MemorySegment name="BBRAM">
  <ProgramSection name="bbram"/>
</MemorySegment>
<MemorySegment name="BBALIAS">
  <ProgramSection name="bbalias"/>
</MemorySegment>

To place a variable in the bit band alias section, declare it as follows:

volatile bool NewBBFlagBit1 __attribute__ ((section ("bbalias")));

The section attribute tells the C compiler to place the variable NewBBFlagBit1 in the section called bbalias. The linker then resolves the bbalias section to the BBALIAS address range.

NOTE: This scheme does have one limitation. GCC requires that any variable placed in a section with the section attribute have static storage duration, so it cannot be an ordinary local (automatic) variable. If you need to use such a variable within a routine, declare it as a static variable. The example below shows how to do this.

NOTE: Since this is a bit band variable, you can access it either through the BBALIAS address range or through the BBRAM address range.
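To see which BBRAM bit a given bbalias variable corresponds to, the mapping can be run in reverse. This is a sketch only; bb_location and bb_alias_to_bit are hypothetical names, and the base addresses are the ones from the memory map above.

```c
#include <assert.h>
#include <stdint.h>

#define BB_SRAM_BASE   0x20000000u
#define BB_ALIAS_BASE  0x22000000u

/* Hypothetical type: a bit in the BBRAM range, as byte address + bit. */
typedef struct {
    uint32_t byte_addr;
    unsigned bit;       /* 0..7 within that byte */
} bb_location;

/* Hypothetical helper: converts an alias word address back to the
 * byte and bit it controls. */
static bb_location bb_alias_to_bit(uint32_t alias_addr)
{
    /* Alias entries are 32 bit words, so they are 4 byte aligned. */
    assert(alias_addr >= BB_ALIAS_BASE && (alias_addr % 4u) == 0u);

    uint32_t bit_index = (alias_addr - BB_ALIAS_BASE) / 4u;

    bb_location loc;
    loc.byte_addr = BB_SRAM_BASE + bit_index / 8u;
    loc.bit = bit_index % 8u;
    return loc;
}
```

In the section approach listing further down, the two flags end up at 0x22000000 and 0x22000004, which are bits 0 and 1 of the byte at 0x20000000. That is the first byte of NewBBFlags, which is why writing 0x01 to NewBBFlags also changes what NewBBFlagBit1 reads back.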

Now, let’s look at a simple example where we have a global bit flag, a local bit flag and look at the resultant code.

Here is the macro approach sample code:

typedef unsigned int u32;
typedef volatile unsigned int vu32;

#define RAM_BASE       0x20000000
#define RAM_BB_BASE    0x22000000
#define Var_ResetBit_BB(VarAddr, BitNumber)  (*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)) = 0)
#define Var_SetBit_BB(VarAddr, BitNumber)    (*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)) = 1)
#define Var_GetBit_BB(VarAddr, BitNumber)    (*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)))

#define varSetBit(var,bit) (Var_SetBit_BB((u32)&var,bit)) 
#define varGetBit(var,bit) (Var_GetBit_BB((u32)&var,bit))
#define varResetBit(var,bit) (Var_ResetBit_BB((u32)&var,bit)) 

vu32 OldBBFlags;

void vOldBitBand(void)
{
    static vu32 OldBBFlagsLocal;

    varResetBit(OldBBFlags,0);

    OldBBFlags = 0x01;

    if (varGetBit(OldBBFlags,0))
      varSetBit(OldBBFlagsLocal,0);
    else
      varResetBit(OldBBFlagsLocal,0);

    OldBBFlags = 0x02;

    return;
}

Here is the section approach sample code:

typedef enum {FALSE = 0, TRUE = !FALSE} bool;
typedef volatile unsigned int vu32;

volatile bool NewBBFlagBit1 __attribute__ ((section ("bbalias")));

vu32 NewBBFlags __attribute__ ((section ("bbram")));

void vNewBitBand(void)
{
    static volatile bool NewBBFlagBit2 __attribute__ ((section ("bbalias")));

    NewBBFlagBit1 = 0;

    NewBBFlags = 0x01;

    NewBBFlagBit2 = NewBBFlagBit1;

    NewBBFlags = 0x02;

    return;
}

When the macro code is compiled using GCC at optimization level 3, we get:

void vOldBitBand(void)
{
static vu32 OldBBFlagsLocal;
    4B0D        ldr r3, 0x08000330 <vOldBitBand+0x38>
    2000        movs r0, #0
    0159        lsls r1, r3, #5
    F0415208    orr r2, r1, #0x22000000
    2101        movs r1, #1

varResetBit(OldBBFlags,0);
    6010        str r0, [r2]

OldBBFlags = 0x01;
    6019        str r1, [r3]

if (varGetBit(OldBBFlags,0))
    6812        ldr r2, [r2]
    B93A        cbnz r2, 0x0800031C <vOldBitBand+0x24>

varSetBit(OldBBFlagsLocal,0);
    4909        ldr r1, 0x08000334 <vOldBitBand+0x3C>
    0148        lsls r0, r1, #5
    F0405108    orr r1, r0, #0x22000000
    600A        str r2, [r1]

OldBBFlags = 0x02;
    2202        movs r2, #2
    601A        str r2, [r3]

return;
}
    4770        bx lr

0x0800031C:
else
varResetBit(OldBBFlagsLocal,0);
    4805        ldr r0, 0x08000334 <vOldBitBand+0x3C>
    0142        lsls r2, r0, #5
    F0425C08    orr r12, r2, #0x22000000

    2202        movs r2, #2
    F8CC1000    str.w r1, [r12, #0]

OldBBFlags = 0x02;
    601A        str r2, [r3]
return;
}
    4770        bx lr
    BF00        nop

0x08000330:
    0104        lsls r4, r0, #4
    2000        movs r0, #0

0x08000334:
    0100        lsls r0, r0, #4
    0000        movs r0, r0

When the section approach code is compiled using the same settings as the macro code above, we get:

--- BitBandNew.c -- 9 --------------------------------------
void vNewBitBand(void)
{
static volatile bool NewBBFlagBit2 __attribute__ ((section ("bbalias")));
    F2400000    movw r0, #0
    F2400200    movw r2, #0

NewBBFlagBit1 = 0;
    F2C22000    movt r0, #0x2200

NewBBFlags = 0x01;
    F2C20200    movt r2, #0x2000

    2100        movs r1, #0
    2301        movs r3, #1

NewBBFlagBit1 = 0;
    6001        str r1, [r0]

NewBBFlags = 0x01;
    6013        str r3, [r2]

NewBBFlagBit2 = NewBBFlagBit1;
    6801        ldr r1, [r0]
    2302        movs r3, #2

NewBBFlagBit2 = NewBBFlagBit1;
    6041        str r1, [r0, #4]

NewBBFlags = 0x02;
    6013        str r3, [r2]

return;
}
    4770        bx lr
    BF00        nop

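The section approach code is 34 bytes versus 64 bytes for the macro based code. It should also run faster, because the alias addresses are fixed at link time and loaded as constants rather than computed with shift and OR instructions at run time.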

 
