Adding DMA to the SPI driver with the STM32F4

By | March 11, 2012

Sending data over SPI with the STM32 using polling is simple and reliable but your processor is blocked, unable to do anything else until the transfer is complete. Direct Memory Access (DMA) allows you to initiate a transfer of a block of data and then carry on doing something else while that completes. At the end of the transfer, an interrupt can be generated to allow your code to tidy up.

On my micromouse, I will have an HCMS3907 smart alphanumeric display. This particular device has four characters, each made up of a 7×5 array of LEDs. Data is shifted into the display over SPI. When the Chip Select line is released at the end of a transfer, whatever is in the shift register gets displayed on the LEDs. The top 7 bits of each byte in the 20 byte buffer light up individual LEDs on the display. In this way, any pattern can be displayed. Generally, a font lookup table is used to create patterns for acceptable-looking alphanumerics. Non-alphanumeric characters can be used for whatever you like, so I have various shapes, blocks, lines and symbols like batteries in various states of charge. Displays with more than four characters are available and several displays can be cascaded, all controlled by the same signals. All in all, these are very versatile devices. Avago have a page full of variants with various sizes and voltage requirements [AVAGO Smart Alphanumeric Displays].
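As a rough sketch of the font lookup idea, the following fills a 20 byte buffer from up to four characters, five column bytes per character. The glyph table, the names render_text and find_glyph, and the column patterns are all made up for illustration and are not the font used on the mouse. Which 7 bits of each byte the display actually uses should be checked against the datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DISPLAY_CHARS  4   /* characters on the display */
#define GLYPH_COLUMNS  5   /* column bytes per character */
#define DISPLAY_BYTES  (DISPLAY_CHARS * GLYPH_COLUMNS)   /* 20 */

/* Hypothetical glyph entry: one byte per column, 7 bits used in each. */
typedef struct {
  char    code;
  uint8_t columns[GLYPH_COLUMNS];
} glyph_t;

/* Illustrative patterns only, not a real font. */
static const glyph_t font[] = {
  { ' ', { 0x00, 0x00, 0x00, 0x00, 0x00 } },
  { '-', { 0x08, 0x08, 0x08, 0x08, 0x08 } },
  { '|', { 0x00, 0x00, 0x7F, 0x00, 0x00 } },
  { '#', { 0x7F, 0x7F, 0x7F, 0x7F, 0x7F } },
};

/* Look up the column data for a character; unknown ones show as blank. */
static const uint8_t *find_glyph(char c) {
  for (size_t i = 0; i < sizeof font / sizeof font[0]; i++) {
    if (font[i].code == c) {
      return font[i].columns;
    }
  }
  return font[0].columns;
}

/* Fill the display buffer from text, padding short strings with blanks. */
void render_text(const char *text, uint8_t buffer[DISPLAY_BYTES]) {
  assert(text != NULL && buffer != NULL);
  size_t len = strlen(text);
  for (size_t ch = 0; ch < DISPLAY_CHARS; ch++) {
    const uint8_t *cols = find_glyph(ch < len ? text[ch] : ' ');
    memcpy(&buffer[ch * GLYPH_COLUMNS], cols, GLYPH_COLUMNS);
  }
}
```

Because every character always occupies five bytes, the buffer is the same length whatever text is shown, which keeps the transfer size fixed.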

HCMS3907 uses SPI with DMA on an STM32F4 processor

Even with only four characters though, 20 bytes of data have to be transferred to fill the display. With a SPI clock of 2MHz, that would take about 80us without any overheads. If the HCMS3907 were the only SPI device on the board, I might be happy enough to just write to it whenever data needed sending, but I also have the MAX6966 LED driver sharing the same SPI port. If either of these is looked after in an interrupt, they both must be, or I would have to write some kind of arbitration code to make sure that they could never interfere with each other.
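The 80us figure is just bits divided by clock rate. A small helper makes the arithmetic explicit. The function name is hypothetical and the result ignores all software and inter-byte overheads.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds needed to clock out `bytes` bytes at `clock_hz`,
 * ignoring every overhead. */
uint32_t spi_transfer_time_us(uint32_t bytes, uint32_t clock_hz) {
  assert(clock_hz > 0);
  uint64_t bits = (uint64_t)bytes * 8u;
  return (uint32_t)((bits * 1000000u) / clock_hz);
}
```

For 20 bytes at 2MHz that is 160 bits at half a microsecond each, so spi_transfer_time_us(20, 2000000) gives 80.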

The easiest way to look after both of these is to have a couple of service routines that are called from my systick handler. The first routine updates a single LED on the MAX6966 chip which takes only about 8us. The second routine looks to see if anything needs sending to the HCMS3907 and, if it does, starts a DMA request. With systick firing off every millisecond, I don’t want occasional extra 100us tasks at what might turn out to be a critical time. With DMA, the transfer gets started and the systick handler goes on about its business. At the end of the transfer, the DMA complete interrupt handler waits to make sure all the data has been sent and then releases the Slave Select lines.

All the SPI setup code from the previous post (here) is still used but the DMA also needs to be set up. The first task is to tell the DMA controller what we want it to do:

  DMA_InitTypeDef DMA_InitStructure;
  // Set up the DMA
  // first enable the clock
  RCC_AHB1PeriphClockCmd(SPI_PORT_DMAx_CLK, ENABLE);
  // start with a blank DMA configuration just to be sure
  DMA_DeInit(SPI_PORT_TX_DMA_STREAM);
  // Configure DMA controller to manage TX DMA requests
  // first make sure we are using the default values
  DMA_StructInit(&DMA_InitStructure);
  // these are the only parameters that change from the defaults
  DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t) & (SPI_PORT->DR);
  DMA_InitStructure.DMA_Channel = SPI_PORT_TX_DMA_CHANNEL;
  DMA_InitStructure.DMA_DIR = DMA_DIR_MemoryToPeripheral;
  DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;
  /*
   * It is not possible to call DMA_Init without values for the source
   * address and non-zero size even though a transfer is not done here.
   * These are checked only when the assert macros are active though.
   */
  DMA_InitStructure.DMA_Memory0BaseAddr = 0;
  DMA_InitStructure.DMA_BufferSize = 1;
  DMA_Init(SPI_PORT_TX_DMA_STREAM, &DMA_InitStructure);
  // Enable the DMA transfer complete interrupt
  DMA_ITConfig(SPI_PORT_TX_DMA_STREAM, DMA_IT_TC, ENABLE);

The display is a write-only device so only the transmit service will be used and we want to send data from memory to the peripheral. The source memory address is auto incremented to allow a whole buffer to be sent. Unless a release configuration is used, the Peripheral Library uses assert macros to make sure that the initialisation has reasonably sensible looking values. Annoyingly, that includes the size of the buffer. Even though no data is to be transferred at this point, a non-zero buffer size must be given before DMA_Init() will run without an error. No matter, the actual transfers always set the source address and buffer size before the DMA is turned loose. SPI_PORT is defined in the hardware list for my board. In this case, it happens to be SPI2. Each peripheral has a different stream so that too is defined in the header file. There is a table in the reference manual (Table 20 in section 8.3.3) which shows how these are allocated. Here is a fragment from the header file showing the configuration for my board:
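To see why a zero buffer size trips the check, here is a host-side sketch modelled on the way the Peripheral Library's parameter assertions behave. CHECK_PARAM, BUFFER_SIZE_OK and FULL_ASSERT are stand-in names invented for this example, and the range 1 to 65535 is an assumption taken from the 16 bit transfer counter rather than a copy of the library header.

```c
#include <stdint.h>

/* Set to 0 to mimic a release build where the checks compile away. */
#define FULL_ASSERT 1

static uint32_t failed_checks = 0;

#if FULL_ASSERT
/* Count a failure instead of halting, so the effect is easy to observe. */
#define CHECK_PARAM(expr) ((expr) ? (void)0 : (void)failed_checks++)
#else
#define CHECK_PARAM(expr) ((void)0)
#endif

/* Assumed valid range for a transfer count: 1..65535 items. */
#define BUFFER_SIZE_OK(size) (((size) >= 1u) && ((size) < 0x10000u))

/* Returns how many checks failed for the given buffer size (0 or 1). */
uint32_t count_failures_for_size(uint32_t size) {
  failed_checks = 0;
  CHECK_PARAM(BUFFER_SIZE_OK(size));
  return failed_checks;
}
```

With the checks active, a size of 0 fails and a size of 1 passes, which is why a dummy size of 1 is enough to keep the initialisation quiet.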

/* Definition for DMAx resources */
#define SPI_PORT_DR_ADDRESS                SPI_PORT->DR

#define SPI_PORT_DMA                       DMA1
#define SPI_PORT_DMAx_CLK                  RCC_AHB1Periph_DMA1

#define SPI_PORT_TX_DMA_CHANNEL            DMA_Channel_0
#define SPI_PORT_TX_DMA_STREAM             DMA1_Stream4

#define SPI_PORT_DMA_TX_IRQn               DMA1_Stream4_IRQn
#define SPI_PORT_DMA_TX_IRQHandler         DMA1_Stream4_IRQHandler

While this is nearly enough to get the DMA running, my SPI devices are using software configured Slave Select lines and these need to be released at the end of a transfer. Each DMA stream can be configured to generate an interrupt at the end of the transfer. The last line in the code above enables that interrupt so now we need to tell the NVIC about it and permit transfers to take place:

  NVIC_InitTypeDef NVIC_InitStructure;
  // enable the interrupt in the NVIC
  NVIC_InitStructure.NVIC_IRQChannel = SPI_PORT_DMA_TX_IRQn;
  NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0;
  NVIC_InitStructure.NVIC_IRQChannelSubPriority = 1;
  NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
  NVIC_Init(&NVIC_InitStructure);
  // Enable dma tx request.
  SPI_I2S_DMACmd(SPI_PORT, SPI_I2S_DMAReq_Tx, ENABLE);

With interrupts firing off, an interrupt handler is required. As previously, the actual interrupt vector is configured as a macro in the hardware configuration file so the handler code looks like this:

void SPI_PORT_DMA_TX_IRQHandler() {
  // Test if DMA Stream Transfer Complete interrupt
  if (DMA_GetITStatus(SPI_PORT_TX_DMA_STREAM, DMA_IT_TCIF4)) {
    DMA_ClearITPendingBit(SPI_PORT_TX_DMA_STREAM, DMA_IT_TCIF4);
    /*
     * There is an unpleasant wait until we are certain the data has been sent.
     * The need for this has been verified by oscilloscope. The shift register
     * at this point may still be clocking out data and it is not safe to
     * release the chip select line until it has finished. It only costs half
     * a microsecond so better safe than sorry. Is it...
     *  a) flushed from the transmit buffer
     */
    while (SPI_I2S_GetFlagStatus(SPI_PORT, SPI_I2S_FLAG_TXE) == RESET) {
    };
    /* b) flushed out of the shift register */
    while (SPI_I2S_GetFlagStatus(SPI_PORT, SPI_I2S_FLAG_BSY) == SET) {
    };
    /*
     * the DMA stream is disabled in hardware at the end of the transfer
     * deselect both the displays here rather than try and work out
     * which one last got done
     */
    HCMS3907_DESELECT();
    // the MAX6966 deselect belongs here too (its macro is not shown)
  }
}

There are a couple of points to note in the comments. First, I would be happier if I didn’t have to know which peripherals need their select lines releasing at the end of the interrupt. The need to know stops this from being a more general purpose driver. However, the pain of writing a ‘proper’ DMA controller for this purpose seemed not to be worth it in this case. Second, the need to wait for the data to leave the transmit shift register is a bit annoying. What is being tested here is the DMA completion, not the SPI completion. Again, it might be possible to do this better but it works well enough for now.
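The two-stage wait can be expressed as a single test on the SPI status register. The sketch below runs on a PC and takes the register value as a plain argument. The function name is hypothetical. The bit positions are those of TXE and BSY in the STM32 SPI_SR register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of two flags in the STM32 SPI status register (SPI_SR). */
#define SR_TXE (1u << 1)   /* transmit buffer empty */
#define SR_BSY (1u << 7)   /* shift register still clocking data out */

/* True only when nothing is queued and nothing is still being shifted. */
bool spi_transfer_finished(uint16_t sr) {
  return (sr & SR_TXE) != 0 && (sr & SR_BSY) == 0;
}
```

The case that matters is TXE set with BSY still set: the transmit buffer has emptied into the shift register but the last byte is still on the wire, so the select line must stay asserted.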

Sending data using the DMA is now very easy. All we need do is tell the DMA controller the source address and the buffer length and start it off. All the other information it needs is already configured and remains the same throughout. That would, of course, not be the case if more than one peripheral needed the same stream. Here is the code, called by the systick handler every millisecond:

/*
 * The number of bytes in the DMA transfer buffer is tested and, if
 * non-zero, a DMA transfer of that number of bytes is initiated.
 * On completion, an interrupt is triggered which will wait until the
 * last byte goes from the SPI hardware and then deselect the display.
 * Once sent, the display latches all the data and need not be updated again
 * unless there is a change.
 */
void HCMS3907_update(void) {
  if (HCMS3907_refresh_count) {
    HCMS3907_putBufferDMA(HCMS3907_SrcAddress, HCMS3907_refresh_count);
    HCMS3907_refresh_count = 0;
  }
}

Elsewhere in the code, to print to the display, all that is required is to fill a buffer with however many bytes of data need to be sent and then set the buffer length appropriately. The next time the update function is called, it will send out the buffer if the count is non-zero. Otherwise, it does nothing. Here is the code for that:
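The fill-then-flag pattern is easy to try out on a PC with the DMA start replaced by a mock. Everything in this sketch (display_write, display_update, mock_start_transfer and the counters) is a hypothetical stand-in for the driver functions, not the driver itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUFFER_LENGTH 20

static uint8_t tx_buffer[BUFFER_LENGTH];
static volatile uint8_t refresh_count = 0;  /* non-zero means "send me" */
static uint32_t transfers_started = 0;      /* what the mock has seen */
static uint8_t last_length = 0;

/* Mock of the DMA start: it only records that it was called. */
static void mock_start_transfer(const uint8_t *buf, uint8_t len) {
  (void)buf;
  transfers_started++;
  last_length = len;
}

/* Producer: copy the new data in, then publish the length last. */
void display_write(const uint8_t *data, uint8_t len) {
  assert(len <= BUFFER_LENGTH);
  memcpy(tx_buffer, data, len);
  refresh_count = len;
}

/* Consumer: called from the periodic tick. */
void display_update(void) {
  if (refresh_count) {
    mock_start_transfer(tx_buffer, refresh_count);
    refresh_count = 0;
  }
}

uint32_t get_transfers_started(void) { return transfers_started; }
uint8_t get_last_length(void) { return last_length; }
```

Calling display_update() with nothing pending does nothing, and one write produces exactly one transfer. The sketch does not guard against the buffer being rewritten while a real transfer is still in flight, which a real driver has to consider.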

/*
 * This transfer uses the DMA peripheral. Again, only data is assumed to be sent
 * Once the DMA peripheral registers are set up, the stream is enabled and everything
 * will get sent automatically.
 * At the end of the transfer, an interrupt is generated where the transfer can be
 * tidied up and the display deselected.
 * Note that the source address is a 32 bit value but the transfer count
 * register holds only 16 bits, and len is a uint8_t here, so a single call
 * sends at most 255 bytes
 */
void HCMS3907_putBufferDMA (uint8_t *buf, uint8_t len)
{
  assert_param (len <= HCMS3907_BUFFER_LENGTH);
  HCMS3907_DATA();
  HCMS3907_SELECT();
  SPI_PORT_TX_DMA_STREAM->NDTR = (uint32_t) len;
  SPI_PORT_TX_DMA_STREAM->M0AR = (uint32_t) buf;
  DMA_Cmd (SPI_PORT_TX_DMA_STREAM, ENABLE);
}


The interrupt service routine at the end of the transfer takes no more than about 5us, however long the buffer is, and constitutes almost no load. If it didn’t have to wait for the SPI to finish transmitting, it would be barely noticeable. At worst, the DMA should be able to transmit about 200 bytes at 2MHz on every systick. With a slightly faster clock, this would also be a great way to store logging data to serial FRAM or EEPROM, provided the target could keep up.
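The per-tick budget follows from the clock rate and the tick period. This helper, with a hypothetical name, gives the theoretical ceiling with no allowance for overheads.

```c
#include <stdint.h>

/* Upper bound on whole bytes that fit in one tick at a given SPI clock. */
uint32_t max_bytes_per_tick(uint32_t clock_hz, uint32_t tick_us) {
  uint64_t bits = ((uint64_t)clock_hz * tick_us) / 1000000u;
  return (uint32_t)(bits / 8u);
}
```

At 2MHz with a 1000us tick the ceiling is 250 bytes, so a working figure of about 200 leaves some margin for gaps between bytes and the end-of-transfer interrupt.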

The complete driver for the HCMS3907 is in the source code linked below.
(UPDATED October 2012 to correct an error)


If you want to connect a display like this yourself, here is an extract from the schematics I used:

HCMS3907 Display connections

Although SPI2-MISO is connected to the RS line, no data is transferred over that line. It is used only to assert the Register Select on the display. The code sample should illustrate how the connections work.

22 thoughts on “Adding DMA to the SPI driver with the STM32F4”

  1. sadok

    can you help me to program a project that generate frame CAN’s with STM32F4

  2. Peter Harrison Post author

    I am afraid I know nothing about CAN

  3. mog123

    hi Peter,

    I am currently having trouble with working out how to connect the HCMS-3907 to the stm32f405rgt6 myself. I was wondering how’s the current consumption on the display, and a general way to connect it (could you show how You connected it?) because the datasheet is a little bit too general in all of this.

    Good luck in Taiwan!

  4. Peter Harrison Post author

    Sorry for the delay. I have not been well for a couple of weeks. For the connections, I added a drawing to the post. See above.

  5. mog123

    Hey Peter, thanks for the answer. I finally managed to do the connections on my own, although I just got my PCB’s and I’m a bit worried because I didn’t connect the reset line, will that prevent me from using the display?

  6. Peter Harrison Post author

    You can leave the reset line connected to the positive supply so long as you correctly initialise the chip. The datasheet explains how to initialise properly.

    You must not leave it entirely disconnected though as its state will be unknown.

  7. Peter Harrison Post author

    You can but, if you have not yet connected it to anything, connect it to the processor reset line.

  8. mog123

    Last question to be sure. The line isn’t connected anywhere yet. connecting it to NRST is going to be the best option? As I have an SWD interface really close to the display that would be the best. No pullup will be needed then?

    Cheers and thanks for the advice

  9. DerBimpf

    A note regarding the wait for the TCIF in the ISR: Don’t use the SPI DMA TX but the SPI DMA RX interrupt to indicate the end of the transmission (still using TX DMA to get the whole thing going). That way you can immediately change the CS line in the ISR.

  10. Peter Harrison Post author

    Thanks. That sounds like a good idea. Is that what you do?

  11. DerBimpf

    Yes, I wouldn’t have suggested it otherwise. You can keep most of your code. This is how it basically works:
    – Initialize SPI normally including SPI DMA Tx and Rx request.
    – Enable the correct DMA stream interrupt for SPI Rx in the NVIC.
    – Initialize SPI Tx and Rx DMA streams but don’t enable them.
    – The ISR handler could look like this (for SPI1 Rx):

      void DMA2_Stream0_IRQHandler(void) {
        if (DMA2->LISR & DMA_LISR_TCIF0) {
          DMA2_Stream0->CR &= ~DMA_SxCR_TCIE; // Disable TC int
          DMA2->LIFCR = DMA_LIFCR_CTCIF0; // Clear Rx TC pending bit
          DMA2->HIFCR = DMA_HIFCR_CTCIF5; // Clear Tx TC pending bit
          // the CS line can be changed here
        }
      }

    – And to transmit:

    void W5200TxBuf_Transmit(void) {
       DMA2_Stream0->CR |= DMA_SxCR_TCIE; // Enable rx complete int
       DMA2_Stream0->CR &= ~DMA_SxCR_EN; // Disable DMA2 Stream0
       DMA2_Stream0->NDTR = (uint16_t)(; // Load length
       DMA2_Stream0->CR |= DMA_SxCR_EN; // Re-enable DMA
       DMA2_Stream5->CR &= ~DMA_SxCR_EN; // Disable DMA2 Stream5
       DMA2_Stream5->NDTR = (uint16_t)(; // Load length
       DMA2_Stream5->CR |= DMA_SxCR_EN; // Re-enable DMA
    }

    That’s it.

  12. jv

    I’m trying to follow your example but some files seem to be missing in your zip files, mainly discovery-hardware.h with all the important definitions.
    This file is contained in the previous SPI example but there’s important changes that i haven’t been able to grasp.
    I know this is an old thread but maybe you could add the remaining files?
    thanks in advance!

  13. Peter Harrison Post author

    Can you say what the errors are?

    I don’t know what files may be missing or changed now. You will need the older STM32F4 Standard Peripheral Libraries which are not mine to distribute. I used V1.0.0.

    Also, since I was using Rowley Crossworks, there is reference to the cross_studio_io.h file which is proprietary to Rowley. If you don’t have that, you will have to remove debug_printf() and debug_exit() function calls.

  14. jv

    i managed to have your previous SPI example working but i get a bit lost when adding the DMA configurations, errors are too many to report, i believe most of them related to the missing .h file,
    so it’s not a library issue or debug statements.
    I guess i’ll just keep trying to understand DMA better…

  15. Peter Harrison Post author

    I just checked. There are differences.

    This one may be more correct:

    /*
     * File: discovery-hardware.h
     * Author: peter harrison
     * Created on 01 March 2012, 07:14
     * Adds macros for the LEDs on the STM32F4Discovery board
     * These are very wordy here but serve to remind that the
     * LEDs might well be on different ports and pins. This way, it will
     * always work out if they are properly described here.
     * Expanded to include definitions for the HCMS3907 display and
     * the SPI port it is connected to
     */

    #ifndef DISCOVERY_H
    #define DISCOVERY_H

    #ifdef __cplusplus
    extern "C" {
    #endif

    #include "stm32f4xx.h"

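    /*
     * These are the LEDs on the Discovery board
     */
    #define GREEN_LED_PIN GPIO_Pin_12
    #define GREEN_LED_PORT GPIOD
    #define GREEN_LED GREEN_LED_PORT, GREEN_LED_PIN

    #define ORANGE_LED_PIN GPIO_Pin_13
    #define ORANGE_LED_PORT GPIOD
    #define ORANGE_LED ORANGE_LED_PORT, ORANGE_LED_PIN

    #define RED_LED_PIN GPIO_Pin_14
    #define RED_LED_PORT GPIOD
    #define RED_LED RED_LED_PORT, RED_LED_PIN

    #define BLUE_LED_PIN GPIO_Pin_15
    #define BLUE_LED_PORT GPIOD
    #define BLUE_LED BLUE_LED_PORT, BLUE_LED_PIN
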

    #define STATUS_PIN GPIO_Pin_2


    #define GREEN_ON() GPIO_SetBits(GREEN_LED);
    #define GREEN_OFF() GPIO_ResetBits(GREEN_LED);
    #define GREEN_TOGGLE() GPIO_ToggleBits(GREEN_LED);

    #define ORANGE_ON() GPIO_SetBits(ORANGE_LED);
    #define ORANGE_OFF() GPIO_ResetBits(ORANGE_LED);
    #define ORANGE_TOGGLE() GPIO_ToggleBits(ORANGE_LED);

    #define RED_ON() GPIO_SetBits(RED_LED);
    #define RED_OFF() GPIO_ResetBits(RED_LED);
    #define RED_TOGGLE() GPIO_ToggleBits(RED_LED);

    #define BLUE_ON() GPIO_SetBits(BLUE_LED);
    #define BLUE_OFF() GPIO_ResetBits(BLUE_LED);
    #define BLUE_TOGGLE() GPIO_ToggleBits(BLUE_LED);

    #define STATUS_ON() GPIO_SetBits(STATUS);
    #define STATUS_OFF() GPIO_ResetBits(STATUS);
    #define STATUS_TOGGLE() GPIO_ToggleBits(STATUS);

    /* Using a single SPI port for the output devices */
    #define SPI SPI2
    #define SPI_PORT SPI2
    #define SPI_PORT_CLOCK RCC_APB1Periph_SPI2
    #define SPI_PORT_CLOCK_INIT RCC_APB1PeriphClockCmd

    #define SPI_SCK_PIN GPIO_Pin_13
    #define SPI_SCK_SOURCE GPIO_PinSource13
    #define SPI_SCK_AF GPIO_AF_SPI2

    #define SPI_MOSI_PIN GPIO_Pin_15
    #define SPI_MOSI_SOURCE GPIO_PinSource15
    #define SPI_MOSI_AF GPIO_AF_SPI2

    /* Definition for DMAx resources **********************************************/

    #define SPI_PORT_DMA DMA1
    #define SPI_PORT_DMAx_CLK RCC_AHB1Periph_DMA1

    #define SPI_PORT_TX_DMA_CHANNEL DMA_Channel_0
    #define SPI_PORT_TX_DMA_STREAM DMA1_Stream4

    #define SPI_PORT_DMA_TX_IRQn DMA1_Stream4_IRQn
    #define SPI_PORT_DMA_TX_IRQHandler DMA1_Stream4_IRQHandler
    /* The HCMS3907 SPI dot matrix display */
    #define HCMS3907_RS_PIN GPIO_Pin_2
    #define HCMS3907_RS_GPIO_PORT GPIOD
    #define HCMS3907_RS_GPIO_CLK RCC_AHB1Periph_GPIOD
    #define HCMS3907_RS HCMS3907_RS_GPIO_PORT, HCMS3907_RS_PIN

    #define HCMS3907_CS_PIN GPIO_Pin_12
    #define HCMS3907_CS_GPIO_PORT GPIOB
    #define HCMS3907_CS_GPIO_CLK RCC_AHB1Periph_GPIOB
    #define HCMS3907_CS HCMS3907_CS_GPIO_PORT, HCMS3907_CS_PIN

    #define HCMS3907_SELECT() GPIO_ResetBits(HCMS3907_CS_GPIO_PORT, HCMS3907_CS_PIN)
    #define HCMS3907_DESELECT() GPIO_SetBits(HCMS3907_CS_GPIO_PORT, HCMS3907_CS_PIN)
    #define HCMS3907_IS_SELECTED() GPIO_ReadOutputDataBit(HCMS3907_CS_GPIO_PORT, HCMS3907_CS_PIN)
    #define HCMS3907_DATA() GPIO_ResetBits(HCMS3907_RS_GPIO_PORT, HCMS3907_RS_PIN)
    #define HCMS3907_COMMAND() GPIO_SetBits(HCMS3907_RS_GPIO_PORT, HCMS3907_RS_PIN)

    #ifdef __cplusplus
    }
    #endif

    #endif /* DISCOVERY_H */

  16. kisielk

    Just a note for anyone trying this on other STM32 chips… on the STM32F373 the data length and address registers are read-only when the DMA channel is enabled, so you have to disable the DMA channel before attempting to write to them in your equivalent of the HCMS3907_putBufferDMA function. This probably applies to some other non-F4 chips as well.

  17. Peter Harrison Post author

    Thank you. And can I just say, I am a huge fan of Taiko Drumming.
