Last time, the STM32 was set up to use the internal RC oscillator, HSI, which runs at 8MHz. The PLL was told to use HSI/2 as its input and the multiplier was set to x5, giving a 20MHz system clock. Now, what happens if the PLL multiplier is increased to make the system run faster…

It breaks is the short answer.

In the manual, we find that the flash memory is not fast enough to keep up with the processor clock if the clock is too high. Wait states can be added to flash memory accesses to allow the pre-fetch buffer to keep up. The buffer is turned on by default after a reset so you don’t need to do anything special there. The guidance is:

  • zero wait states, if 0 < SYSCLK ≤ 24 MHz
  • one wait state, if 24 MHz < SYSCLK ≤ 48 MHz
  • two wait states, if 48 MHz < SYSCLK ≤ 72 MHz
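The guidance above can be wrapped up in a small helper. This is only a sketch and not code from my project: the name flash_wait_states is made up for illustration, and it assumes the upper limit of each band is inclusive, which is how I read the ST manual.

```c
#include <stdint.h>

/* Hypothetical helper: choose the number of flash wait states for a
 * given SYSCLK in Hz. Assumes each band includes its upper limit.
 * Returns -1 if the clock is zero or above the 72 MHz maximum. */
int flash_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz == 0u || sysclk_hz > 72000000u) {
        return -1;              /* outside the documented range */
    }
    if (sysclk_hz <= 24000000u) {
        return 0;
    }
    if (sysclk_hz <= 48000000u) {
        return 1;
    }
    return 2;
}
```

For the 20MHz clock used so far this returns 0, and for 40MHz it returns 1.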

With SYSCLK at 20MHz, everything works just fine. The following code fragment:

  volatile int i;
  while(1){
    GPIOA_BRR = (1<<5);
    for(i=0;i<1000;i++);
    GPIOA_BSRR = (1<<5);
    for(i=0;i<1000;i++);
  }

will generate pulses on the output pin at 950Hz. Remember to make sure all optimisations are off or the for loops may get optimised out. Declaring i as a volatile int forces the compiler not to optimise the loops away in case something else it does not know about changes the value unexpectedly.

Changing the PLL multiplier from x5 to x10 should put SYSCLK at 40MHz where we are supposed to need one wait state. In fact, the program runs just fine with the output toggling at 1900Hz – exactly twice what it was before just as you might expect. Like any overclocking, I guess it is not sensible to operate the device outside its specification if you want it to carry on working, or for the same code to work across a range of devices.

I tried increasing the multiplier one step at a time until the program failed to work. It was just fine with the multiplier all the way up to 14, which sets the system clock to 56MHz. The output here was 2600Hz so everything was scaling perfectly. After that point it simply would not run. The debugger indicated that as soon as the PLL was turned on, the processor just lost its grip on reality and spun off into the aether. Or something like that.
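The clock arithmetic behind those steps is simple enough to write down. The sketch below is illustrative only: sysclk_from_hsi_pll is a hypothetical name, and the x2 to x16 multiplier range is an assumption taken from my reading of the reference manual for this family.

```c
#include <stdint.h>

#define HSI_HZ 8000000u   /* nominal internal RC oscillator frequency */

/* Hypothetical helper: SYSCLK in Hz when the PLL is fed from HSI/2.
 * Assumes the PLL multiplier can be 2..16; returns 0 otherwise. */
uint32_t sysclk_from_hsi_pll(uint32_t pll_mul)
{
    if (pll_mul < 2u || pll_mul > 16u) {
        return 0u;                      /* not a valid multiplier */
    }
    return (HSI_HZ / 2u) * pll_mul;     /* 4 MHz times the multiplier */
}
```

A multiplier of 5 gives the original 20MHz, 10 gives 40MHz and 14 gives the 56MHz where things stopped working without wait states.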

OK, so what happens when we add wait states? This is very easily done with a single line of code. The following fragment will add two wait states. Substitute a 1 or a 0 for the 2 as needed.

  FLASH_ACR = FLASH_ACR_PRFTBE | 2<<FLASH_ACR_LATENCY_BIT; // 2 wait states
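If you do not have the same header file, this is roughly the value that line builds. Treat it as a sketch: the bit positions are my reading of the STM32F10x manual (LATENCY in bits 2:0, PRFTBE at bit 4) and should be checked against your own device header, and flash_acr_value is a hypothetical name.

```c
#include <stdint.h>

/* Assumed bit layout of FLASH_ACR; verify against your device header. */
#define FLASH_ACR_LATENCY_BIT   0u          /* LATENCY field starts at bit 0 */
#define FLASH_ACR_LATENCY_MASK  0x7u        /* three-bit field */
#define FLASH_ACR_PRFTBE        (1u << 4)   /* pre-fetch buffer enable */

/* Hypothetical helper: build the FLASH_ACR value for a given number of
 * wait states, keeping the pre-fetch buffer enabled. */
uint32_t flash_acr_value(uint32_t wait_states)
{
    return FLASH_ACR_PRFTBE |
           ((wait_states & FLASH_ACR_LATENCY_MASK) << FLASH_ACR_LATENCY_BIT);
}
```

On the target, the result would be written to FLASH_ACR before the faster clock is switched in, so that the flash is never asked to run faster than its current setting allows.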

With the PLL put back to a variety of values to set SYSCLK at frequencies between 16 and 64 MHz, the same LED flashing code was run with zero, one and two wait states and the frequency at the LED pin measured.

  SYSCLK    0 WS     1 WS     2 WS
  16MHz     760      665      569
  28MHz     1330*    1160     997
  40MHz     1900*    1660     1424
  52MHz     2468*    2160*    1850
  64MHz     --       2660*    2280

All results are the output frequency in Hz. Results marked * are outside the parameters given in the ST manual, and the dash marks the one combination that gave no result. The thing to notice is that using 2 wait states in flash will slow your program to about 75% of its speed without any wait states (569 against 760 at 16MHz). That means that, if you were to increase your system clock by a small amount and take it over one of the thresholds, you may find that your code runs slower than it did before if you choose to insert the appropriate number of wait states.
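To put a number on that threshold effect, the measurements can be scaled linearly from the 16MHz row of the table. This is a back-of-envelope sketch rather than a model of the flash interface: estimated_toggle_hz is a hypothetical name, and linear scaling with SYSCLK is an assumption, although the measured figures follow it closely.

```c
#include <stdint.h>

/* Toggle rates in Hz measured at 16 MHz with 0, 1 and 2 wait states. */
static const uint32_t hz_at_16mhz[3] = { 760u, 665u, 569u };

/* Hypothetical helper: estimate the toggle rate at another SYSCLK,
 * assuming the rate scales linearly with the clock frequency.
 * Returns 0 for an unsupported number of wait states. */
uint32_t estimated_toggle_hz(uint32_t sysclk_mhz, unsigned wait_states)
{
    if (wait_states > 2u) {
        return 0u;
    }
    return sysclk_mhz * hz_at_16mhz[wait_states] / 16u;
}
```

With these figures, 48MHz with one wait state comes out at an estimated 1995Hz, while the step up to 52MHz with two wait states measured only 1850Hz. The faster clock loses.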

Code on the ARM can run from RAM, which is supposed to be faster than running from flash. There must be some additional trick to this because, when I tried the exact same code compiled to run out of RAM, it performed at almost exactly the same speed as with 2 wait states in flash.

The High Speed External oscillator (HSE) is generally connected to a crystal or a resonator. This is useful whenever you want a system clock or other timebase that cannot readily be derived from the internal 8MHz oscillator (HSI). That may be the case if you are dealing with Ethernet, USB or TV signals, or if you need a particular stability or synchronisation with another clock source. If you don’t need an odd frequency, you might just as well stick with the HSI clock. On my test board, the internal clock is just as accurate as an external crystal at room temperature. I don’t know how temperature dependent it is, but then I don’t really care for my toys. Turning on the HSE clock is relatively simple:

  RCC_CR |= RCC_CR_HSEON;                   // turn on the HSE
  while ((RCC_CR & RCC_CR_HSERDY)==0);      // wait for it to become stable
  RCC_CFGR |= (1 << RCC_CFGR_PLLXTPRE_BIT); // pre-scale the HSE by 2
  RCC_CFGR |= (1 << RCC_CFGR_PLLSRC_BIT);   // use the HSE as input to the PLL
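One thing the fragment above does not handle is a crystal that never starts, in which case the wait loop spins forever. A bounded poll is one way round that. The sketch below is an illustration, not what I actually run: wait_for_flag is a hypothetical helper and the poll limit would need tuning for a real board.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: poll a register until any bit in mask is set.
 * Gives up after max_polls reads and returns false on timeout. */
bool wait_for_flag(const volatile uint32_t *reg, uint32_t mask,
                   uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if ((*reg & mask) != 0u) {
            return true;    /* flag is set, e.g. oscillator is ready */
        }
    }
    return false;           /* timed out, caller should stay on HSI */
}
```

Assuming RCC_CR is defined as an ordinary register lvalue, as it is in the fragment above, the wait could become something like wait_for_flag(&RCC_CR, RCC_CR_HSERDY, 100000), with the code staying on HSI if it returns false.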

There is no need to pre-scale the HSE by two before feeding it to the PLL. I put the line in to show how it is done, and so that the frequency presented to the PLL is the same on my board for both the HSE and HSI inputs, since my HSE is connected to an 8MHz crystal.

Again, if you don’t need a source clock frequency different to 8MHz, there seems little need to use HSE at all. The HSI input can be used to make SYSCLK as high as 64MHz. Using the HSE extends that to 72MHz – not enough of an increase to make up for having to use two wait states at those frequencies. If your system isn’t fast enough by then, you may be using the wrong processor or algorithm.

Note that this code is probably not a good example for testing performance as I have no idea how it interacts with the operation of the pre-fetch buffer. I would expect that the performance loss due to wait states would not get much worse than this.

Now I can set the main clock up, drive output pins and read input pins.

Next time, a little bit about Crossworks, JTAG, startup files and running without the debugger.

 

This Post Has 4 Comments

  1. Harjit

    Peter, it is exciting that you are making progress on the STM32. I’m going to be doing the same thing over the next couple of weeks also.

    I was going to say that on the Cortex-M3 there is a dedicated bus for the flash but not the RAM, so running code from RAM with DMA going may be slower than from flash.

  2. peteh

    Yes, I want all the RAM I can get for data logging although I am tempted to add a microSD facility as well if it can be written to fast enough. No need for a filing system – use the processor to read it all back out after.

  3. patil rahul

    Hello,
    this Crossworks stuff is really interesting, but can you please suggest any more resources for getting familiar with the registers of the STM32F4xx? I am a beginner with ARM MCUs and I need some proper direction, as the STM32 is my first target. I have worked on 8-bit and 16-bit parts before.

  4. Peter Harrison

    I don’t really know of anywhere with much in the way of tutorial content – especially for the STM32F4 devices. However, for what it is worth, I am starting to write up some of my current stuff about the ‘F4.
