Category Archives: STM32
STLINK SWD for STM32
The standard JTAG connector for ARM processors is the huge 20 pin IDC header. It has a whole bunch of unused pins and takes up a lot of board space. There are several alternatives that reduce the pin count … Continue reading
STM32F4 – the first taste of speed
The recently announced STM32F4 series of processors using the ARM Cortex M4 are very attractive. High speeds, large memory space and a floating point unit are among the obvious benefits although there are many other architectural changes in the ST … Continue reading
World’s Most Powerful Cortex Microcontrollers
STMicroelectronics today announced the introduction of its new STM32 F4 series of microcontrollers. This extension to the STM32 platform is based on the latest ARM Cortex-M4 core, which adds signal-processing capabilities and faster operations to the already outstanding portfolio of STM32 microcontrollers; the new series, which is available now, reinforces ST’s leadership and claims the title of highest-performance Cortex-M processor-based microcontroller range in the market.
The STM32 range is the industry’s most successful family of 32-bit ARM Cortex-M processor-based microcontrollers, with nearly one of every two units shipped being a member of the STM32 family. The single-cycle DSP instructions of the STM32 F4 open the doors to the digital signal controller (DSC) market that requires high computational capability and DSP instructions for demanding applications such as high-end motor control, medical equipment and security.
By providing a simple, full pin-to-pin and software compatible upgrade from the STM32 F2 series with more SRAM, higher performance and a robust collection of peripherals, the F4 series will enable customers designing with the STM32 F2 series microcontrollers to offer product extensions by upgrading to the F4 series if they need more memory, performance or features. In addition, customers currently using a 2-chip MCU and DSP approach can now combine both chips in one high-performance digital signal controller.
Beyond being pin-to-pin and software compatible with the high-performance F2 series, the F4 series operates at a higher frequency (168 MHz instead of 120 MHz), offers single-cycle DSP instruction support and a Floating Point Unit, larger SRAM (192 Kbytes vs. 128 Kbytes), embedded flash memory from 512 Kbytes up to 1 Mbyte, and advanced peripherals for imaging, connectivity and encryption. ST’s 90nm CMOS process technology and the integrated ST Adaptive Real Time “ART Accelerator” deliver state-of-the-art performance, with zero-wait-state program execution up to 168 MHz, and best-in-class dynamic power.
“The STM32 F4 series is attractive for so many more reasons than simply because it is the highest performing Cortex M processor-based microcontroller available on the market today,” said Claude Dardanne, Executive Vice President and General Manager Microcontrollers, Memories and Secure MCUs Group, “With more than 250 compatible devices already in production, the industry’s best development ecosystem, and outstanding power consumption and overall functionality, the F4 family is the cherry at the top of the STM32 family of Cortex-M processor-based MCUs, which now includes four product series: the STM32 F1 series, the STM32 F2 series and the STM32 L1 series, all based on the Cortex-M3 processor, and the new F4 Series, based on the Cortex-M4 processor.”
Specific technical benefits of the F4 series include:
- Ultra-fast data transfers, with a 7-layer multi-AHB bus matrix and multi-DMA controllers, which allow concurrent execution and data transfers;
- The integrated single-precision FPU boosts the execution of control algorithms, adds more features to applications, improves code efficiency, reduces time-to-market, eliminates scaling and saturation, and allows the use of meta-language tools;
- High integration, with up to 1 Mbyte of on-chip Flash memory, 192 Kbytes of SRAM, reset circuit, internal RCs, PLLs, sub-1 µA real-time clock with sub-second accuracy;
- Extra flexibility to reduce power consumption in applications requiring both high processing power and low-power performance when running at low voltage or on rechargeable batteries. These include 4 Kbytes of backup SRAM to save data in standby or battery backup modes, a typical RTC consumption of <1uA in Vbat mode, and an internal voltage regulator with power scaling capability, allowing the selection of performance or lower consumption;
- An outstanding tool and software ecosystem with a broad offering of Integrated Development Environments, Meta-language tools, a DSP library, inexpensive starter kits, software libraries and stacks;
Superior and innovative peripherals:
- Connectivity: Camera interface, Crypto/Hash HW processor, Ethernet MAC10/100 with IEEE 1588 v2 support, two USB OTG (one with HS support);
- Audio: dedicated audio PLL and two full duplex I2S;
- Up to 15 communication interfaces (including six USARTs running up to 10.5 Mbits/s, three SPI running up to 42 Mbits/s, three I2C, two CAN, SDIO);
- Analog: Two 12-bit DACs, three 12-bit ADCs reaching 2.4 MSPS or 7.2 MSPS in interleaved mode;
- Up to 17 timers: 16-bit and 32-bit running up to 168 MHz;
- The family is in production now.
The STM32 F4 Series is available in four variants:
STM32F405x: in addition to a complete set of advanced peripherals including timers, three ADCs, two DACs, serial interfaces, external memory interface, RTC, CRC calculation unit and analog true Random Number Generator, the STM32F405 products have a USB On-The-Go (OTG) full-speed/high-speed interface. They are available in four packages (WLCSP64, LQFP64, LQFP100, LQFP144) with 1 Mbyte of Flash.
STM32F407 products add several advanced peripherals to the ones offered on the STM32F405 products: a second USB OTG interface (full-speed only); an integrated Ethernet MAC 10/100 supporting both MII and RMII, with IEEE1588 Precise Time Protocol v2 Hardware support and an 8- to 14-bit parallel camera interface allowing the connection of a CMOS camera sensor, supporting up to 67.2 Mbytes/s. Devices are available in four packages (LQFP100, LQFP144, LQFP/BGA176), with from 512 Kbytes to 1 Mbyte of Flash.
The STM32F415 and STM32F417 parts add a crypto/hash processor to the STM32F405 and STM32F407. This crypto/hash processor includes hardware acceleration for AES 128, 192, 256, Triple DES, HASH (MD5, SHA-1). As an example of the performance achieved by the crypto/hash processor, the AES-256 encryption throughput reaches up to 149.33 Mbytes/s.
All variations are in volume production, with prices beginning from $5.74 for the STM32F407VET6 with 512 Kbytes of Flash and 192 Kbytes RAM in the LQFP100 package, for orders of more than 1,000 units.
* STM32 is a registered trademark of STMicroelectronics; ARM and Cortex are trademarks of ARM. All other trademarks are the property of their owners.
See more details of specific devices here: STM32 F4 series MCUs
STM32 internal oscillator
The internal oscillator on the STM32 processors can be tuned so that an external crystal or oscillator isn’t necessary.
As Peter pointed out in an earlier blog, the STM32’s internal RC oscillator can be used to eliminate a handful of external clock components (one R, two C’s and a crystal).
Since mice use the serial port for communications, if the RC oscillator’s frequency is off by more than one percent, you can expect to see communications errors at higher baud rates. To mitigate this, you can use the HSITRIM field in the RCC_CR register to adjust the RC oscillator’s freq. in approx. 40 kHz increments.
ST provided the adjustment capability because during processor test, they adjust the freq. of the RC oscillator to within 1% of 8MHz at 25C. Since the initial error is already at the limits of the communication error budget, I decided to use the HSITRIM feature to optimize the RC oscillator freq. to 8MHz.
In my setup, I wanted to run the serial port at 921K. When I calculated the baud rate divider error with the RC oscillator at 8MHz, I got -0.8%. Even with an almost perfect processor clock, that is too close to the 1% max. error for reliable communications. So, I decided I would adjust the RC oscillator’s freq. to be 0.8% higher to minimize the baud rate divider error.
Tuning the freq. is relatively straightforward – enable the clock output (MCO) on the processor and then use a bench counter/timer to measure the clock freq. Then adjust HSITRIM to get to your target value.
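The register arithmetic for that procedure can be sketched as follows. This is a minimal sketch, assuming the STM32F10x layout in RM0008: HSITRIM in bits 7:3 of RCC_CR, and MCO in bits 26:24 of RCC_CFGR with 0b101 selecting the HSI. The helper names are hypothetical, and the functions only compute the new register values; writing them back to the hardware is left to the caller.

```c
#include <stdint.h>

/* Field positions assumed from RM0008 (STM32F10x); check your own part. */
#define RCC_CR_HSITRIM_POS   3u
#define RCC_CR_HSITRIM_MASK  (UINT32_C(0x1F) << RCC_CR_HSITRIM_POS)
#define RCC_CFGR_MCO_POS     24u
#define RCC_CFGR_MCO_MASK    (UINT32_C(0x7) << RCC_CFGR_MCO_POS)
#define RCC_CFGR_MCO_HSI     (UINT32_C(0x5) << RCC_CFGR_MCO_POS)

/* Return rcc_cr with the 5-bit HSITRIM field replaced by trim (0..31).
 * All other bits, including HSION and HSICAL, are left untouched. */
uint32_t rcc_cr_with_hsitrim(uint32_t rcc_cr, uint32_t trim) {
    trim &= 0x1Fu;
    return (rcc_cr & ~RCC_CR_HSITRIM_MASK) | (trim << RCC_CR_HSITRIM_POS);
}

/* Return rcc_cfgr with the MCO field set to route the HSI to the MCO pin. */
uint32_t rcc_cfgr_with_mco_hsi(uint32_t rcc_cfgr) {
    return (rcc_cfgr & ~RCC_CFGR_MCO_MASK) | RCC_CFGR_MCO_HSI;
}
```

On the target, something like `RCC_CR = rcc_cr_with_hsitrim(RCC_CR, 13);` would then apply a trim of 13, assuming RCC_CR is defined as the register itself in your headers.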
On my particular processor, with HSITRIM set to the default value (16), the processor freq. was 8.07MHz. As it turned out, at 8.07MHz, the baud rate error for 921K is 0.07%. So, I didn’t have to trim the RC oscillator!
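The error figures are easy to check with a little arithmetic. The sketch below rests on two assumptions that the post does not state: a USART clock of eight times the RC oscillator frequency (64 MHz nominal) and a baud rate divider of 70. Those values reproduce the quoted -0.8% and 0.07%. The function names are hypothetical.

```c
#include <stdint.h>

/* Baud rate produced when USART_BRR holds brr and the USART is clocked at
 * fck_hz. With 16x oversampling the relationship is baud = fck / BRR. */
double usart_actual_baud(double fck_hz, uint32_t brr) {
    return fck_hz / (double)brr;
}

/* Signed error of the generated baud rate relative to the target, in percent. */
double usart_baud_error_percent(double fck_hz, uint32_t brr, double target_baud) {
    return (usart_actual_baud(fck_hz, brr) - target_baud) / target_baud * 100.0;
}
```

With those assumptions, `usart_baud_error_percent(64000000.0, 70, 921600.0)` comes out at about -0.79, and the same call with 64560000.0 (8.07 MHz times eight) gives about +0.07.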
In playing around, if I did want to set the RC oscillator to 8MHz, I would have had to set HSITRIM to 13. Also, I wanted to see how sensitive the RC oscillator was to temperature. So, I took a can of Freeze spray and sprayed the STM32 processor. The RC oscillator freq. went up. Since I wasn’t monitoring the temp., I can’t provide a temp. coef. but I felt that the temp. coef. was relatively low i.e. with the mouse being in a room environment, the RC oscillator freq. won’t go outside the 1% range necessary for robust async. serial communications.
A mistake I made was that I didn’t put a test point on the pin that MCO can be routed to, so probing the pin to make this measurement was challenging. In hindsight, I think another way to make this measurement would be to write a small program which sets the serial port to 8 bit, no parity, one stop bit and then to output 0x0f on the serial port (continuously). You can then monitor the transmit pin and determine the best trim setting for the oscillator.
Bit Banding in the STM32
Wondrous though the STM32 (ARM Cortex M3) might be, it makes something of a meal of atomic access to individual bits in memory. The technique used is called bit-banding. Although it is simple enough in concept and pretty friendly to the assembly language programmer, it is easy enough to get lost in C. Or should that be at C?
So why would you want atomic access to individual bits? Consider a single bit used as a flag in your program. Perhaps you have a data buffer and the interrupt service routine that looks after it sets a bit in a memory location to signal that it is full. A higher level function sees the bit set and does something about it, then resets the bit. The problem is that this is just one bit in a whole memory word of 32 bits and, to modify it, you need to read the word, change the bit and write back the word. What happens if some other interrupt function changes one of the other bits in that word after you read it but before you write back the modified version? When you write back your version, it will put all the bits back to how they were before you started, thus destroying a piece of information.
You can generally solve this problem by using an entire variable for each such flag. At best this will use up a whole byte for each flag and so wastes memory. That may not be a problem for you and if that is the case then that will work out just fine. Things are not so easy if you wanted to keep all these flags neatly together in a single location as a status word that might get sent to a host or recorded in a log.
This kind of thing happens all the time in the peripheral control registers. Take the USART for example. The CTS flag goes high and an interrupt handler as part of its job, wants to reset the flag. Don’t ask me why, it just does. Meantime, you just received a byte and the RXNE flag is set to indicate that there is data waiting. But the CTS handler is in the middle of a read-modify-write cycle. It has read the status register and is in the process of clearing the CTS flag. When it writes the result back, the RXNE flag will be cleared and the arrival of the character could go un-noticed.
OK, so I just invented all that. The point is, the peripheral registers may need care in setting and clearing bits and the safest way to do that in both cases is through bit-banding. Why? Because the changes are atomic. That is, they happen to single bits in one cycle and cannot be interrupted. Thus, there will never be an occasion where you read a bit which can be modified by some other code before you get to write it back out.
Right, so, how is it done? 8051 programmers had a rich set of bit set/reset instructions that could do all this very neatly. Then again, the 8051 had a tiny address space and a suitably simple architecture.
On the STM32, some magic is worked internally so that each bit in a pre-defined memory range can be addressed as another location in a kind of virtual address space somewhere else. So, for example, the 32 bit value stored at address 0x20000000 also appears as 32 sequential word locations starting at 0x22000000. There are two regions of memory that have bit-band alias regions. First there is a 1Mbyte SRAM region from 0x20000000 – 0x200FFFFF where each bit is aliased by a word address in the range 0x22000000 – 0x23FFFFFF. Then there is the peripheral space from 0x40000000 – 0x400FFFFF which is aliased in the same way to the range 0x42000000 – 0x43FFFFFF.
Using this scheme, a read or write to memory location 0x22000000 is the same as a read or write to the least significant bit of SRAM location 0x20000000. I have no intention of going through the whole thing.
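For anyone who wants the arithmetic spelled out, the mapping is alias = alias base + (byte offset × 32) + (bit number × 4). A minimal sketch for the SRAM region follows; the function and macro names are hypothetical, and it only computes the address, so it runs anywhere.

```c
#include <stdint.h>

#define SRAM_BASE     UINT32_C(0x20000000)  /* start of the bit-band SRAM region */
#define SRAM_BB_BASE  UINT32_C(0x22000000)  /* start of its alias region */

/* Alias address for one bit of the SRAM location at byte_addr.
 * Each byte of SRAM takes 32 bytes of alias space (8 bits x 4 bytes per bit),
 * so a bit number of 0..31 also works for a word-aligned address. */
uint32_t sram_bitband_alias(uint32_t byte_addr, uint32_t bit) {
    uint32_t byte_offset = byte_addr - SRAM_BASE;
    return SRAM_BB_BASE + (byte_offset * 32u) + (bit * 4u);
}
```

So bit 3 of the word at 0x20000004 lives at 0x2200008C, and writing 1 or 0 to that word sets or clears just that bit.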
If you want to find out more about this and many other dark STM32 secrets, read the excellent book by Joseph Yiu – The Definitive Guide to the ARM Cortex-M3.
The Peripheral Library, among other sources, provides us C programmers with macros to do the address translation. They look like this for the SRAM memory space:
#define RAM_BASE 0x20000000
#define RAM_BB_BASE 0x22000000
// each macro is split over two lines, so the first line needs a continuation backslash
#define Var_ResetBit_BB(VarAddr, BitNumber) \
(*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)) = 0)
#define Var_SetBit_BB(VarAddr, BitNumber) \
(*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)) = 1)
#define Var_GetBit_BB(VarAddr, BitNumber) \
(*(vu32 *) (RAM_BB_BASE | ((VarAddr - RAM_BASE) << 5) | ((BitNumber) << 2)))
These are all well and good but not too intuitive to use. Even if you understand what they do. Rather than mess with these, I define a couple of additional versions that look like this:
#define varSetBit(var,bit) (Var_SetBit_BB((u32)&(var),bit))
#define varClrBit(var,bit) (Var_ResetBit_BB((u32)&(var),bit))
#define varGetBit(var,bit) (Var_GetBit_BB((u32)&(var),bit))
Using these macros is quite simple. The following are all legitimate ways to use them:
uint32_t flags;
uint32_t status;
varSetBit(flags,1);
varSetBit(flags,READY_BIT);
varClrBit(flags,3);
ready = varGetBit(flags,READY_BIT);
It is interesting to note that the varGetBit macro is an LValue so that it can be used in an assignment like this:
varGetBit(flags,4) = y;
varGetBit(flags,ARRIVED) = varGetBit(status,READY);
These methods are not primarily about speed but convenience. The compiler cannot know where the variables will be stored when the code is generated so you will see some of the calculations done at run time by your program. However, if you use the pointer method in the code fragment above, the calculation of the alias address is done only once and you will be able to get to the bit variables quite efficiently after that. To access bits in the peripheral registers, you use the exact same technique but with different base registers. There is, of course, no reason why you could not define a macro that refers to a specific bit in a peripheral register at a known address. Then you get the addresses completely pre-calculated by the compiler and the most efficient code since the peripheral register addresses are known at compile time.
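As a rough illustration of that last idea, here is a sketch of a macro for one specific peripheral bit. It is not taken from the Peripheral Library: the macro names are hypothetical, and the USART1_SR address (0x40013800) and RXNE bit position (5) are assumed from RM0008 for the STM32F10x parts. Because every term is a constant, the compiler can fold the alias address at compile time.

```c
#include <stdint.h>

#define PERIPH_BASE_ADDR     UINT32_C(0x40000000)
#define PERIPH_BB_BASE_ADDR  UINT32_C(0x42000000)

/* Alias address of one bit of a peripheral register at a known address. */
#define PERIPH_BB_ADDR(reg_addr, bit) \
    (PERIPH_BB_BASE_ADDR + (((reg_addr) - PERIPH_BASE_ADDR) * 32u) + ((bit) * 4u))

/* The bit itself as an lvalue. Only meaningful when run on the target. */
#define PERIPH_BB(reg_addr, bit) \
    (*(volatile uint32_t *)(uintptr_t)PERIPH_BB_ADDR((reg_addr), (bit)))

/* Assumed values for the STM32F10x: USART1_SR at 0x40013800, RXNE is bit 5. */
#define USART1_SR_ADDR       UINT32_C(0x40013800)
#define USART1_RXNE_BB_ADDR  PERIPH_BB_ADDR(USART1_SR_ADDR, 5u)
#define USART1_RXNE          PERIPH_BB(USART1_SR_ADDR, 5u)
```

On the target, `if (USART1_RXNE) { ... }` would then read the single bit through its alias at 0x42270014.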
ARM Cortex text book
The Definitive Guide to the ARM Cortex-M3
Joseph Yiu
This is one of a very small number of books about the relatively new ARM processor, the Cortex-M3. Now in a second edition, the book covers all the essential information required to get to grips with this elegant and powerful core and concentrates mostly on the core itself. While several manufacturers, such as ST, Luminary, Atmel and Philips among others, implement Cortex-M3 based processors, they differ in the range of peripherals connected to the core. At its heart, each uses the same logic, registers and instruction set to get the job done.
Over the course of 20 chapters, Yiu examines the general arrangement of the ARM architecture, the instruction set, registers, interrupt controller, memory management, bus configuration, exception handling, programming and debugging via the built-in JTAG interface. Towards the end of the book is a section on getting started with GNU and Keil toolchains as well as guidance on porting applications from the well-established ARM7 core to the newer Cortex-M3.
Plentiful code examples are given throughout. While these are generally in assembly language, the dedicated C programmer should not feel disadvantaged since there is ample description of the registers, their addresses and bitfields to make porting the code into C a relatively easy exercise. Translation to C is, perhaps, less trivial when looking at the bit-banding capabilities of the processor and here the author has thoughtfully provided examples in C to complement the assembler code.
If you are interested in really getting under the skin of this processor, this is the book to choose.
You can have a look inside the book at Amazon: The Definitive Guide to the ARM Cortex-M3
STM32 USART basics
A USART is a universal synchronous asynchronous receiver transmitter. A serial port if you like. On the STM32 however, it really is universal. This peripheral has a raft of features for a huge range of serial protocols including all the usual asynchronous modes plus IrDA, LIN, Smartcard Emulation and the ability to function as an SPI port…
Typical STM32 parts have between 2 and 5 USART peripherals. The STM32F103RE is described as having 5 USART/UART devices. USART1 lives on the high-speed APB2 bus while USART2, USART3, UART4 and UART5 are connected to the lower-speed APB1 bus. The UARTs differ from the USARTs in that they do not provide hardware flow control or synchronous operation or smartcard emulation. All other functions appear to be supported [RM0008 - sec 25.5].
It will come as no surprise that the USART, being a complex peripheral, has a lot of configuration options and registers. Here is the register map, taken from the Dec. 2009 reference manual:
Notice that, although the registers themselves are on 32-bit boundaries, they are no more than 16-bits wide. When you try and find where all these peripherals get connected to the outside world, it can be quite a challenge what with the sheer variety of peripherals and the finite number of pins available. Here, I shall concern myself with just USART1 on the STM32F103RE part. (code is at the foot of the page) Much the same techniques are used with any of the USARTs. This is a list of the main USART/UART pins on the STM32:
| BGA100 | LQFP100 | LQFP48 | LQFP64 | main function | USART/UART |
| D9 | 67 | 29 | 41 | PA8 | USART1_CK |
| C10 | 70 | 32 | 44 | PA11 | USART1_CTS |
| B10 | 71 | 33 | 45 | PA12 | USART1_RTS |
| C9 | 68 | 30 | 42 | PA9 | USART1_TX |
| D10 | 69 | 31 | 43 | PA10 | USART1_RX |
| G3 | 29 | 14 | 20 | PA4 | USART2_CK |
| G2 | 23 | 10 | 14 | PA0 | USART2_CTS |
| H2 | 24 | 11 | 15 | PA1 | USART2_RTS |
| J2 | 25 | 12 | 16 | PA2 | USART2_TX |
| K2 | 26 | 13 | 17 | PA3 | USART2_RX |
| K8 | 51 | 25 | 33 | PB12 | USART3_CK |
| J8 | 52 | 26 | 34 | PB13 | USART3_CTS |
| H8 | 53 | 27 | 35 | PB14 | USART3_RTS |
| J7 | 47 | 21 | 29 | PB10 | USART3_TX |
| K7 | 48 | 22 | 30 | PB11 | USART3_RX |
| B8 | 79 | - | 52 | PC11 | UART4_RX |
| B9 | 78 | - | 51 | PC10 | UART4_TX |
| B7 | 83 | - | 54 | PD2 | UART5_RX |
| C8 | 80 | - | 53 | PC12 | UART5_TX |
For my purposes, I want to set up USART1 at 9600,N,8,1 with no hardware flow control. For that I only want to configure pins PA9 and PA10 as TX and RX respectively. Pins PA8, PA11 and PA12 will be left as GPIO pins.
Bus Setup
As with all the peripherals on the STM32, the first thing to do is make sure that the peripheral is getting a suitable clock signal and that the pins are properly set up. USART1 is connected to the APB2 peripheral bus and uses pins on GPIOA. Thus, we need to enable the clock for GPIOA. Since this example will only use the TX and RX pins (PA9 and PA10 respectively) we need only configure them. The TX pin, PA9, should be set up as a push-pull output using the alternate function at low frequency (0b1010). RX is a floating input (0b0100) which is its default state although you may wish to enable the pullup/down feature if it suits your application. Lastly, the APB2 peripheral clock will need to be enabled for the USART.
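One thing to watch when writing those 4-bit configuration fields: GPIOA_CRH comes out of reset as 0x44444444, so simply ORing a new value into a field mixes it with the 0b0100 already there. A minimal sketch of a clear-then-set helper follows; the names are hypothetical and it only computes the new register value.

```c
#include <stdint.h>

#define GPIO_CFG_AF_PP_2MHZ   0xAu  /* CNF=10 (alt. push-pull), MODE=10 (2 MHz) */
#define GPIO_CFG_IN_FLOATING  0x4u  /* CNF=01 (floating input), MODE=00 */

/* Return crh with the 4-bit configuration field for pin (8..15) replaced by
 * cfg. The old field is cleared first so stale bits cannot leak through. */
uint32_t gpio_crh_with_pin(uint32_t crh, unsigned pin, uint32_t cfg) {
    unsigned shift = (pin - 8u) * 4u;
    return (crh & ~(UINT32_C(0xF) << shift)) | ((cfg & 0xFu) << shift);
}
```

Starting from the reset value, `gpio_crh_with_pin(0x44444444, 9, GPIO_CFG_AF_PP_2MHZ)` gives 0x444444A4, whereas a plain OR would leave 0xE in the PA9 field, which selects alternate-function open-drain instead of push-pull.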
BAUD rate:
A common issue with micro controllers is that the baud rate generator is a simple division of the main processor clock. That leads to ‘strange’ system clock frequencies like 4.9152MHz just to get easy divisors for the baud rate generator. The STM32 has a fractional generator that means that pretty well any baud rate can be derived from the system clock whatever its value. Each USART has a register, USART_BRR, that holds the divisor, stored as a 12.4 unsigned fixed point number. The reference manual is a bit awkward on the matter of what value to store in here but the simple answer is to calculate it from
USART_BRR = Fck/BAUDRATE
This will get you close enough although you should probably use a rounding factor and do it properly. I have no trouble with baud rates up to 115,200.
So, with a clock rate, SystemFrequency, of 64,000,000 Hz and a baud rate of 115200, I need
USART_BRR = 64000000/115200 = 555.5555 = 555 truncated
(note that only USART1 can be clocked with the full system clock, the others get Fck/2)
This example generates almost a worst-case error, being wrong by half the least significant bit. The actual baud rate generated will be 115,315.3. Since the next nearest value for USART_BRR would be 556, giving a baud rate of 115107.9, this will certainly be close enough.
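Doing it "properly" with a rounding factor is a one-liner: add half the baud rate before dividing. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* USART_BRR value for 16x oversampling, rounded to the nearest integer
 * rather than truncated. Adding baud/2 before the division does the rounding. */
uint32_t usart_brr_rounded(uint32_t fck_hz, uint32_t baud) {
    return (fck_hz + baud / 2u) / baud;
}
```

For the example above this returns 556 rather than 555, so the generated rate is 115,107.9 baud (about 0.08% low) instead of 115,315.3 baud (about 0.10% high). Either is well within tolerance, which is why the truncated version works.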
Register setup
Spend enough time with the reference manual and you will see that the processor puts the USART registers into a very handy state after a reset. The default settings give you no hardware flow control, 8 data bits, no parity and one stop bit – exactly what you need when talking to a common-or-garden terminal program. At present, I have no intention of using the serial ports for anything else so, I am afraid, I shall not bother to delve into the deeper mysteries of the USART configuration registers. Before the USART can be used, however, the USARTx_CR1_UE bit must be set to enable the peripheral.
Transmit and Receive
Each USART has a single data register (USARTx_DR). This is a 9-bit register to cater for longer characters. Here only 8 bits are used. Writing to this register will put data into the outgoing shift register and reading from here will fetch the most recently received data.
Before data can be sent, the transmitter must first be enabled by setting the USARTx_CR1_TE bit in USARTx_CR1. According to the reference manual, immediately after setting this bit, an idle frame will be sent automatically. I could not observe this when repeatedly clearing and setting the TE bit. Before sending a character to the data register, you should test the USARTx_SR_TXE bit. When set, this bit indicates that the data register is empty: its previous contents have been transferred to the transmit shift register. There is no need to directly set or clear the TXE flag. It is cleared when data is written to USARTx_DR and set when that data is transferred to the shift register. An interrupt can be connected to this bit if you want to be sending data under interrupt control.
If you write to USARTx_DR when the shift register is empty, the data will go straight into the shift register, transmission will begin immediately and the TXE flag will be immediately set.
After sending the last character in a string, it will be a good idea to test the USARTx_SR_TC bit. This bit is set once transmission of the last frame is complete and thus indicates that it is safe to shut down the USART without data loss.
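Put together, sending a string and then waiting for it to leave the shift register might look like the sketch below. The struct and function names are hypothetical, and the bit positions (TC is bit 6, TXE is bit 7) and register order (SR then DR) are assumed from RM0008. Passing the registers in by pointer keeps the sketch independent of any particular header file.

```c
#include <stdint.h>

#define USART_SR_TC   (UINT32_C(1) << 6)  /* transmission complete */
#define USART_SR_TXE  (UINT32_C(1) << 7)  /* transmit data register empty */

/* Only the first two USART registers are needed here. */
typedef struct {
    volatile uint32_t SR;  /* status register */
    volatile uint32_t DR;  /* data register */
} usart_regs_t;

/* Send a NUL-terminated string, then block until the final frame has
 * completely left the shift register. */
void usart_send_string(usart_regs_t *usart, const char *s) {
    while (*s != '\0') {
        while (!(usart->SR & USART_SR_TXE)) { }     /* wait for room in DR */
        usart->DR = (uint32_t)(unsigned char)*s++;
    }
    while (!(usart->SR & USART_SR_TC)) { }          /* wait for last frame */
}
```

On the target the pointer would be the USART1 base address cast to `usart_regs_t *`. After the function returns it should be safe to disable the USART or its clock.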
Getting hold of the received data is a simple matter of reading the same data register (USARTx_DR) that is used to send. The flag, USARTx_CR1_RE, must be set to enable the receiver. When a character is received, the USARTx_SR_RXNE bit will be set indicating that data is waiting in the data register (USARTx_DR). An interrupt can be generated if suitably enabled. Reading the data register will clear the RXNE flag automatically. When reading a character, it is a good idea to read the error flags from USARTx_SR. For basic asynchronous operation, the key flags are located in the least significant 4 bits. They are:
- USARTx_SR_ORE: Overrun Error
- USARTx_SR_NE: Noise Error
- USARTx_SR_FE: Framing Error
- USARTx_SR_PE: Parity Error
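Checking those flags is just a mask on the status register value. In this minimal sketch the names are hypothetical and the bit positions (PE bit 0, FE bit 1, NE bit 2, ORE bit 3) are assumed from RM0008:

```c
#include <stdint.h>

#define USART_SR_PE   (UINT32_C(1) << 0)  /* parity error */
#define USART_SR_FE   (UINT32_C(1) << 1)  /* framing error */
#define USART_SR_NE   (UINT32_C(1) << 2)  /* noise error */
#define USART_SR_ORE  (UINT32_C(1) << 3)  /* overrun error */
#define USART_SR_RX_ERRORS \
    (USART_SR_PE | USART_SR_FE | USART_SR_NE | USART_SR_ORE)

/* Return just the receive error flags from a status register value.
 * Zero means the character was received cleanly. */
uint32_t usart_rx_errors(uint32_t sr) {
    return sr & USART_SR_RX_ERRORS;
}
```

Read the status register into a variable before reading the data register and pass that copy in, so the flags belong to the character you are about to fetch.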
Code Examples
Right, let’s see some simple code:
void usartSetup (void) {
// make sure the relevant pins are appropriately set up.
RCC_APB2ENR |= RCC_APB2ENR_IOPAEN; // enable clock for GPIOA
// CRH resets to 0x44444444, so clear the PA9 and PA10 fields before setting them
GPIOA_CRH &= ~(0xFFUL << 4);
GPIOA_CRH |= (0x0AUL << 4); // Tx (PA9) alt. out push-pull, 2MHz (0b1010)
GPIOA_CRH |= (0x04UL << 8); // Rx (PA10) in floating
RCC_APB2ENR |= RCC_APB2ENR_USART1EN; // enable clock for USART1
USART1_BRR = 64000000L/115200L; // set baudrate (truncates to 555)
USART1_CR1 |= (USART1_CR1_RE | USART1_CR1_TE); // RX, TX enable
USART1_CR1 |= USART1_CR1_UE; // USART enable
}
int SendChar (int ch) {
while (!(USART1_SR & USART1_SR_TXE)); // wait until the data register is empty
USART1_DR = (ch & 0xFF);
return (ch);
}
int GetChar (void) {
while (!(USART1_SR & USART1_SR_RXNE)); // wait until a character has arrived
return ((int)(USART1_DR & 0xFF));
}
Yes, it really is that simple in the end. Of course, none of the clever tricks that the USART specialises in have been used. Maybe some other time.
ARM STM32 JTAG
JTAG is a common standard for communicating with modern electronic devices like FPGAs and microcontrollers. A JTAG connection will allow you to do in-circuit debugging in a bewildering variety of ways and will generally allow you to program your device. The standard, apparently, defines five connections for this purpose. Add in power and ground and you have a minimum of 7 connections needed to implement JTAG. The trick is getting them delivered to your board or device…
There are some standards in JTAG connections but, like standards everywhere, the nice thing is that there are so many to choose from. Since I am presently only interested in the use of the STM32 processor, which has an ARM cortex-M3 core, with Rowley Crossworks, I shall only describe what works for that. It should also work for other ARM processors and other development tools. However, you must check the actual pin allocations on your interface device and the target before getting all carried away with the soldering iron.
For ARM processors, the only real standard for JTAG connectors would appear to be the 20-pin DIL version:
This is normally found on a target as a 20 pin 0.1” DIL box header for use with standard 0.05” ribbon cable and IDC connectors. It is a simple, robust connection that is easily implemented and far too big for a modest embedded application. I shall shortly want to be making a small device that is only 50mm x 40mm. The box header for this connection would require 30mm x 10mm – 15% of the board space. There appears to be another, slightly smaller connector in use – the 14-pin DIL:
This is a distinct improvement and manages to save space by removing the optional connection. It is still quite bulky though so let’s have a look at the pins that are used. The first thing you notice is there are lots of ground pins. This is a good thing generally. On the 20-pin connector, it means that all the data lines have a ground line between them in the ribbon cable. That will help to ensure signal quality over what is a very high speed bus. However, if you were to use a 20-pin connection to a small adaptor placed very close to the board, you could compromise a little over the last inch or two. A 10-pin connector is described in several places but there seems to be no obvious consensus for what the pin connections should be. Consequently, I have simply used the same pinout as that chosen by Harjit Singh for his micromouse so that it will be a little simpler for us to share hardware and experience:
Harjit’s connector has the great merit that it can be plugged in the wrong way round with no ill-effects on the target. This is particularly handy since it will permit the use of a simple pin-header without the bulky cable shell and keying slot normally used. The cable headers still take up quite a lot of space but we are now down to only 18mm x 8mm of board space needed. Looking some more at this, there are still two surplus pins. An 8-pin version would still carry the necessary signals:
If this were plugged in the wrong way round, it could damage the target with the TDO signal possibly being held well above the target supply rail. Consequently, I would not recommend this without some kind of keying or other constraint to make sure it did not get connected incorrectly.
Use with Crossworks
You may have noticed that one of the signals dropped in the smaller connector is RTCK. This signal can be used to dynamically control the speed of the JTAG interface. In the target debugger settings in Crossworks, you will want to turn off this option if the RTCK signal is not used. Look in the target properties window for an entry that says ‘Adaptive Clocking’. Set this to be ‘No’ if you are not using RTCK in your connector.
Rowley’s Crossworks does not use the nTRST signal for debugging. This line resets the JTAG hardware. Don’t leave it out of the connector in case you use some other software to talk to your target but be aware that Rowley do not make use of it.
Use with STM32
Be aware that the STM32 has internal pullups and pulldowns where needed on the JTAG lines. Another thoughtful provision by those nice people at ST. Other ARM processors are not as convenient and you should arrange to pull up or down the lines as appropriate:
- TMS, TDI, TDO, nSRST and nTRST should have pull-ups of 10k normally
- TCK, RTCK, DBGRQ and DBGACK should have a pull-down of 10k normally
Note that nSRST is the system reset and is normally connected to the RST line on the target processor. Usually, you should not connect the nTRST and nSRST lines. I don’t think it will make a difference for the STM32 and Crossworks but certainly will for a Segger JLINK.
As far as I can see, the STM32, and Cortex-M3 in general, do not use the RTCK connection anyway.
Finally, here is the 10-pin connector implemented on a breadboarded STM32 target:
Crossworks projects startup and debugging
Crossworks or, more accurately, CrossStudio for the ARM, running on a Mac is probably one of the better development environments. It has its quirky side but, so far, I am really happy with it. Now might be a good time to look at how projects are organised, how the code gets onto the target and how it is started up…
Crossworks allows the user to work on projects which form part of solutions. Projects appear in the project explorer as if they were a bunch of folders. Actually, this is a representation of the information held in an XML file which describes the project. This file can be edited in a text editor if you must, but changes made in the project explorer and/or configuration view are also written back to it.
This is noteworthy for three reasons.
First, the ‘folders’ in the project view don’t actually exist on your computer. If you create a folder for, say, documentation or library files, CrossStudio will put the files in the project folder itself and will not create a folder with the same name.
Second, the files seen in the explorer need not be in the project folder at all. Have a close look at the properties of these files and you will see some of them are stored elsewhere. These entries are, then, pointers to the actual files. An obvious example of this is the set of files found in the ‘system files’ folder. If you look at the properties of the startup file, for example, you will find it actually lives in the Library folder under your home directory. This is very convenient when you want to use a common file like the startup code. However, you must be aware that editing one of these files will affect every project that points to it and that may not be good. Almost certainly won’t be good, in fact. If you want to maintain a copy of the shared file in the project folder that is different to the original, you can import the file by right-clicking its name in the project explorer. That will make a copy and disconnect you from the original.
Third, the project description carries only relative location information for the local files in the project. That means that you could, for example, make a copy of the project folder, complete with its .hzp file and, when you open that, you can be sure you are looking at files in the copy, not the original. Some other IDEs hold absolute file locations and you can find yourself merrily editing what you think is a copy but is actually the original in a different project.
Programs built for STM32 targets will normally all have three files in the ‘system files’ folder. These are:
- STM32_Startup.s
In here is pretty much just a table of interrupt vector handlers. These are declared using the .weak keyword, which means that if the same identifier is used elsewhere for a function, then that definition will replace the one named here. For example, you only need define a function called SysTick_Handler for the linker to make that the actual systick handler instead of the default one declared here.
Also in this file is a small code segment to set up the stack pointer and to call the function SystemInit(). This is done immediately following a restart, after the stack pointer has been given a value. Again, this is declared weak and a default, do-nothing SystemInit() exists in this file in case you don’t write one. Remember this. If you write a function called SystemInit(), it will get called before the processor gets to do almost anything else. There is, then, no need to call SystemInit() yourself – the startup code will call it for you. Equally, don’t create a function called SystemInit() if you don’t want it called early.
A confusing thing happens in this file. The presence of a preprocessor macro is tested and the compiler either generates a jump to the ‘proper’ reset handler or the system goes into a loop expecting the processor to be started by the JTAG interface. More on this below.
Both these features are described in a comment block at the top of this file.
- STM32_Target.js
This is a JavaScript file that does not produce code for the processor. Instead, it determines how the IDE starts the processor up depending on the configuration in use.
- thumb_crt0.s
This is the more traditional C startup code that you might have expected. In here the variables are initialised, the stacks set up and a jump is made to the application entry point. Generally, this will be the function called main(). If your main() function is allowed to exit, it will return here and go into an endless loop.
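The weak-symbol arrangement in STM32_Startup.s means that application code takes over a vector simply by defining a function with the matching name. Here is a minimal sketch of the application side, assuming the handler name SysTick_Handler mentioned above. The counter and the accessor function are hypothetical names added purely for illustration.

```c
#include <stdint.h>

/* Hypothetical tick counter, for illustration only. It is volatile
   because it is changed in an interrupt and read elsewhere. */
static volatile uint32_t tick_count = 0;

/* Because the startup file only declares this name weakly, the linker
   uses this definition for the SysTick vector instead of the default. */
void SysTick_Handler(void)
{
    tick_count++;
}

/* Hypothetical accessor so the rest of the program can read the count. */
uint32_t ticks_elapsed(void)
{
    return tick_count;
}
```

No registration call is needed. The name alone is what connects the function to the vector table, so a spelling mistake in the name will silently leave the default handler in place.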
Startup from reset.
This is a frequently asked question on the Rowley site. Why will my application not start from a power-on or reset even though it runs just fine when started by the debugger? The answer, apparently, is that there is a conditional compilation section in the STM32_Startup.s file. The compiler looks for a preprocessor macro called STARTUP_FROM_RESET. The actual code looks like this:
_vectors:
  .word __stack_end__
#ifdef STARTUP_FROM_RESET
  .word reset_handler
#else
  .word reset_wait
#endif /* STARTUP_FROM_RESET */
The first entry in the vector table is the initial stack pointer. The entry after it is the address to jump to on a processor reset. If the STARTUP_FROM_RESET macro is defined, that location is the ‘proper’ handler; otherwise, it points to a simple endless loop. The JTAG debugging interface can break the processor out of this loop and set it on its path by jumping directly to the entry point.
Forgetting to set this macro is an annoyance. You will get your code working fine and then hit reset or disconnect the JTAG only to find the processor apparently crashed. You can find out if this is why it is stuck relatively easily. Connect the JTAG debugger and power up or reset the processor. Assuming nothing seems to happen, turn to the IDE, connect to the debugging device and select Target | Attach Debugger. Once the debugging view comes up, halt the processor and you will probably find yourself looking at the startup loop. If you are not, then something else is killing your code.
I find this very irritating and keep forgetting. Even when I remember, I do not seem to be able to start the program reliably without the debugger. I suggest that any hardware you build has at least one LED on it that only lights when your code is running so that you get an immediate visual confirmation. The thing is that, since the debugger can always run your code, it always looks OK when you re-connect the debugger to see what went wrong.
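One way to get that visual confirmation is to switch an LED on as the very first thing main() does. This is a minimal sketch, assuming an active-high LED on a pin that is already configured as an output and is driven through a BSRR-style set register like the GPIOA_BSRR used in the blinky posts. The function name, and the choice to pass the register address as a parameter, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Turn on an 'alive' LED by writing a 1 to the pin's set bit.
   bsrr : address of the port's bit set/reset register (assumed layout:
          the low 16 bits each set one pin)
   pin  : pin number within the port, 0..15 */
void alive_led_on(volatile uint32_t *bsrr, unsigned pin)
{
    assert(pin < 16);              /* only 16 pins per port */
    *bsrr = (uint32_t)1 << pin;    /* writing 0 bits leaves other pins alone */
}
```

If the LED stays dark after a power-on but lights when the debugger starts the program, the startup loop described above is the likely culprit.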
Pretty well all the STM32, and general ARM systems you come across are likely to have a JTAG connector on them somewhere. The standard ARM JTAG connector is a 20 pin 0.1″ pitch box header. This may be all well and good on larger projects but, since I intend to put an STM32 processor on a robot measuring only 50mm x 36mm, it is a bit big. As far as I can tell, the least number of pins needed for STM32 JTAG is 7:
- TMS – Test Mode Select – use 100k pull-up to VCC.
- TDO – Test Data Out.
- TDI – Test Data In – use 100k pull-up to VCC.
- TCLK – Test CLocK – use 100k pull-up to VCC.
- RTCK – JTAG Return Test ClocK (optional)
- VCC – Positive Supply Voltage
- GND – Digital ground.
- RESET – /RSTIN — active low reset input of the target CPU.
The RTCK can be done without so long as the Adaptive Clocking option is turned off in the debugger properties of CrossStudio. It might be needed by other hardware. The VCC connection is there to allow the target to power the JTAG hardware and so should be optional. This brings the number of required pins down to 6. However, at least on the Olimex ARM-USB-OCD JTAG adaptor, this is not sufficient to allow it to work, presumably because something inside uses the target VCC. Never mind, there are small 10 pin connectors that can be used.
A standard exists for 10 pin headers to implement this set of connections:
http://www.keil.com/support/man/docs/ulink2/ulink2_su_connecting.htm
Luminary have a 20-pin to mini-10-pin adaptor:
http://www.luminarymicro.com/products/mdl-ada2.html
It would not be hard to make up an adaptor and, once you have gone down the DIY proprietary route, it might as well be an 8-pin adaptor.
Crossworks Blinky Project 3 – PLL and HSE
Last time, the STM32 was set up to use the Internal RC oscillator, HSI. This runs at 8MHz. The PLL multiplier was told to use the HSI/2 as its input and the multiplier value was set to x 5. The result being a 20MHz system clock. Now, what happens if the PLL multiplier is increased to make the system run faster…
It breaks is the short answer.
In the manual, we find that the flash memory is not fast enough to keep up with the processor clock if the clock is too high. Wait states can be added to flash memory accesses to allow the pre-fetch buffer to keep up. The buffer is turned on by default after a reset so you don’t need to do anything special there. The guidance is:
- zero wait state, if 0 < SYSCLK ≤ 24 MHz
- one wait state, if 24 MHz < SYSCLK ≤ 48 MHz
- two wait states, if 48 MHz < SYSCLK ≤ 72 MHz
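The guidance above can be turned into a small lookup function. This is only a sketch: the function name is hypothetical, and it assumes the upper limit of each band (24, 48 and 72 MHz) belongs to the lower wait-state setting.

```c
#include <stdint.h>

/* Return the number of flash wait states suggested for a given SYSCLK,
   or -1 if the frequency is zero or above the 72 MHz covered by the
   guidance. Band edges are treated as inclusive at the top (assumption). */
int flash_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz == 0u)        return -1;
    if (sysclk_hz <= 24000000u) return 0;
    if (sysclk_hz <= 48000000u) return 1;
    if (sysclk_hz <= 72000000u) return 2;
    return -1;
}
```

For the 20MHz clock set up last time this returns 0, and for a 40MHz clock it returns 1.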
With SYSCLK at 20MHz, everything works just fine. The following code fragment:
volatile int i;
while(1){
GPIOA_BRR = (1<<5);
for(i=0;i<1000;i++);
GPIOA_BSRR = (1<<5);
for(i=0;i<1000;i++);
}
will generate pulses on the output pin at 950Hz. Remember to make sure all optimisations are off or the for loops may get optimised out. Declaring i as a volatile int forces the compiler not to optimise the loops away in case something else it does not know about changes the value unexpectedly.
Changing the PLL multiplier from x5 to x10 should put SYSCLK at 40MHz where we are supposed to need one wait state. In fact, the program runs just fine with the output toggling at 1900Hz – exactly twice what it was before just as you might expect. Like any overclocking, I guess it is not sensible to operate the device outside its specification if you want it to carry on working, or for the same code to work across a range of devices.
I tried increasing the multiplier, one step at a time, until the program failed to work. It was just fine with the multiplier all the way up to 14, which would set the system clock to 56MHz. The output here was 2600Hz so everything was scaling perfectly. After that point it simply would not run. The debugger indicated that as soon as the PLL was turned on, the processor just lost its grip on reality and spun off into the aether. Or something like that.
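The clock figures quoted here all come from the same arithmetic: the 8MHz HSI is halved before it reaches the PLL and is then multiplied up. A minimal sketch of that calculation follows. The function and macro names are hypothetical, and the multiplier range of 2 to 16 is the range of the PLL multiplier on this family.

```c
#include <stdint.h>

#define HSI_HZ 8000000u   /* internal RC oscillator frequency */

/* SYSCLK when the PLL is fed from HSI/2 and multiplied by 'mul'.
   Returns 0 for a multiplier outside the 2..16 range. */
uint32_t pll_sysclk_from_hsi(unsigned mul)
{
    if (mul < 2u || mul > 16u) {
        return 0u;
    }
    return (HSI_HZ / 2u) * mul;   /* 4 MHz steps */
}
```

Multipliers of 5, 10 and 14 give 20MHz, 40MHz and 56MHz respectively, matching the settings tried above.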
OK, so what happens when we add wait states? This is very easily done with a single line of code. The following fragment will add two wait states. Substitute the 2 for a 1 or a zero as needed.
FLASH_ACR = FLASH_ACR_PRFTBE | 2<<FLASH_ACR_LATENCY_BIT; // 2 wait states
With the PLL put back to a variety of values to set SYSCLK at frequencies between 20 and 52 MHz, the same LED flashing code was run with zero, one and two wait states and the frequency at the LED pin was measured.
| SYSCLK | 0 WS (Hz) | 1 WS (Hz) | 2 WS (Hz) |
|---|---|---|---|
| 16MHz | 760 | 665 | 569 |
| 28MHz | 1330 | 1160 | 997 |
| 40MHz | 1900 | 1660 | 1424 |
| 52MHz | 2468 | 2160 | 1850 |
| 64MHz | —- | 2660 | 2280 |
Results with zero wait states above 24MHz, or with one wait state above 48MHz, are outside the parameters given in the ST manual. The thing to notice is that using 2 wait states in flash will slow your program to about 75% of its speed without any wait states. That means that, if you were to increase your system clock by a small amount and take it over one of the thresholds, you may find that your code runs slower than it did before if you choose to insert the appropriate number of wait states.
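The threshold effect can be checked with a rough estimate. The sketch below assumes the output frequency scales linearly with SYSCLK for a given wait-state setting, which the table suggests but does not prove, and takes its scaling factors from the 16MHz row. The function name is hypothetical and 26MHz is just an illustrative figure.

```c
#include <assert.h>

/* Output toggle rate in Hz per MHz of SYSCLK for 0, 1 and 2 wait
   states, taken from the measured 16MHz row of the table. */
static const double hz_per_mhz[3] = { 760.0 / 16.0, 665.0 / 16.0, 569.0 / 16.0 };

/* Estimate the output frequency of the test loop, assuming it scales
   linearly with the system clock. */
double estimated_toggle_hz(double sysclk_mhz, int wait_states)
{
    assert(wait_states >= 0 && wait_states <= 2);
    return sysclk_mhz * hz_per_mhz[wait_states];
}
```

On this estimate, 24MHz with no wait states gives about 1140Hz while 26MHz with the one wait state the manual asks for gives about 1080Hz, so the faster clock does less work.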
Code on the ARM can run from RAM, which is supposed to be faster than running from flash. There must be some additional trick to this because, when I tried the exact same code compiled to run out of RAM, it performed at almost exactly the same speed as with 2 wait states in flash.
The High Speed External oscillator (HSE) is generally connected to a crystal or a resonator. This is good whenever you want to have a system clock or other timebase derived from it that cannot readily be made from the internal 8MHz oscillator (HSI). This may be needed if you are dealing with ethernet or USB or TV signals or something like that, or if you need a particular stability or synchronisation with another clock source. If you don’t need an odd frequency, you might just as well stick with the HSI clock. On my test board, the internal clock is just as accurate as using an external crystal at room temperature. I don’t know how temperature dependent it is but then I don’t really care for my toys. Turning on the HSE clock is relatively simple:
RCC_CR |= RCC_CR_HSEON; // turn on the HSE
while ((RCC_CR & RCC_CR_HSERDY)==0); // wait for it to become stable
RCC_CFGR |= (1 << RCC_CFGR_PLLXTPRE_BIT); // pre-scale the HSE by 2
RCC_CFGR |= (1 << RCC_CFGR_PLLSRC_BIT); // use the HSE as input to the PLL
There is no need to pre scale the HSE by two before feeding it to the PLL. I put in the line to show how and so that the frequency presented to the PLL is the same on my board for both the HSE and HSI inputs since my HSE is connected to an 8MHz crystal.
Again, if you don’t need a source clock frequency different to 8MHz, there seems little need to use HSE at all. The HSI input can be used to make SYSCLK as high as 64MHz. Using the HSE extends that to 72MHz – not enough of an increase to make up for having to use two wait states at those frequencies. If your system isn’t fast enough by then, you may be using the wrong processor or algorithm.
Note that this code is probably not a good example for testing the performance as I have no idea how it interacts with the operation of the pre-fetch buffer. I would expect that performance issues due to wait states would not get much worse than this.
Now I can set the main clock up, drive output pins and read input pins.
Next time, a little bit about Crossworks, JTAG, startup files and running without the debugger.