
ARM Microcontroller bare metal programming

This article summarises some of my experiences programming ARM Cortex M3 and M4 microcontrollers without an operating system. This guide is intended for the beginner, probably with some C programming experience.

Why bare metal and not use an operating system?


In my opinion bare metal programming is the more basic route. The advantage is that you “only” have to learn how the microcontroller works. If you use an operating system some of the microcontroller issues are hidden, but instead you have to figure out how the operating system works. If you then want to do something that the operating system does not support, you have to go back and understand the processor as well. At this stage the operating system route becomes very complex. I would say that once you have figured out the processor it is much easier to work out how to fix an operating system and add features. I would therefore start with bare metal programming.

I would say you exceed the limits of bare metal programming if you require a file system to read an SD card or if you need a wired internet connection. USB is also very complex. If you need these features a real time operating system with these features built in would make sense. It is possible to implement Ethernet without an operating system but this is definitely beyond the beginner.


Select your tools

If you have never played around with Linux and don’t know what a makefile is then I would start with a commercial development board and software package. Have a look on the ST, Keil, IAR, Hitex and Raisonance websites. Be aware that in some cases you have to buy an additional debugger to transfer the compiled code from your computer onto the development board. Other boards have the debugger built in. For basic programming the free version of the software kits is good enough. It will be a while until you reach the code size limit.


With these kits you should be able to have the example programs running in minutes. Load the example, compile it and load (flash) the binary file onto the development board. If you decide to use an ST board you will have to use one of the other development programs since ST doesn’t provide one. ST does provide a lot of examples which are tailored to the Keil, IAR and other software tools.


If you do have Linux experience I would use open source tools. As a minimum you will have to install a GCC cross compiling tool chain and some sort of software to flash the microcontroller. I’m using the Mentor Graphics Sourcery CodeBench Lite edition. There is also YAGARTO (yet another GNU ARM toolchain). I would also download the Olimex OpenOCD_OnlinePackage. It contains makefiles, linker scripts, startup code and peripheral libraries. It also comes with examples. If everything has been set up correctly you only have to type “make” at the command line and the sample projects should compile.


Once this is all working you can also install the ECLIPSE software development kit. It is state of the art but will take some effort to set up. It is entirely possible to edit the source files with a regular text editor.

Basic structure of a program

What do you actually have to do to the microcontroller so that these programs run? As a first step you will have to download the reference manual for your processor and the schematic of your board. The reference manual is usually lengthy (around 1000 pages or more) and describes every peripheral of the processor in detail. Don’t worry. You don’t have to read it all. You only have to get your head around the peripherals you are using. If you can’t find the manual don’t use the processor. Some of them require an NDA or don’t describe the features of the processor fully. This is not a problem with lower power processors which are very well documented.

All processor level programming is done by setting bits in registers to 0 or 1. For example the following line sets the control register of Port D to 0x44BB44BB, which is 1000100101110110100010010111011 in binary.


GPIOD->CRL = 0x44BB44BB;


The reference manual will tell you what this means.


Most of this setting up is done using hexadecimal numbers. I often use a calculator to convert hex to binary or the other way round. The standard Windows calculator has a useful “Programmer” mode for this. After a while you will be an expert at converting binary to hex and back.


Quite often you will see operators like |=, &=, ^= or << when assigning values to registers. They are usually used when there are already values written to the register but the register has to be modified. How do they work?


|= is a bitwise or assignment and is usually used to set bits (set them to 1). For example if previously  0x44BB44BB has been written to the Port D control register and one would like to set bit two without changing all the other bits one would write:


GPIOD->CRL |= 0x00000004;


After this operation the register value would be 1000100101110110100010010111111.


Another way of achieving this is to use the left shift operator to shift a 1 to bit position 2. This is done the following way:


GPIOD->CRL |= (1 << 2);


&= is the bitwise and assignment and is used to clear bits. To change bit two back to 0 without changing any other bit one would write:


GPIOD->CRL &= 0xFFFFFFFB;

Type these hex numbers into a calculator and convert them to binary. You will quickly understand what is happening.


^= is the exclusive or assignment and used to toggle bits. If the bits were on they are switched off and if they were off they are switched on. If one was sure that bit two of the Port D control register is currently 1 but it has to be switched off one could write:


GPIOD->CRL ^= 0x00000004;
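The three operators can be tried out on a PC before touching real hardware. The sketch below is only an illustration: the helper names set_bit, clear_bit and toggle_bit are hypothetical and they work on a plain 32 bit value instead of a real register.

```c
#include <stdint.h>

/* Set one bit to 1 and leave all others alone (what |= does). */
uint32_t set_bit(uint32_t reg, unsigned bit)
{
    return reg | (UINT32_C(1) << bit);
}

/* Clear one bit to 0 by and-ing with the inverted mask (what &= does). */
uint32_t clear_bit(uint32_t reg, unsigned bit)
{
    return reg & ~(UINT32_C(1) << bit);
}

/* Flip one bit: 1 becomes 0 and 0 becomes 1 (what ^= does). */
uint32_t toggle_bit(uint32_t reg, unsigned bit)
{
    return reg ^ (UINT32_C(1) << bit);
}
```

Starting from 0x44BB44BB, setting bit two gives 0x44BB44BF, and clearing or toggling bit two of that result gives 0x44BB44BB again.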


Now that we know how to modify registers we can look at what a regular program actually does.


Below are the basic steps required in order for programs to run. It is all very basic low level stuff far removed from higher level programming.



1.      Set up the clock tree

Every microcontroller needs a clock and this has to be set up. There are usually different clock sources, clock paths, clock dividers and multipliers. Usually the processor is clocked from a higher precision external crystal in run mode.


The good thing is that the clocks are usually set up by a standard system file which comes with the examples. As long as you include the standard system file in your code the clocks should work. You still have to find out what the frequency of certain peripheral clocks is if you want to program the correct baud rate for a serial port for example. Therefore I would go through the system file, find out how the clock signal is routed and what the resulting frequencies are.
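To show why the peripheral clock frequency matters, here is a rough sketch of a baud rate calculation. It assumes an STM32F1 style USART with 16x oversampling, where the value written to the baud rate register works out as peripheral clock divided by baud rate. The function name usart_brr is hypothetical, and other processors use different formulas, so check the reference manual.

```c
#include <stdint.h>

/* Baud rate register value for a USART with 16x oversampling (assumed).
   The register holds USARTDIV with 4 fractional bits, which is the same
   as pclk / baud. Adding baud / 2 rounds to the nearest integer. */
uint32_t usart_brr(uint32_t pclk_hz, uint32_t baud)
{
    return (pclk_hz + baud / 2u) / baud;
}
```

With a 72 MHz peripheral clock and 115200 baud this gives 625 (0x271). If the clock feeding the serial port is really 36 MHz, the same register value would produce half the intended baud rate.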


2.      Set up pin multiplexing

Most processor pins can be used for more than one function. For example one pin might be used as a CAN bus data line or a USB data line depending on which peripheral the pin is connected to. You have to consult the schematic and the reference manual to find out if the particular peripheral is used with the default pins or re-mapped to other pins. Sometimes there is more than one re-map option.


3.      Set up the pin mode

Every pin has to be configured as input, output or analogue. Again the reference manual should tell you what the requirements of the particular peripheral pin are and how to configure the pin correctly.
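As an illustration of a pin mode setting, the sketch below assumes the STM32F1 layout, where each of the pins 0 to 7 of a port owns four bits in the CRL register (two CNF bits and two MODE bits). The helper crl_set_pin and the two constants are hypothetical names, and the function works on a plain value so that it can be tried on a PC.

```c
#include <stdint.h>

/* 4-bit pin settings, written as CNF[1:0] followed by MODE[1:0]. */
#define PIN_INPUT_FLOATING  0x4u  /* CNF = 01, MODE = 00: floating input (reset state) */
#define PIN_AF_PUSHPULL_50M 0xBu  /* CNF = 10, MODE = 11: alternate function push-pull output, 50 MHz */

/* Replace the 4-bit setting of one pin (0..7) and keep the other pins. */
uint32_t crl_set_pin(uint32_t crl, unsigned pin, uint32_t config)
{
    unsigned shift = pin * 4u;

    crl &= ~(UINT32_C(0xF) << shift);   /* clear the old setting */
    crl |= (config & 0xFu) << shift;    /* write the new setting */
    return crl;
}
```

Starting from the reset value 0x44444444 and switching pins 0, 1, 4 and 5 to alternate function push-pull gives 0x44BB44BB, the value used in the Port D example earlier.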


4.      Enable the clock

In order to use a pin, the clock of the particular port, the peripheral clock and, if required, the alternate function clock have to be enabled.


At this stage the pin should be able to send or receive signals and the signals should be routed inside the processor to the correct peripheral.


5.      Set up the peripheral

Most of the peripheral systems have multiple configuration options as well. The options have to be enabled or disabled as required. Again this is done by writing bits into control registers. The reference manual will give details.


At this stage your peripheral should be able to send and receive data. For example to send an ‘A’ (hex code 0x41) over serial port 1 one would write something like:


USART1->DR = 0x0041;  
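In practice one would first wait until the transmit register is empty before writing the next byte. The sketch below shows the idea with a stand-in register structure so that it can be compiled on a PC. The names usart_regs_t and usart_send_byte are hypothetical, and the position of the “transmit data register empty” flag (bit 7 of the status register) is an assumption based on STM32 parts.

```c
#include <stdint.h>

/* Stand-in for the two USART registers used here. On real hardware the
   vendor header provides USART1 with SR and DR members. */
typedef struct {
    volatile uint32_t SR;   /* status register */
    volatile uint32_t DR;   /* data register   */
} usart_regs_t;

#define USART_TXE_BIT (UINT32_C(1) << 7)   /* assumed: transmit data register empty */

/* Wait until the previous byte has left the data register, then write. */
void usart_send_byte(usart_regs_t *usart, uint8_t byte)
{
    while ((usart->SR & USART_TXE_BIT) == 0u) {
        /* busy wait */
    }
    usart->DR = byte;
}
```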



Standard Libraries

The processor and peripheral set up can be done by standard libraries written especially for the particular processor. If you are choosing a processor download the available libraries first to see what is available. The better the libraries are, the easier it is to develop software. With libraries you don’t have to worry about writing to special registers. This is done by the functions in the library. For example the code below sets the options for USART1 which is equivalent to step 5 above.


USART_InitStructure.USART_BaudRate = 115200;

USART_InitStructure.USART_WordLength = USART_WordLength_8b;

USART_InitStructure.USART_StopBits = USART_StopBits_1;

USART_InitStructure.USART_Parity = USART_Parity_No;

USART_InitStructure.USART_HardwareFlowControl = USART_HardwareFlowControl_None;

USART_InitStructure.USART_Mode = USART_Mode_Rx | USART_Mode_Tx;


USART_Init(USART1, &USART_InitStructure);


It is still important to know what is going on here so understanding the above steps is essential even if it is painful.


With this knowledge you should be able to write workable software where most of the work is carried out in a big, endless loop. To receive data over a serial port there would be an ‘if’ statement that checks if data has been received. If there is data this data is processed. After the data has been processed a button is checked for its status for example. All this happens in a big loop.
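One pass through such a loop could look like the sketch below. It uses stand-in structures instead of real registers so that it runs on a PC. All names are hypothetical, and the “data received” flag position (bit 5 of the status register) is an assumption based on STM32 parts.

```c
#include <stdint.h>

#define USART_RXNE_BIT (UINT32_C(1) << 5)   /* assumed: receive register not empty */

/* Stand-in for the USART status and data registers. */
typedef struct {
    volatile uint32_t SR;
    volatile uint32_t DR;
} fake_usart_t;

/* What the program remembers between passes through the loop. */
typedef struct {
    uint32_t bytes_handled;
    uint32_t button_presses;
    uint8_t  last_byte;
} app_state_t;

/* One pass of the big loop: check the serial port, then check the button. */
void loop_iteration(fake_usart_t *usart, int button_down, app_state_t *app)
{
    if (usart->SR & USART_RXNE_BIT) {
        app->last_byte = (uint8_t)usart->DR;
        usart->SR &= ~USART_RXNE_BIT;   /* real hardware clears the flag when DR is read */
        app->bytes_handled++;
    }

    if (button_down) {
        app->button_presses++;
    }
}
```

On the microcontroller the main function would simply call loop_iteration inside while (1).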


Advanced Functions

Big loops quickly get out of control and there might be large delays due to the amount of processing that is going on. For example if the loop is too big data from the serial port might be missed because the characters arrive at a higher frequency than the frequency of the loop. The frequency of the loop might also change depending on the work that is done. So this is not a very reliable way to program a microcontroller. This brings us to advanced functions.



Interrupts

In most cases the endless loop can be completely empty if interrupts are used. The main function then does nothing at all.


Interrupts are triggered if certain conditions are met (button is pressed, character arrives on serial port). If an interrupt is triggered a special function is executed depending on the priority of the interrupt. If two interrupts are triggered at the same time the one with the higher priority is dealt with first.


The interrupt handler function is usually quite short and very descriptive. Complex programs can have multiple interrupts. All “work” is carried out in these interrupt handler functions. Below is an example for a serial port byte received interrupt.


void USART2_IRQHandler(void)
{
    /* Read one byte from the receive data register */
    RxBuffer[RxCounter++] = USART_ReceiveData(USART2);

    if(RxCounter == NbrOfDataToRead)
    {
      /* Disable the USART2 Receive interrupt */
      USART_ITConfig(USART2, USART_IT_RXNE, DISABLE);
    }
}




Interrupts can be triggered by peripherals (data received, transmit register empty, error…), external lines (buttons) or software.

To use interrupts the interrupt controller channel has to be enabled. Usually one doesn’t have to change the priority from the default priority. If all interrupts have the same priority they are executed in the order they are received. It gets complicated if you would like to stop execution of an interrupt routine if a higher priority interrupt has been received.


In the peripheral control register the “enable interrupt” flag also has to be set. After that the function which is defined in the startup file is executed whenever an interrupt has been received.
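The two enable steps can be sketched as follows. The structures are stand-ins for the real registers so that the code runs on a PC, and all names are hypothetical. The interrupt number 38 for USART2 and the “receive interrupt enable” bit (bit 5 of control register 1) are assumptions based on the STM32F1. On real hardware the CMSIS function NVIC_EnableIRQ does the first step for you.

```c
#include <stdint.h>

#define USART2_IRQ_NUMBER 38u                 /* assumed: STM32F1 interrupt number */
#define USART_RXNEIE_BIT  (UINT32_C(1) << 5)  /* assumed: receive interrupt enable */

/* Stand-ins for the interrupt controller and USART registers. */
typedef struct { volatile uint32_t ISER[8]; } fake_nvic_t;
typedef struct { volatile uint32_t CR1; } fake_usart_t;

/* Step 1: enable the interrupt controller channel. Each ISER register
   covers 32 interrupts. Writing a 1 enables a channel, and on real
   hardware writing a 0 has no effect, so no |= is needed. */
void nvic_enable(fake_nvic_t *nvic, uint32_t irq)
{
    nvic->ISER[irq >> 5] = UINT32_C(1) << (irq & 31u);
}

/* Step 2: set the "enable interrupt" flag in the peripheral. */
void usart_enable_rx_interrupt(fake_usart_t *usart)
{
    usart->CR1 |= USART_RXNEIE_BIT;
}
```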



DMA

DMA stands for direct memory access. It is very useful for transferring data without loading the processor.

Imagine you want to transfer 50 characters over a serial port. First you would have to check if the transmit register is empty. If it is empty the next character is copied into the register until all characters have been transmitted.

Without a “transmit data register empty” interrupt the processor would be busy transmitting these 50 characters from the start of the transmission to the end of the transmission. This would be a great waste of resources because most of the time the processor is waiting for the transmit register to be empty.


Using an interrupt is a much better option. The “transmit data register empty” interrupt would be executed 50 times. Between interrupts the processor could do other jobs because no waiting for an empty register is required.


The best way to transmit those 50 characters would be to use a DMA. The DMA would automatically transfer the next character from memory to the transmit register whenever the transmit register is empty. To use the DMA one would have to do the following steps:


1.      Set up the source address for the DMA (the address of the first character to transmit)

2.      Set up the destination address for the DMA (address of the serial port data register)

3.      Set the number of characters to transmit

4.      Some DMA options. In this case the source address would be automatically incremented after each character transmission and it might be useful to enable an interrupt after the DMA has finished transferring the data. In the interrupt routine the DMA is switched off.

5.      Switch on the DMA

6.      Enable serial port DMA transmission


After the DMA has been set up the processor has nothing to do until the DMA has finished at which stage the DMA is switched off by the interrupt routine. The processor could do other jobs during the whole period of the transmission. We are almost multi-tasking!


One of the dangers of using a DMA is that it would be possible to change the transmit buffer contents (source data) while the DMA is transmitting from it. This would obviously lead to corrupted data being transmitted. Therefore before writing to the transmit buffer check if the DMA is enabled or not!
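The six steps and the “is the DMA still busy” check could look roughly like the sketch below. It assumes the STM32F1 register layout and bit positions for a DMA channel and for the USART DMA enable bit, and it uses a stand-in structure so that it can be tried on a PC. The names dma_channel_t, dma_start_tx and dma_is_busy are hypothetical.

```c
#include <stdint.h>

/* Stand-in for one DMA channel register block (STM32F1 layout assumed). */
typedef struct {
    volatile uint32_t CCR;    /* channel configuration         */
    volatile uint32_t CNDTR;  /* number of items left to send  */
    volatile uint32_t CPAR;   /* peripheral address            */
    volatile uint32_t CMAR;   /* memory address                */
} dma_channel_t;

#define DMA_EN     (UINT32_C(1) << 0)  /* channel enable                  */
#define DMA_TCIE   (UINT32_C(1) << 1)  /* transfer complete interrupt     */
#define DMA_DIR    (UINT32_C(1) << 4)  /* read from memory                */
#define DMA_MINC   (UINT32_C(1) << 7)  /* increment memory address        */
#define USART_DMAT (UINT32_C(1) << 7)  /* USART CR3: DMA transmit enable  */

void dma_start_tx(dma_channel_t *ch, volatile uint32_t *usart_cr3,
                  uint32_t src, uint32_t dst, uint16_t count)
{
    ch->CCR &= ~DMA_EN;                       /* configure only while switched off */
    ch->CMAR  = src;                          /* 1. source address                 */
    ch->CPAR  = dst;                          /* 2. destination address            */
    ch->CNDTR = count;                        /* 3. number of characters           */
    ch->CCR   = DMA_DIR | DMA_MINC | DMA_TCIE;/* 4. options                        */
    ch->CCR  |= DMA_EN;                       /* 5. switch on the DMA              */
    *usart_cr3 |= USART_DMAT;                 /* 6. enable serial port DMA         */
}

/* Check this before writing to the transmit buffer. */
int dma_is_busy(const dma_channel_t *ch)
{
    return (ch->CCR & DMA_EN) != 0u;
}
```

The interrupt routine that runs after the last character would clear the enable bit again, at which point dma_is_busy returns 0 and the buffer can be refilled.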


Bit Banding

Bit banding maps a complete word (32 bits) in a separate alias region to a single bit in the bit band region. On the Cortex M3 processor the bit band regions cover the first megabyte of the SRAM region and the first megabyte of the peripheral region.


So what is this good for? It significantly speeds up and simplifies checking for set bits and manipulating single bits. In the Port D control register example we had to use bitwise assignments and operators to modify bit 2. Instead one could just write 0 or 1 to the mapped word address.

To set this up one would first define the variable and the word address which is mapped to the particular bit of interest. For example the DMA1 channel 6 enable bit is mapped to address 0x42400FEC.


#define DMA1CH6_EN         (*((volatile uint32_t *)   0x42400FEC))


To enable the DMA one would simply program


DMA1CH6_EN = 1;


To disable the DMA the code is


DMA1CH6_EN = 0;


Often there are horrible if statements like this one which checks if a certain bit is set or not. In this case it checks if the CAN1 transmit mailbox 0 is empty.


if ((CAN1->TSR & CAN_TSR_TME0) == CAN_TSR_TME0)


With bit banding this could be changed to


if (CAN1_TSR_TME0 == 1)
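The mapped word address does not have to be looked up. It can be calculated from the byte address of the register and the bit number. The sketch below uses the Cortex M3 mapping, where the alias region starts 0x02000000 above the start of the bit band region and every bit takes up one 32 bit word. The function name bitband_alias is hypothetical.

```c
#include <stdint.h>

/* Address of the alias word for one bit in a bit band region.
   addr is a byte address in the SRAM (0x20000000) or peripheral
   (0x40000000) bit band region, bit is the bit number at that address. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t region      = addr & 0xF0000000u;    /* start of the bit band region */
    uint32_t alias_base  = region + 0x02000000u;  /* start of the alias region    */
    uint32_t byte_offset = addr - region;

    return alias_base + byte_offset * 32u + bit * 4u;
}
```

For example bit 0 of the byte at 0x20000000 maps to 0x22000000 and bit 7 of the byte at 0x200FFFFF maps to 0x23FFFFFC.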