Keywords: high speed microcontroller, DS80C390, DS80C400, CAN, controller area network, math accelerator, math acceleration, 16-bit math, special function registers, SFR, 16-bit multiply, 32-bit divide, code example, Dallas Semiconductor, microcontrollers
Accelerating 16/32-Bit Math Operations with the DS80C390/DS80C400
Abstract: The Maxim DS80C390/DS80C400 high-speed microcontroller offers the end user a dedicated hardware 16/32-bit math accelerator. Accessing the math accelerator is accomplished by using five dedicated special function registers. 16-bit multiply and 32-bit divide operations are realized with the DS80C390/DS80C400. This application note provides the designer with helpful information about the math acceleration features of the DS80C390/DS80C400 high-speed microcontroller and various code examples.
Overview
Performance advances in the 8051 microcontroller market have increasingly led users to consider 8-bit devices in applications that once required processing power only available from 16-bit devices. Some of these potential applications, unquestionably, require fast mathematical operations and computations involving 16-bit data. The DS80C390/DS80C400 addresses this need on two levels. This faster instruction execution generates peak performance of 10MIPS and 18.75MIPS respectively for the DS80C390 and DS80C400. This faster instruction execution, coupled with a 40MHz maximum clock frequency, generates a peak performance of 10MIPS. Equally important to applications doing intensive 16-bit math is the inclusion of dedicated hardware for 16/32-bit math acceleration. This application note will explain the interface and operation of the on-chip math accelerator, and give some examples of its use.
Accessing the Math Accelerator Hardware
The math accelerator is entirely controlled through five special function registers (SFRs). These five SFRs are shown in Table 1. The loading and unloading of data to/from the accelerator is done in a byte by byte fashion through the MA, MB, and MC registers. The MA register allows transfer of 32-bit data to/from the accelerator. The MB register allows transfer of 16-bit data to/from the Accelerator, while the MC register gives load/unload access to a 40-bit accumulator. The 40-bit accumulator is updated on every multiply or divide operation executed by the accelerator and proves immensely useful in applications needing a multiply-accumulate or divide-accumulate function. Loading data into the Accelerator through any of the MA, MB, or MC registers must always be done least significant byte first while unloading data from the Accelerator always occurs most significant byte first. The block diagram of the math accelerator given in Figure 1 illustrates this SFR interface.
Figure 1. Math accelerator block diagram.
Table 1. Math Accelerator Special Function Registers
SFR (Addr)
Bit NAME(S)
Accelerator Usage By Operation
(MUL, DIV16, DIV32, NORM, SHIFT)
OPERATION
BEFORE
AFTER
MCNT0 (D1h)
MCNT0.7
Active-low LSHIFT
SHIFT
0 = left, 1 = right
MCNT0.6
CSE
SHIFT
1 = enables circular shift
MCNT0.5
SCE
SHIFT
1 = include SBC in shift
MCNT0.4-0
MAS4:0
SHIFT
Number of shifts to do
NORM
00000b = start NORM
Number of shifts done
MCNT1 (D2h)
MCNT1.7
MST
ALL
Busy Bit
Busy Bit
MCNT1.6
MOF
MUL
1 = product > FFFFh
DIV 16/32
1 = divide by 0 attempt
MCNT1.5
SCB
SHIFT
Carry Bit
Carry Bit
MCNT1.4
CLM
ALL
Clear MA, MB, and MC
Clear MA, MB, and MC
MA (D3h)
MUL
16-bit multiplicand
32-bit product
DIV16
16-bit dividend
16-bit quotient
DIV32
32-bit dividend
32-bit quotient
NORM
32-bit data
32-bit mantissa
SHIFT
32-bit data
32-bit shifted result
MB (D4h)
MUL
16-bit multiplier
DIV 16/32
16-bit divisor
16-bit remainder
MC (D5h)
MUL DIV16 DIV32
40-bit accumulator
40-bit accumulator
Supported Operations
The DS80C390/DS80C400 math accelerator supports four basic operations: multiply, divide, normalize, and shift. The operation to be performed by the math accelerator is defined by the sequence in which three registers (MA, MB, and MCNT0) are written. The BUSY bit (MCNT1.7) indicates when an operation has started (BUSY = 1) and when the operation completes (BUSY = 0). Each math accelerator operation, however, is guaranteed to complete within a fixed number of machine cycles, eliminating the need for BUSY bit polling. The table below gives the hardware execution time for each accelerator operation. Each math accelerator operation and SFR write sequence needed to initiate the operation is described after the table. This information can also be found in the DS80C390 User's Guide Supplement or High-Speed Microcontroller User's Guide: Network Microcontroller Supplement.
Table 2. Execution time of accelerator operations.
OPERATION
EXECUTION TIME
MIN EXECUTION TIME (tCLCL = 25ns; 40MHz)
Multiply 16-bit x 16-bit
24 tCLCL
600ns
Divide 32-bit/16-bit
36 tCLCL
900ns
Divide 16-bit/16-bit
24
tCLCL 600ns
Normalize 32-bit
36
tCLCL 900ns
Shift 32-bit
36
tCLCL 600ns
Multiply 16-Bit x 16-Bit
The 16-bit by 16-bit multiply operation is initiated by writing the 16-bit multiplier to the MB register, followed by writing the 16-bit multiplicand to the MA register. Each 16-bit word must be loaded least significant byte first to the required registers. The accelerator hardware completes the multiply operation within six machine cycles producing a 32-bit result which is accessible, most significant byte first, by reading the MA register. Four reads of the MA register are necessary to acquire the entire product. The multiply operation automatically accumulates the product with the previous contents of the 40-bit accumulator and sets the MOF flag (MCNT1.6) if the product exceeds 0000FFFFh.
Divide 32-Bit/16-Bit and Divide 16-Bit/16-Bit
The 32-bit divide by 16-bit operation is initiated by writing the 32-bit dividend to the MA register, followed by writing the 16-bit divisor to the MB register. The 16-bit divide by 16-bit operation is initiated similarly except that two fewer bytes need be written to the MA register for the 16-bit dividend. All 32-bit double word and 16-bit word data must be loaded least significant byte first to the required registers. The math accelerator completes the 32/16 division within nine machine cycles and the 16/16 division within six machine cycles, generating a 32-bit or 16-bit quotient depending upon the size of the initial dividend. A 16-bit remainder is generated for both divide operations. The quotient and remainder are accessible, most significant byte first, by reading the MA and MB registers respectively. For the 16/16 divide, two reads of the MA register will return the 16-bit quotient while the 32/16 divide requires four reads of MA in order to get the full 32-bit quotient. It does not matter whether the quotient or remainder is read first. The divide operation automatically accumulates the quotient with the previous contents of the 40-bit accumulator and sets the MOF flag (MCNT1.6) if the divisor was 0000h.
Normalize 32-Bit
The 32-bit normalize operation is initiated by writing a 32-bit double word to MA and then writing the MAS4:0 (MCNT0.4-0) bits = 00000b. The normalization will complete within nine machine cycles. At that time, the left shifted 32-bit result can be read, most significant byte first, from the MA register. The number of left shifts needed to normalize the 32-bit double word will be returned to the MAS4:0 bits.
Shift Right/Left 32-Bit
The 32-bit shift operation is initiated by writing a 32-bit double word to the MA register, followed by a write to the MAS4:0 bits of the number of shifts to execute. The bit shifting can optionally be done in a right or left direction, be defined as a circular shift, and be inclusive or exclusive of the carry (SCB) bit. When circular shifting is not performed, zero bit data (0) will always be shifted in during the operation. The figure below details the controls for the shift operation.
Figure 2. Shift operation control.
40-Bit Accumulator
The 40-bit accumulator adds the result of each multiply (product) or divide (quotient) to its current
contents. The entire 40-bit accumulator can be loaded, least significant byte first, with five writes to the MC special function register. Similarly, the 40-bit accumulator can be read, most significant byte first, with five reads of the MC special function register.
Assembly Code Examples of Each Accelerator Operation
Below are simple code examples to demonstrate the required load/unload procedure for each operation and diagrams to highlight the registers and bits involved in each.
As discussed earlier, the math accelerator hardware is controlled entirely by the sequence in which the associated SFRs are written and read. Each register read or write points to a different physical memory location. To achieve correct results, it is critical that the prescribed order not be violated. As a rule, the math accelerator should not be interrupted by another task that also uses the math accelerator, as this will generally produce undesired results. Simply put, there exists only one math accelerator capable of performing one operation at a time. The sequence below demonstrates the type of problem to be expected when using re-entrant math accelerator code.
Example Problem Sequence
Write MB (Start of divide 16-bit/16-bit)
Write MB
--- Interrupt occurs that uses the Accelerator ---
Write MB (Start of divide 16-bit/16-bit)
Write MB
Write MA
Write MA
--- Wait for completion---
Read MA
Read MA
Read MA
Read MA INCORRECT ! - divide 32-bit/16-bit was performed
---Return from Interrupt ---
Write MA
Write MA WRONG STATE ! - will not initiate the divide
Keil™ C51 Compiler Math Functions that Use the Accelerator
Many 8051 users are accustomed to doing code development in a high-level language such as C. Keil
Software, an industry leading provider of 8051 development tools, has created special code to allow
utilization of the DS80C390/DS80C400 math accelerator for certain operations. Listed below are the operations supported by Keil C51 version 6.20 (or later) that make use of the math accelerator when this "Target Option" has been enabled.
unsigned long *
unsigned long /
unsigned long >>
unsigned long <<
signed long *
signed long /
signed long >>
signed long <<
Application Example:
IEEE® 754 Single Precision Floating Point Multiply
To demonstrate the capability of the DS80C390/DS80C400 math-accelerator hardware, let us consider the task of multiplying two floating-point numbers. Figure 8 below shows the IEEE 754 single-precision floating point number format and Table 3 contains some example numbers. Multiplication of binary floating point numbers involves addition of exponents and multiplication of signed 24-bit numbers. Figure 9 shows the basic steps that must be performed to accomplish the task. By far, the most time consuming of the three steps, for 8051 hardware, is the 24-bit x 24-bit multiplication. While the DS80C390/DS80C400 arithmetic accelerator does not support direct multiplication of two 24-bit numbers, utilizing its 16-bit x 16-bit multiply and accumulate function provides a distinct performance advantage over traditional 8051 methods. An outline for multiplication of the (2) normalized 24-bit numbers using the accelerator is shown in Figure 9, Step 3. The partial product multiply-accumulate (MAC) operations have been highlighted.
Figure 8. IEEE 754 single precision floating point format.
Table 3. Normalized floating point number examples
7F7FFFFF (max positive)
3.4028234663852886e+38
66FF0C32
6.02214208470173e+23
4D8EF3C2
299792448.0
4B277224
10973732.0
47F12065
123456.7890625
461C4000
10000.0
44FA0002
2000.000244140625
448AE385
1111.1099853515625
3F800000
1.0
3F000000
0.5
203D26D0
1.6021764682116162e-19
1985873F
1.380650314593702e-23
085C305C
6.626068801043303e-34
00800000 (min positive)
1.1754943508222875e-38
80800000 (min negative)
-1.1754943508222875e-38
AF531F95
-1.9201558398851404e-10
BA81742B
-0.000987654
BF000000
-0.5
BF800000
-1.0
C1000000
-8.0
C2046666
-33.099998474121094
C7C35000
-100000.0
D0435000
-13107200000.0
D533A52B
-12345123274752.0
FF7FFFFF (max negative)
-3.4028234663852886e+38
Figure 9. Floating point multiply (A x B = C).
Application Example (continued)-Code Listing
To simplify the example, the application code does not support the following multiplicands or products: 0, -0, Infinity, -Infinity, NaN (not a number), denormalized values. Also, the code performs biased rounding of the product (i.e., if the product falls exactly between two representable numbers, it will be rounded up). This software is available for download.
;-----------------------------------------------------------
; Single Precision Floating Point Multiply
; (does not support 0, -0, INF, -INF, NaN, denormalized #'s)
;
; inputs: R0-R1-R2-R3 = multiplicand1 (float)
; R4-R5-R6-R7 = multiplicand2 (float)
;
; output: R4-R5-R6-R7 = product (float)
;
; uses: bits - psw.5, c
; SFRs - acc, b, ma, mb, mc, mcnt1
;
;-----------------------------------------------------------
;--------------------- determine sign ----------------------
; [x] = Machine Cycle Count
fmult: mov a, r0 ; [1]
xrl a, r4 ; [1]
rlc a ; [1]
mov psw.5, c ; [2] store sign in GF1
;-------------------- calc new exponent --------------------
mov a, r1 ; [1]
setb acc.7 ; [2] assumed 1 before decimal
xch a, r1 ; [1]
rlc a ; [1] get lsbit of exponent
mov a, r0 ; [1] get upper 7 of exp
rlc a ; [1] exponent byte in acc
mov r0, a ; [1] store exp to R0
mov a, r5 ; [1]
setb acc.7 ; [2] assumed 1 before decimal
xch a, r5 ; [1]
rlc a ; [1] get lsbit of exponent
mov a, r4 ; [1] get upper 7 of exp
rlc a ; [1] exponent byte in acc
add a, r0 ; [1] add exponents
add a, #81h ; [2] subtract exponent bias
mov r4, a ; [1] store to r4
;---------------- multiply significands --------------------
orl mcnt1, #10h ; [3] CLM=1 clear MA, MB, MC
mov a, r7 ; [1]
mov b, r3 ; [2]
mul ab ; [5] (A0 * B0)
mov mc, b ; [3] msb of (A0*B0) into MC
clr a ; [1] A=00
mov mb, r3 ; [2] ----------------------
mov mb, a ; [2] .
mov ma, r6 ; [2] .
mov ma, r5 ; [2] .
nop ; [1] (B2B1 * A0)
nop ; [1] .
nop ; [1] .
nop ; [1] .
nop ; [1] .
nop ; [1] ----------------------
mov mb, r2 ; [2] ----------------------
mov mb, r1 ; [2] .
mov ma, r7 ; [2] .
mov ma, a ; [2] .
nop ; [1] (A2A1 * B0)
nop ; [1] .
nop ; [1] .
nop ; [1] .
nop ; [1] .
nop ; [1] ----------------------
mov a, mc ; [2] msbyte MC not needed
push mc ; [2] save
push mc ; [2] save
mov a, mc ; [2]
add a, #40h ; [2] go ahead and round
orl mcnt1, #10h ; [3] CLM=1 clear MA, MB, MC
mov mc, a ; [2] reload 40-bit ACC
jnc mc2 ; [3] carry from add?
pop acc ; [2] yes.
inc a ; [1] add carry.
mov mc, a ; [2]
jnz mc1 ; [3] carry from add?
pop acc ; [2] yes.
inc a ; [1] add carry
mov mc, a ; [2] 40-bit ACC loaded
sjmp mc0 ; [3]
mc2: pop mc ; [2] finish loading bytes
mc1: pop mc ; [2]
mc0: mov mb, r2 ; [2] ---------------------
mov mb, r1 ; [2] .
mov ma, r6 ; [2] .
mov ma, r5 ; [2] .
nop ; [1] (A2A1 * B2B1)
nop ; [1] .
nop ; [1] .
nop ; [1] .
nop ; [1] .
nop ; [1] ---------------------
mov a, mc ; [2] msbyte MC not needed
mov r5, mc ; [2] store 3 msbytes
mov r6, mc ; [2]
mov r7, mc ; [2]
push mc ; [2] store for norm or round
mov a, r5 ; [2]
jnb acc.7, norm ; [4] need to normalize?
pop acc ; [2] no
cjne a, #0C0h, rnd ; [4]
rnd: jc no_rnd ; [3] need to round?
inc r7 ; [1] yes
mov a, r7 ; [1]
jnz no_rnd ; [3] did we carry?
inc r6 ; [1] yes
mov a, r6 ; [1]
jnz no_rnd ; [3] carry again?
inc r5 ; [1] yes
no_rnd: inc r4 ; [1] inc exponent
mov a, r4 ; [1] get exponent
mov c, psw.5 ; [2] get sign
rrc a ; [1] sign -> msbit
mov r4, a ; [1] store byte in R4
jnc exp0 ; [3] lsbit of exponent = 1?
ret ; [4] yes.
norm: pop acc ; [2]
rlc a ; [1] rotate msbit -> carry bit
mov a, r7 ; [1] -------------------------
rlc a ; [1] .
mov r7, a ; [1] .
mov a, r6 ; [1] rotate each byte by 1bit
rlc a ; [1] using the carry bit
mov r6, a ; [1] .
mov a, r5 ; [1] .
rlc a ; [1] .
mov r5, a ; [1] .
mov a, r4 ; [1] -------------------------
mov c, psw.5 ; [2] get sign
rrc a ; [1] sign -> msbit
mov r4, a ; [1] store byte in R4
jc done ; [3] lsbit of exponent = 1?
exp0: mov a, r5 ; [1] no.
cpl acc.7 ; [2] lsbit of exponent = 0.
mov r5, a ; [1]
done: ret ; [4]
; -------
; Total cycles min = [141] @40Mhz = 14.1us
; max = [168] @40Mhz = 16.8us
IEEE is a registered service mark of the Institute of Electrical and Electronics Engineers.
Keil is a trademark of Keil, An ARM Company.
Automatic Updates
Would you like to be automatically notified when new application notes are published in your areas of interest? Sign up for EE-Mail™.