14th LSI Design Contests・in Okinawa Design Specification - 5
5. FIXED POINT FORMAT
INTRODUCTION
In order to achieve lossless compression using the DCT, the DCT must be computed with real-valued (floating-point) numbers. Floating-point arithmetic is more complicated to implement in hardware than integer arithmetic. To avoid this cost without losing too much precision, the hardware implementation can use fixed-point numbers instead. In this representation, we must decide how many bits are used for the integer part and how many for the fractional part. A real number is approximated by a pair of integers $(n, q)$ representing the value $n \times 2^{-q}$, where $n$ is the mantissa and $q$ is the exponent. The exponent $q$ gives the number of bits to the right of the binary point, i.e. the number of fractional bits; a negative $q$ instead shifts the mantissa left by $-q$ bits.
Mantissa (n) | Exponent (q) | Binary | Decimal |
---|---|---|---|
01100100 | -1 | 011001000 | 200 |
01100100 | 0 | 01100100 | 100 |
01100100 | 1 | 0110010.0 | 50 |
01100100 | 2 | 011001.00 | 25 |
01100100 | 3 | 01100.100 | 12.5 |
01100100 | 7 | 0.1100100 | 0.78125 |
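The table can be reproduced with a short script, which makes the rule "value = n × 2^(-q)" concrete. This is a minimal sketch; the function names `fixed_to_float` and `binary_with_point` are hypothetical helpers introduced here for illustration, and the 8-bit mantissa width is taken from the table.

```python
def fixed_to_float(n: int, q: int) -> float:
    """Real value represented by mantissa n and exponent q: n * 2**(-q)."""
    return n * 2.0 ** (-q)


def binary_with_point(n: int, q: int, width: int = 8) -> str:
    """Show the width-bit mantissa n with the binary point q bits from the right."""
    bits = format(n, f"0{width}b")
    if q <= 0:
        # A non-positive exponent means no fractional bits: append -q zeros (shift left).
        return bits + "0" * (-q)
    # Pad on the left so at least one digit sits before the binary point.
    bits = bits.zfill(q + 1)
    return bits[:-q] + "." + bits[-q:]


if __name__ == "__main__":
    n = 0b01100100  # mantissa used in the table (decimal 100)
    for q in (-1, 0, 1, 2, 3, 7):
        print(f"q={q:>2}  {binary_with_point(n, q):>10}  {fixed_to_float(n, q)}")
```

In hardware no conversion takes place: the mantissa bits are stored unchanged, and only the designer's interpretation of where the binary point sits (the value of q) differs between rows.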
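There is a trade-off between the number of bits used to represent each number and the design's area and power consumption: more bits give higher precision but require more area and power. Choosing an appropriate bit width is therefore important. If two's complement is used, the MSB position holds the sign bit, followed by the integer bits and then the fraction bits: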
Sign bit (MSB) | Integer | Fraction |
---|---|---|
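As an illustration of the sign/integer/fraction layout and the precision trade-off, the sketch below quantizes a real value to a signed two's-complement fixed-point word of width 1 + int_bits + frac_bits. The helper names (`encode`, `decode`, `to_bits`) are hypothetical, and the choices of round-to-nearest and saturation on overflow are assumptions made for this example; the specification does not prescribe a rounding or overflow mode.

```python
import math


def encode(x: float, int_bits: int, frac_bits: int) -> int:
    """Quantize x to a signed two's-complement fixed-point integer.

    Total width is 1 (sign) + int_bits + frac_bits. Rounds to nearest
    (Python's round() breaks ties to even) and saturates on overflow.
    These modes are assumptions, not taken from the specification.
    """
    width = 1 + int_bits + frac_bits
    raw = round(x * (1 << frac_bits))
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, raw))


def decode(raw: int, frac_bits: int) -> float:
    """Real value of a raw fixed-point integer: raw * 2**(-frac_bits)."""
    return raw / (1 << frac_bits)


def to_bits(raw: int, int_bits: int, frac_bits: int) -> str:
    """Two's-complement bit pattern split as sign|integer|fraction."""
    width = 1 + int_bits + frac_bits
    bits = format(raw & ((1 << width) - 1), f"0{width}b")
    return f"{bits[0]}|{bits[1:1 + int_bits]}|{bits[1 + int_bits:]}"


if __name__ == "__main__":
    raw = encode(-2.75, int_bits=3, frac_bits=4)
    print(raw, to_bits(raw, 3, 4), decode(raw, 4))  # -44 1|101|0100 -2.75

    # Precision/area trade-off: more fraction bits give a smaller quantization error.
    c = math.cos(math.pi / 4)  # cos(pi/4) appears among DCT coefficients
    for f in (4, 8, 12):
        err = abs(decode(encode(c, 1, f), f) - c)
        print(f"{f:>2} fraction bits: error = {err:.2e}")
```

With round-to-nearest and no saturation, the quantization error is at most 2^-(F+1) for F fraction bits, so each extra fraction bit roughly halves the worst-case error at the cost of one more bit of datapath width.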