Floating Point Numbers: IEEE 754 Standard | Single Precision and Double Precision Format


Published On Dec 5, 2023

This video explains the IEEE 754 standard in detail, specifically the IEEE single precision and double precision formats for floating point numbers.
Through examples, it shows how to convert a 32-bit floating point number into the equivalent decimal number and, vice versa, how to convert a decimal number into a 32-bit floating point number.
At the end, it also explains why floating point numbers cover a greater range than fixed point numbers with the same number of bits.

Here is the list of topics covered in the video:
0:00 Introduction
2:13 Single Precision Format
7:14 Example 1
9:14 Example 2
11:31 Example 3
15:16 Why does a Floating Point Number cover a greater range than a Fixed Point Number?
19:33 Double Precision Format

For more videos on Digital Electronics, check this playlist:
   • Digital Electronics

IEEE 754 Standard:
The IEEE 754 standard defines different formats, depending on how many bits are used to store the floating point number:
1) Half Precision Format (16 bits)
2) Single Precision Format (32 bits)
3) Double Precision Format (64 bits)
4) Quadruple Precision (128 bits)
5) Octuple Precision (256 bits)

Among these, the Single Precision and Double Precision formats are the most widely used, and this video explains these two formats in detail.

Single Precision Format (32 bit Floating Point Numbers):

In this format, out of the 32 bits:
1) 1 bit is reserved for the sign
2) 8 bits are reserved for the exponent
3) 23 bits are reserved for the mantissa / significand
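
The 1 / 8 / 23 bit split above can be inspected directly. The following is a minimal Python sketch, not something taken from the video; the function name float32_fields is a hypothetical helper, and it relies on the standard struct module to round a Python float to single precision and read back the raw bits.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent_field, fraction_field) of x stored as a 32-bit float.

    Hypothetical helper for illustration only.
    """
    # ">f" packs x as a big-endian 32-bit float; ">I" reads the same
    # four bytes back as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # top 1 bit
    exponent = (bits >> 23) & 0xFF     # next 8 bits
    fraction = bits & 0x7FFFFF         # low 23 bits
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(-6.5)
print(f"{sign:01b} {exponent:08b} {fraction:023b}")
# prints: 1 10000001 10100000000000000000000
```

Here -6.5 is -1.101 x 2^2 in binary, so the sign bit is 1 and the fraction field starts with 101.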

To represent any binary number in this format, the number is first normalized (written as 1.xxx multiplied by a power of 2) and then stored in the 32-bit format.

For the mantissa, the digit before the binary point (which is always 1 for a normalized number) is not stored explicitly; only the fractional part is stored.
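
As a rough illustration of normalization and the implicit leading 1, here is a small Python sketch. It assumes a positive, non-zero input, and the name normalize is a hypothetical helper; math.frexp is the standard library call that splits a float into a fraction and a power of 2.

```python
import math

def normalize(x):
    """Return (significand, exponent) with 1 <= significand < 2
    and x == significand * 2**exponent. Assumes x > 0.

    Hypothetical helper for illustration only.
    """
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1    # rescale so the leading digit is 1

significand, exponent = normalize(6.5)   # 6.5 = 1.625 * 2**2
stored_fraction = significand - 1        # the leading 1 is implicit
print(significand, exponent, stored_fraction)
# prints: 1.625 2 0.625
```

Only the 0.625 part (binary .101) would go into the 23 mantissa bits; the leading 1 is implied.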

The exponent is stored in biased format. For 32-bit numbers, the bias value is 127. This bias (127) is added to the actual exponent value, and the result is stored. Because of this offset, negative exponents are shifted towards the positive side, so the stored exponent is always a non-negative number. In this 32-bit format, the actual value of the exponent for normalized numbers can range from -126 to +127 (stored values 1 to 254); the stored values 0 and 255 are reserved for special cases.
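
The conversion from a 32-bit pattern back to a decimal value can be sketched as follows. This is a minimal illustration under stated assumptions, not the video's own code: decode_float32 is a hypothetical name, and the sketch handles normalized numbers only, rejecting the reserved exponent fields 0 and 255.

```python
def decode_float32(bits):
    """Decode a 32-bit pattern (given as an int) of a normalized
    single precision number. Hypothetical helper for illustration only.
    """
    sign = bits >> 31
    stored = (bits >> 23) & 0xFF       # biased exponent
    fraction = bits & 0x7FFFFF         # 23 stored mantissa bits
    if stored == 0 or stored == 255:
        raise ValueError("reserved exponent field, not handled in this sketch")
    significand = 1 + fraction / 2**23   # restore the implicit leading 1
    actual_exponent = stored - 127       # remove the bias
    return (-1) ** sign * significand * 2.0 ** actual_exponent

print(decode_float32(0xC0D00000))   # prints: -6.5
print(decode_float32(0x3F800000))   # prints: 1.0
```

Calling decode_float32(0x7F7FFFFF) gives the largest finite single precision value, roughly 3.4 x 10^38, which is far beyond the 2^32 - 1 limit of a 32-bit unsigned integer.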

Double Precision Format (64-bit Floating Point Numbers):

The Double Precision Format is used when more precision and an even larger range are needed.
In this format, out of the 64 bits:
1) 1 bit is reserved for the sign
2) 11 bits are reserved for the exponent
3) 52 bits are reserved for the mantissa / significand

For 64-bit numbers, the bias value is 1023. This bias (1023) is added to the actual exponent value, and the result is stored. Because of this offset, negative exponents are shifted towards the positive side, so the stored exponent is always a non-negative number. In this 64-bit format, the actual value of the exponent for normalized numbers can range from -1022 to +1023 (stored values 1 to 2046); the stored values 0 and 2047 are reserved for special cases.
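
The same field layout can be checked for the 64-bit format. The sketch below is illustrative only; float64_fields is a hypothetical helper, and it again uses the standard struct module, this time with the 64-bit format codes.

```python
import struct

def float64_fields(x):
    """Return (sign, exponent_field, fraction_field) of x stored as a 64-bit float.

    Hypothetical helper for illustration only.
    """
    # ">d" packs x as a big-endian 64-bit float; ">Q" reads the same
    # eight bytes back as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # top 1 bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits
    fraction = bits & ((1 << 52) - 1)    # low 52 bits
    return sign, exponent, fraction

sign, exponent, fraction = float64_fields(-6.5)
print(sign, exponent, exponent - 1023)
# prints: 1 1025 2
```

For -6.5 the actual exponent is 2, so the stored exponent is 2 + 1023 = 1025.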

This video will be helpful to all the students of science and engineering in understanding the IEEE 754 standard for the Floating Point Numbers.

#ALLABOUTELECTRONICS
#digitalelectronics
#floatingpointnumbers
#IEEE754

Support the channel through membership program:
   / @allaboutelectronics
--------------------------------------------------------------------------------------------------
Follow my second channel:
   / @allaboutelectronics-quiz

Follow me on Facebook:
  / allaboutelecronics

Follow me on Instagram:
  / all_about.electronics
--------------------------------------------------------------------------------------------------
Music Credit: http://www.bensound.com
