Last week we saw how data is stored in memory. If you missed it, read it here. The data we dealt with was mostly of integral type: char, short, int and long. Today we'll see how floating point data is handled at the memory level, using the float datatype as our example.
We know that the integer 5, stored in a 32-bit int, looks like this in memory:
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
Consider a floating point number, say, 5.5. We need a mechanism to store the fractional part as well as the integral part. In binary representation,
5.5 (base 10) = 101.1 (base 2)
So we split our memory block into two parts (as far as we understand right now): one block for the integral part and one for the fractional part. With 24 bits for the integral part and 8 bits for the fractional part, it could look like this:
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
< --------------- Integral Part ------------------------------------ >|<- Fractional Part- >
Thus all we have to do is convert the number into its binary representation and fill the integral and fractional parts into their respective blocks.
More examples :
7.5
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
< ------------------------------- Integral Part --------------------- >|< -Fractional Part - >
3.5
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
< --------------- Integral Part ----------------------------------- >|< -- Fractional Part-- >
However, it is necessary to note that a single byte cannot represent most fractional parts accurately enough: with 8 bits the smallest step is 1/256. Hence the IEEE came up with the following format (IEEE 754 single precision).
sign: 1 or 0 (1 bit) |< ----------- exp (8 bits) --------- >|< ------------------- 1.xxxxxx (23 bits for xxxxxx) ------------------------ >
In this format we take the binary equivalent of the number and write it in standard scientific form using base 2. An example will make this clear.
Ex: 15.5 (base 10) = 1111.1 (base 2)
1111.1 should be written as 1.1111 x 2^3
Once this is done, we ought to remember that
exp = (power of 2) + 127.
In this case, exp = 3 + 127 = 130, which is 10000010 in binary.
And the 1.1111 we got is equivalent to 1.xxxxx...
Thus xxxxxx... = 111100... (filling the rest with zeros). The leading 1 is not stored. The very first bit is the sign, which is 0 for a positive number.
0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|< ----------- exp --------- >|< ---------------------- 1.xxxxxx--------------------------- >
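Here is some C code to demonstrate this:
#include <stdio.h>
#include <stdint.h>

/* Write the lowest nbits of d into b as '0'/'1' characters,
   with a space after every 8 bits. */
void DecimalToBinary(char *b, uint32_t d, int nbits)
{
    uint32_t testbit = (uint32_t)1 << (nbits - 1);   /* mask for the highest bit */
    int i = 0, ctr = 0, n;

    for (n = 0; n < nbits; n++) {
        b[i++] = (d & testbit) ? '1' : '0';
        if (++ctr == 8) {        /* end of a byte: add a separator */
            ctr = 0;
            b[i++] = ' ';
        }
        d <<= 1;                 /* move the next bit into the tested position */
    }
    b[i] = '\0';
}

int main(void)
{
    char bin[100];
    uint32_t *lptr;
    float num = 15.5f;
    float *fptr = &num;

    /* Cast the pointer, not the value. Assumes float is 32 bits wide and that
       the compiler tolerates reading a float through a uint32_t pointer. */
    lptr = (uint32_t *) fptr;
    DecimalToBinary(bin, *lptr, 32);
    printf("Float point number : %f\n", num);
    printf("IEEE format : %s\n", bin);
    return 0;
}
The function DecimalToBinary() simply converts an integer to binary form and stores the result in a string.
Additionally, it separates the output into blocks of 1 byte.
The lines that are relevant to our discussion are in main().
uint32_t *lptr;
float num = 15.5f;
float *fptr = &num;
lptr = (uint32_t *) fptr;
What we do here is typecast the address of the float variable 'num' to uint32_t*. Only the pointer is cast, not the data itself, so the float format is preserved. By typecasting we instruct the compiler to treat the same memory location as a 32-bit integer. We use uint32_t because it is exactly 4 bytes wide, the same size as a float on the usual platforms; a long int is 8 bytes on 64-bit Linux and would read past the float. We send this number to our DecimalToBinary().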
Lastly, we need to make things clear about what happens when an int variable is assigned to a float.
int i = 5;
float f = i;
printf("%f", f);
Here we had better not confuse ourselves with all the formatting we learnt. The output is still 5 (printed as 5.000000). What happens in line 2 is that the value 5 is converted to the float format and then stored in memory.
Consider another piece of code.
int i = 37;
float f = *(float *)&i;
In these lines, again, we don't alter the data; we just typecast the address and assign the result to f. Whenever 'f' is used, the compiler treats it as a float. This also implies that whatever bits are there will be interpreted according to the float format.
Int format.
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
Float format.
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| < ---------- exp --------- >|< -------------------- 1.xxxxxx------------------------------ >
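Evaluated in this form, the exponent field is all zeros and only a few low bits of the fraction are set, so the value of 'f' is extremely small (about 5.2e-44). Printing it with %f shows 0.000000.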