El Chabón El último refugio del tercer mundo.

3 Jul 2010

MBF (Microsoft Binary Format) to IEEE conversion

Posted by mbrennan

If you want to read this article in Spanish, then click here.

Google has it all, and if Google doesn't have it, it probably doesn't exist.

That's what I ran into when I tried to solve this problem: converting the MBF format to the IEEE floating-point standard.

While looking for documentation on this issue, I found a lot of people asking the same question:

How do I translate bits between these formats?

In most cases, the algorithms posted on the internet didn't work; they guessed at the conversion and speculated about the layout of these two types of floats.

Why do we need to know more about this deprecated Microsoft format? The answer is easy: a lot of software still relies on this floating-point format, and in my case, the QWK offline message packet format uses these floats to index messages. (Me working on QWK is another story.)

So here I was, trying hard to figure out how the fuck to convert these bits. First, I tried the scheme published in an old document on the QWK format from 1992.

                   QWK Mail Packet File Layout

                              by Patrick Y. Lee
[...]
Microsoft binary (by Jeffery Foy):

   31 - 24    23     22 - 0        <-- bit position
+-----------------+----------+
| exponent | sign | mantissa |
+----------+------+----------+

IEEE (C/Pascal/etc.):

   31     30 - 23    22 - 0        <-- bit position
+----------------------------+
| sign | exponent | mantissa |
+------+----------+----------+
[...]
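
To make the MBF layout above concrete, here is a minimal sketch that pulls the three fields out of four raw bytes and computes the value arithmetically. It is not taken from the QWK document: the function names are hypothetical, and it assumes the four bytes form a little-endian 32-bit word, that an exponent byte of 0 means zero, and that the exponent bias is 129 with an implicit leading 1 bit (two more than IEEE's 127).

```python
import struct

def mbf_fields(raw):
    """Split 4 raw bytes into (exponent, sign, mantissa) per the MBF layout."""
    # Assumption: the bytes are stored little-endian, so bit 31 is in raw[3].
    (word,) = struct.unpack('<I', raw)
    exponent = word >> 24          # bits 31-24
    sign = (word >> 23) & 1        # bit 23
    mantissa = word & 0x7FFFFF     # bits 22-0
    return exponent, sign, mantissa

def mbf_value(raw):
    """Compute the numeric value directly from the MBF fields."""
    exponent, sign, mantissa = mbf_fields(raw)
    if exponent == 0:              # assumption: exponent 0 encodes zero
        return 0.0
    # Assumption: implicit leading 1 and an exponent bias of 129.
    magnitude = (1 + mantissa / float(1 << 23)) * 2.0 ** (exponent - 129)
    return -magnitude if sign else magnitude

# The sample bytes used later in this post, which should decode to 1151.
print(mbf_fields(b"\x00\xe0\x0f\x8b"))
print(mbf_value(b"\x00\xe0\x0f\x8b"))
```

Comparing the two layouts, the fields are the same sizes; only the position of the sign bit and the exponent bias differ.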

So I tried different approaches to converting these bytes, and I flatly refused to use C for this, because most of my QWK processing is in Python, and binding a C function to Python wasn't my first option at all.

After failing desperately at this conversion over and over, I got very close to solving it, but 4 bits were always wrong.

Researching even more on Google, I found a concept in a code snippet on the markcoder site about MBF --> IEEE conversion: a 'magic number'.

I was like: OK, that's it. Fuck magic. There is no magic in computers. I refuse to believe in variables called 'blackMagic' or 'magicMask' or 'fuckingMagicWhateverItThisShit' (although, during my trial-and-error period of coding this function, I wrote this kind of variable several times myself).

Then I remembered a set of libraries for VB6 called 'Stamina' that, ages ago, I often used for dirty work in Visual Basic 6, and in it I found MBFIEE32.DLL with a function inside: DxToIEEEs. My function. THE function.

A little later, I was running a small VB6 program against it in a Windows XP VirtualBox machine.

OK. I got the correct numbers. The float came out as the integer I expected, and the library was doing just what I needed.
So I started to disassemble the segment of code that does the "black magic".

After analysing the asm code, I finally wrote a working conversion in Python.
I used the bitstring library (its BitString class) to do the bit handling.

import struct
from bitstring import BitString
class Utils:

    def StaminaMSBToIeeeDissasembled(self, pBytes):
        """ This function was reverse engineered from MBF2IEE.DLL
                from vb6 stamina libraries
            CPU Disasm
            -------------
            MBFIEEE32 --> Function: DxToIEEEs (Stamina Lib for vb6)
            -------------
            Address   Hex dump          Command                                  Comments
            10001070  /$  55            PUSH EBP                                 ; C Convention
            10001071  |.  8BEC          MOV EBP,ESP
            10001073  |.  56            PUSH ESI
            10001074  |.  53            PUSH EBX                                 ; End of C Convention
            10001075  |.  8B75 08       MOV ESI,DWORD PTR SS:[ARG.1]             ; We obtain the argument (my Single datatype)
            10001078  |.  66:8B5E 02    MOV BX,WORD PTR DS:[ESI+2]               ; and we load the upper word (bytes 2 and 3) into BX
            1000107C  |.  66:8BCB       MOV CX,BX                                ; we copy bx to cx, we'll need it later
            1000107F  |.  66:33C0       XOR AX,AX                                ; clear ax
            10001082  |.  8AC7          MOV AL,BH                                ; we copy the high byte of BX (byte 3, the MBF exponent) to AL
            10001084  |.  66:83F8 03    CMP AX,3                                 ; and we check if this byte is lower than 3
            10001088  |.  72 1A         JB SHORT ax_is_below_three
            1000108A  |.  2C 02         SUB AL,2                                 ; ok, it wasn't, so we subtract 2 from that byte
            1000108C  |.  86C4          XCHG AH,AL                               ; and we save it to AH
            1000108E  |.  02DB          ADD BL,BL                                ; we add BL to itself o.o (the top bit, the MBF sign, falls into the carry flag)
            10001090  |.  66:D1D8       RCR AX,1                                 ; rotate 1 bit to the right through carry (the sign lands in bit 15)
            10001093  |.  66:83E1 7F    AND CX,007F                              ; aha, i've seen this number before, we mask some bits then (we keep the low 7)
            10001097  |.  66:0BC1       OR AX,CX
            1000109A  |>  66:8946 02    MOV WORD PTR DS:[ESI+2],AX               ; yes, we save the result in these bytes.
            1000109E  |.  5B            POP EBX                                  ; and closing C convention, we fucking leave.
            1000109F  |.  5E            POP ESI
            100010A0  |.  C9            LEAVE
            100010A1  |.  C2 0400       RETN 4
            100010A4  |>  33C0          XOR EAX,EAX
            100010A6  |.  8906          MOV DWORD PTR DS:[ESI],EAX
            100010A8  \.^ EB F0         JMP SHORT 1000109A

                We'll only care about two of them """

        #print "Entering MSB ---> IEEE Long Int"
        #pBytes = "\x00\xe0\x0f\x8b" # this number is equal to: 1151
        ms = BitString(bytes=pBytes, length=32)
        msle = ms[24:32] + ms[16:24] + ms[8:16] + ms[0:8]
        a = ms[24:32]
        b = ms[16:24]
        cx = ms[16:24] + ms[24:32]
        # we check whether the first unsigned byte is < 3
        intA = int(struct.unpack('B', a.tobytes())[0])
        if intA >= 3:
            # we do a lot of things.
            intA -= 2
            a = BitString(uint=int(intA), length=8) # we save the changes to that byte
            # now, we do secondByte*2 (ADD BL,BL): the bit pushed out of the
            # byte is the MBF sign bit, which the CPU keeps in the carry flag
            intB  = int(struct.unpack('B', b.tobytes())[0])
            carry = intB >> 7
            intB  = (intB * 2) & 0xff # keep 8 bits so negative numbers don't overflow
            b = BitString(uint=int(intB), length=8) # we save the changes to that byte

            # now comes a tricky part.
            # in the disassembly that i've done of the DLL (from stamina)
            # here comes a rotate through carry that needs 2 bytes to be done.
            # so i'll create the final 2 bytes that will be stored in the final ieee conversion
            convertionBytes = a + BitString(bytes="\x00", length=8)
            # we rotate them to the right: the carry becomes the top (sign) bit
            convertionBytes = BitString(uint=carry, length=1) + convertionBytes[0:15]

            # now, i need to mask the previously saved byte 'cx' with 0x7f
            masked = struct.unpack('H', cx.tobytes())[0] & 0x007f
            masked = BitString(uint=masked, length=16)
            # and we OR convertionBytes with masked ! :D 

            tmpResult = (convertionBytes | masked)

            # we put things back together

            ieee = tmpResult + ms[8:16] + ms[0:8]
            i = struct.unpack('>f', ieee.tobytes())[0]

        else:
            # we do a lot of OTHER things
            i = 0   # like return zero ;D
        return int(i)
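
For comparison, the same steps from the disassembly can be sketched with nothing but `struct` and integer arithmetic. This is a hedged sketch and not the DLL's or the bitstring version's exact code: the function name `mbf_to_ieee_single` and the register-like variable names are my own, it assumes the four bytes are in little-endian order, and it returns a float instead of truncating to an integer.

```python
import struct

def mbf_to_ieee_single(raw):
    """Convert 4 bytes holding an MBF single into a Python float."""
    # Split into two 16-bit words; 'high' plays the role of BX (bytes 2 and 3).
    low, high = struct.unpack('<HH', raw)
    bl = high & 0xFF            # byte 2: sign bit plus top 7 mantissa bits
    bh = high >> 8              # byte 3: the MBF exponent
    if bh < 3:                  # CMP AX,3 / JB: too small, the result is zero
        return 0.0
    ah = bh - 2                 # SUB AL,2 / XCHG AH,AL
    carry = bl >> 7             # ADD BL,BL pushes the sign bit into carry
    ax = (carry << 15) | ((ah << 8) >> 1)   # RCR AX,1 (rotate through carry)
    ax |= high & 0x007F         # AND CX,007F / OR AX,CX
    # Put the untouched low word back and reinterpret the bits as IEEE.
    return struct.unpack('<f', struct.pack('<HH', low, ax))[0]

# The sample bytes from the comment in the function above.
print(mbf_to_ieee_single(b"\x00\xe0\x0f\x8b"))
```

Working on plain integers keeps the sign bit through the rotate, so negative values should survive, although QWK message numbers never need them.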

I really hope this is helpful to someone out there.
