El Chabón: The last refuge of the third world.

3Jul/101

MBF (Microsoft Binary Format) to IEEE conversion

Posted by mbrennan

If you want to read this article in Spanish, then click here.

Google has it all, and if Google doesn't have it, it probably doesn't exist.

That's what I ran into when I tried to solve this problem: converting the MBF format to the IEEE floating point standard.

When I went looking for documentation on this, I found a lot of people asking the same question:

How do I translate the bits between these formats?

In most cases the answers were failing algorithms posted around the internet, guessing at the conversion and speculating about the layout of these two float types.

Why do we need to know about this deprecated Microsoft format? The answer is easy: a lot of software still relies on this floating point format to work. In my case, the QWK offline message packet format uses these floats to index messages. (Why I'm working with QWK is another story.)

So there I was, trying hard to figure out how the fuck to convert these bits. As a first attempt, I tried the scheme published in an old 1992 document about the QWK format.

                   QWK Mail Packet File Layout

                              by Patrick Y. Lee
[...]
Microsoft binary (by Jeffery Foy):

   31 - 24    23     22 - 0        <-- bit position
+----------+------+----------+
| exponent | sign | mantissa |
+----------+------+----------+

IEEE (C/Pascal/etc.):

   31     30 - 23    22 - 0        <-- bit position
+------+----------+----------+
| sign | exponent | mantissa |
+------+----------+----------+
[...]
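To make the quoted layout concrete, here is a small sketch that pulls the three MBF fields out of 4 raw bytes and computes the value straight from them. It assumes the bytes are stored little-endian, and it assumes the usual description of single-precision MBF: an exponent bias of 129 once the mantissa is written as 1.m, and an exponent of 0 meaning zero. The names `split_mbf32` and `mbf32_value` are made up for this example, and the sample bytes are the ones from the commented-out test value in the listing further down.

```python
import struct

def split_mbf32(raw):
    """Split 4 little-endian MBF bytes into (exponent, sign, mantissa)."""
    (word,) = struct.unpack("<I", raw)
    exponent = (word >> 24) & 0xFF   # bits 31-24
    sign = (word >> 23) & 0x01       # bit 23
    mantissa = word & 0x7FFFFF       # bits 22-0
    return exponent, sign, mantissa

def mbf32_value(raw):
    """Compute the value directly from the MBF fields (assumed bias: 129)."""
    exponent, sign, mantissa = split_mbf32(raw)
    if exponent == 0:
        return 0.0                   # assumption: exponent 0 encodes zero
    magnitude = (1 + mantissa / float(1 << 23)) * 2.0 ** (exponent - 129)
    return -magnitude if sign else magnitude

print(split_mbf32(b"\x00\xe0\x0f\x8b"))   # (139, 0, 1040384)
print(mbf32_value(b"\x00\xe0\x0f\x8b"))   # 1151.0
```

Since IEEE single precision uses a bias of 127 for the same 1.m mantissa, the two exponents differ by exactly 2 under these assumptions, and the sign bit simply sits on the other side of the exponent.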

So I tried different approaches to converting these bytes. I flatly refused to use C for this, because most of my QWK processing code is in Python, and binding a C function to Python wasn't my first option at all.

After failing desperately at every attempt, I got very close to solving it, but 4 bits were always wrong.

Searching Google some more, I found a new concept in a code snippet on the markcoder site about MBF --> IEEE conversion: a 'Magic Number'.

I was like: OK, that's it. Fuck magic. There is no magic in computers. I refuse to believe in variables called 'blackMagic' or 'magicMask' or 'fuckingMagicWhateverItThisShit' (although, during my trial-and-error period, I wrote this kind of variable several times myself).

Then I remembered a set of libraries for VB6 called 'Stamina' that, ages ago, I used for a lot of dirty work in Visual Basic 6, and in it I found MBFIEE32.DLL with a function inside: DxToIEEEs. My function. THE function.

Soon after, I was running this little program in VB6 on a Windows XP VirtualBox machine.

OK. I got the correct numbers. The floats came out as integers, just as I expected, so the library was doing exactly what I needed.
So I started to disassemble the piece of code that does the "black magic".

After analysing the asm code, I finally wrote a working conversion in Python.
I used the BitString library for the bit handling.

import struct
from bitstring import BitString
class Utils:

    def StaminaMSBToIeeeDissasembled(self, pBytes):
        """ This function was reverse engineered from MBF2IEE.DLL
                from vb6 stamina libraries
            CPU Disasm
            -------------
            MBFIEEE32 --> Function: DxToIEEEs (Stamina Lib for vb6)
            -------------
            Address   Hex dump          Command                                  Comments
            10001070  /$  55            PUSH EBP                                 ; C Convention
            10001071  |.  8BEC          MOV EBP,ESP
            10001073  |.  56            PUSH ESI
            10001074  |.  53            PUSH EBX                                 ; End of C Convention
            10001075  |.  8B75 08       MOV ESI,DWORD PTR SS:[ARG.1]             ; We obtain the argument (my Single datatype)
            10001078  |.  66:8B5E 02    MOV BX,WORD PTR DS:[ESI+2]               ; we load the high word: byte 2 (sign + top mantissa bits) in BL, byte 3 (exponent) in BH
            1000107C  |.  66:8BCB       MOV CX,BX                                ; we copy bx to cx, we'll need it later
            1000107F  |.  66:33C0       XOR AX,AX                                ; clear ax
            10001082  |.  8AC7          MOV AL,BH                                ; we copy the exponent byte (BH) to AL
            10001084  |.  66:83F8 03    CMP AX,3                                 ; and we check if the exponent is lower than 3
            10001088  |.  72 1A         JB SHORT ax_is_below_three
            1000108A  |.  2C 02         SUB AL,2                                 ; ok, it wasn't, so we subtract 2 from the exponent
            1000108C  |.  86C4          XCHG AH,AL                               ; and we move it to AH (AL becomes 0)
            1000108E  |.  02DB          ADD BL,BL                                ; doubling BL pushes its top bit (the MBF sign) into the carry flag
            10001090  |.  66:D1D8       RCR AX,1                                 ; rotate right through carry: the sign lands in bit 15
            10001093  |.  66:83E1 7F    AND CX,007F                              ; aha, i've seen this number before: we keep the low 7 bits (top of the mantissa)
            10001097  |.  66:0BC1       OR AX,CX
            1000109A  |>  66:8946 02    MOV WORD PTR DS:[ESI+2],AX               ; yes, we save the result in these bytes.
            1000109E  |.  5B            POP EBX                                  ; and closing C convention, we fucking leave.
            1000109F  |.  5E            POP ESI
            100010A0  |.  C9            LEAVE
            100010A1  |.  C2 0400       RETN 4
            100010A4  |>  33C0          XOR EAX,EAX
            100010A6  |.  8906          MOV DWORD PTR DS:[ESI],EAX
            100010A8  \.^ EB F0         JMP SHORT 1000109A

                We'll only care about two of them """

        #print "Entering MSB ---> IEEE Long Int"
        #pBytes = "\x00\xe0\x0f\x8b" # this number is equal to: 1151
        ms = BitString(bytes=pBytes, length=32)
        msle = ms[24:32] + ms[16:24] + ms[8:16] + ms[0:8]
        a = ms[24:32]
        b = ms[16:24]
        cx = ms[16:24] + ms[24:32]
        # we check whether the exponent byte is < 3
        intA = int(struct.unpack('B', a.tobytes())[0])
        if intA >= 3:
            # we do a lot of things.
            intA -= 2
            a = BitString(uint=int(intA), length=8) # we save the changes to that byte
            # now, we do secondByte*2 (ADD BL,BL). the doubled byte is never stored:
            # all that matters is the bit it pushes out, the MBF sign, which goes to the carry
            carry = ms[16:17]

            # now comes a tricky part.
            # in the disassembly that i've done of the MBF2IEE.DLL (from stamina)
            # here comes a rotate through carry (RCR) that needs 2 bytes to be done.
            # so i'll create the final 2 bytes that will be stored in the final ieee conversion
            convertionBytes = a + BitString(bytes=b"\x00", length=8)
            convertionBytes = carry + convertionBytes[0:15] # we rotate them to the right, the carry goes in at the top

            # now, i need to mask the previously saved word 'cx' with 0x7f
            # ('<H' reads it as little-endian, like the DLL does on x86)
            masked = struct.unpack('<H', cx.tobytes())[0] & 0x007f
            masked = BitString(uint=masked, length=16)
            # and we OR convertionBytes with masked ! :D 

            tmpResult = (convertionBytes | masked)

            # we put things back together

            ieee = tmpResult + ms[8:16] + ms[0:8]
            i = struct.unpack('>f', ieee.tobytes())[0]

        else:
            # we do a lot of OTHER things
            i = 0   # like return zero ;D
        return int(i)
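
For comparison, the same bit shuffling can be written with plain integer operations and `struct` only. This is a minimal sketch based on my reading of the disassembly above, not code taken from the Stamina library. It assumes little-endian input bytes, and `mbf32_to_ieee` is a name invented for this example.

```python
import struct

def mbf32_to_ieee(raw):
    """Convert 4 little-endian MBF bytes to a Python float."""
    (word,) = struct.unpack("<I", raw)
    exponent = word >> 24            # MBF exponent, bits 31-24
    if exponent < 3:
        return 0.0                   # the DLL zeroes the value in this case
    sign = (word >> 23) & 1          # MBF sign, bit 23
    mantissa = word & 0x7FFFFF       # 23 mantissa bits, unchanged
    # IEEE order is sign | exponent | mantissa, with the exponent lowered by 2
    ieee_bits = (sign << 31) | ((exponent - 2) << 23) | mantissa
    return struct.unpack("<f", struct.pack("<I", ieee_bits))[0]

print(mbf32_to_ieee(b"\x00\xe0\x0f\x8b"))   # 1151.0
```

Unlike the method above, this sketch returns the float itself instead of truncating it with `int()`, so the caller decides whether an integer is wanted.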

I really hope this is helpful to someone out there.

3Jul/100

MBF (Microsoft Binary Format) to IEEE floating point conversion

Posted by mbrennan

If you want to read this article in English, click here.

Google has it all, and if Google doesn't have it, then it doesn't exist.
That's what I ran into when I wanted to solve the following problem: converting the old MBF format to the IEEE standard.
Looking for documentation on the conversion, I found a lot of people asking the same thing:
How do I do the conversion?
In most cases there were algorithms available, undocumented, promising to work and not working, plus long pages speculating about how to convert between these formats.
Why do we need to know how this old floating point format works?
Easy: there is a lot of old, sometimes useful software out there that uses it. In my particular case, I'm working with the old QWK offline mail format, from the shelves of pre-internet history. The beloved BBSes. Why I'm working on this is a long story that I won't go into :D

So here I was, trying to understand how the fuck to convert these bits, and as a first attempt I used the scheme published in an old 1992 document about the QWK format.

                   QWK Mail Packet File Layout

                              by Patrick Y. Lee
[...]
Microsoft binary (by Jeffery Foy):

   31 - 24    23     22 - 0        <-- bit position
+----------+------+----------+
| exponent | sign | mantissa |
+----------+------+----------+

IEEE (C/Pascal/etc.):

   31     30 - 23    22 - 0        <-- bit position
+------+----------+----------+
| sign | exponent | mantissa |
+------+----------+----------+
[...]

I tried several times, without success, to convert these bits, and I tried not to use C to solve this, because most of the application I'm working on is written in Python, and calling C functions from Python was never my first option.
Failing desperately every time I tried to convert the floats, I realised there were 4 bits that, whichever way I looked at them, were always wrong.

Searching Google even more, I found a new concept on the markcoder site regarding MBF --> IEEE conversion: "the magic number".

And I said to myself: to hell with it, there is no magic in computing. I fervently refuse to believe in variables called 'blackMagic', 'magicMask' or 'fuckingMagicWhateverItThisShit'. (Although several times I caught myself writing magic variables during the hard trial-and-error process of fixing those 4 damn bits.)
Then, the light: I remembered that many years ago, programming in Visual Basic 6, I used a library called Stamina to do a lot of dirty work that was a bit more complicated to organise in VB6.
I found Stamina's MBFIEE32.DLL, with a function inside: DxToIEEEs. My function. THE function.
Minutes later, I was running this little program in VB6 on a VirtualBox machine with Windows XP.

Good. I got the expected numbers. The Microsoft floats were giving me back integers, just as I expected.
So I assumed the lib was doing exactly what was asked of it: converting the fucking numbers.
Then I disassembled the piece of code that did the "black magic".

After analysing the assembler code, I finally wrote a working method in Python.
I used BitString for the bit manipulation.

import struct
from bitstring import BitString
class Utils:

    def StaminaMSBToIeeeDissasembled(self, pBytes):
        """ This function was reverse engineered from MBF2IEE.DLL
                from vb6 stamina libraries
            CPU Disasm
            -------------
            MBFIEEE32 --> Function: DxToIEEEs (Stamina Lib for vb6)
            -------------
            Address   Hex dump          Command                                  Comments
            10001070  /$  55            PUSH EBP                                 ; C Convention
            10001071  |.  8BEC          MOV EBP,ESP
            10001073  |.  56            PUSH ESI
            10001074  |.  53            PUSH EBX                                 ; End of C Convention
            10001075  |.  8B75 08       MOV ESI,DWORD PTR SS:[ARG.1]             ; We obtain the argument (my Single datatype)
            10001078  |.  66:8B5E 02    MOV BX,WORD PTR DS:[ESI+2]               ; we load the high word: byte 2 (sign + top mantissa bits) in BL, byte 3 (exponent) in BH
            1000107C  |.  66:8BCB       MOV CX,BX                                ; we copy bx to cx, we'll need it later
            1000107F  |.  66:33C0       XOR AX,AX                                ; clear ax
            10001082  |.  8AC7          MOV AL,BH                                ; we copy the exponent byte (BH) to AL
            10001084  |.  66:83F8 03    CMP AX,3                                 ; and we check if the exponent is lower than 3
            10001088  |.  72 1A         JB SHORT ax_is_below_three
            1000108A  |.  2C 02         SUB AL,2                                 ; ok, it wasn't, so we subtract 2 from the exponent
            1000108C  |.  86C4          XCHG AH,AL                               ; and we move it to AH (AL becomes 0)
            1000108E  |.  02DB          ADD BL,BL                                ; doubling BL pushes its top bit (the MBF sign) into the carry flag
            10001090  |.  66:D1D8       RCR AX,1                                 ; rotate right through carry: the sign lands in bit 15
            10001093  |.  66:83E1 7F    AND CX,007F                              ; aha, i've seen this number before: we keep the low 7 bits (top of the mantissa)
            10001097  |.  66:0BC1       OR AX,CX
            1000109A  |>  66:8946 02    MOV WORD PTR DS:[ESI+2],AX               ; yes, we save the result in these bytes.
            1000109E  |.  5B            POP EBX                                  ; and closing C convention, we fucking leave.
            1000109F  |.  5E            POP ESI
            100010A0  |.  C9            LEAVE
            100010A1  |.  C2 0400       RETN 4
            100010A4  |>  33C0          XOR EAX,EAX
            100010A6  |.  8906          MOV DWORD PTR DS:[ESI],EAX
            100010A8  \.^ EB F0         JMP SHORT 1000109A

                We'll only care about two of them """

        #print "Entering MSB ---> IEEE Long Int"
        #pBytes = "\x00\xe0\x0f\x8b" # this number is equal to: 1151
        ms = BitString(bytes=pBytes, length=32)
        msle = ms[24:32] + ms[16:24] + ms[8:16] + ms[0:8]
        a = ms[24:32]
        b = ms[16:24]
        cx = ms[16:24] + ms[24:32]
        # we check whether the exponent byte is < 3
        intA = int(struct.unpack('B', a.tobytes())[0])
        if intA >= 3:
            # we do a lot of things.
            intA -= 2
            a = BitString(uint=int(intA), length=8) # we save the changes to that byte
            # now, we do secondByte*2 (ADD BL,BL). the doubled byte is never stored:
            # all that matters is the bit it pushes out, the MBF sign, which goes to the carry
            carry = ms[16:17]

            # now comes a tricky part.
            # in the disassembly that i've done of the MBF2IEE.DLL (from stamina)
            # here comes a rotate through carry (RCR) that needs 2 bytes to be done.
            # so i'll create the final 2 bytes that will be stored in the final ieee conversion
            convertionBytes = a + BitString(bytes=b"\x00", length=8)
            convertionBytes = carry + convertionBytes[0:15] # we rotate them to the right, the carry goes in at the top

            # now, i need to mask the previously saved word 'cx' with 0x7f
            # ('<H' reads it as little-endian, like the DLL does on x86)
            masked = struct.unpack('<H', cx.tobytes())[0] & 0x007f
            masked = BitString(uint=masked, length=16)
            # and we OR convertionBytes with masked ! :D 

            tmpResult = (convertionBytes | masked)

            # we put things back together

            ieee = tmpResult + ms[8:16] + ms[0:8]
            i = struct.unpack('>f', ieee.tobytes())[0]

        else:
            # we do a lot of OTHER things
            i = 0   # like return zero ;D
        return int(i)
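
To see where each bit goes, the instructions in the disassembly can also be replayed one by one on plain integers. This is only a sketch of how I read the listing, with Python integers standing in for the 16-bit registers. The function name `dx_to_ieee_steps` is invented for this example, and the input is assumed to be 4 little-endian bytes.

```python
import struct

def dx_to_ieee_steps(raw):
    """Replay the DxToIEEEs instructions on Python integers."""
    low, bx = struct.unpack("<HH", raw)   # bx = word at [ESI+2]
    cx = bx                               # MOV CX,BX
    ax = bx >> 8                          # XOR AX,AX / MOV AL,BH (exponent)
    if ax < 3:                            # CMP AX,3 / JB
        return 0.0                        # the branch zeroes the whole value
    ax = (ax - 2) & 0xFF                  # SUB AL,2
    ax = ax << 8                          # XCHG AH,AL (AL was the only set byte)
    bl = bx & 0xFF
    carry = (bl + bl) >> 8                # ADD BL,BL: top bit of BL goes to carry
    ax = (carry << 15) | (ax >> 1)        # RCR AX,1: carry enters at bit 15
    cx &= 0x007F                          # AND CX,007F
    ax |= cx                              # OR AX,CX
    # MOV WORD PTR [ESI+2],AX: the low word is left untouched
    return struct.unpack("<f", struct.pack("<HH", low, ax))[0]

print(dx_to_ieee_steps(b"\x00\xe0\x0f\x8b"))   # 1151.0
print(dx_to_ieee_steps(b"\x00\xe0\x8f\x8b"))   # -1151.0
```

The second call flips the top bit of byte 2, which is the MBF sign. That bit only reaches the result through the carry flag, so the rotate has to be treated as a rotate through carry.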

I really hope this is useful to someone.
