How to parse an ASM file and get the opcodes

Time: 2016-05-18 13:30:08

Tags: python parsing assembly

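I have an asm file like the one below. How can I parse its contents with Python 3 and get the opcodes, such as ["push", "mov", ..., "call"]? Is there a third-party parser for this, or can anyone help me write a regular expression?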

.text:00401000                             ; Segment type: Pure code
.text:00401000                             ; Segment permissions:     Read/Execute
.text:00401000                             _text           segment para public 'CODE' use32
.text:00401000                                     assume cs:_text
.text:00401000                                     ;org 401000h
.text:00401000                                     assume es:nothing, ss:nothing, ds:_data, fs:nothing, gs:nothing
.text:00401000 56                                  push    esi
.text:00401001 8D 44 24 08                             lea     eax, [esp+8]
.text:00401005 50                                  push    eax
.text:00401006 8B F1                                   mov     esi, ecx
.text:00401008 E8 1C 1B 00 00                              call    ??0exception@std@@QAE@ABQBD@Z ; std::exception::exception(char const * const &)
.text:0040100D C7 06 08 BB 42 00                           mov     dword ptr [esi], offset off_42BB08
.text:00401013 8B C6                                   mov     eax, esi
.text:00401015 5E                                  pop     esi
.text:00401016 C2 04 00                                retn    4
.text:00401016                             ; ---------------------------------------------------------------------------
.text:00401019 CC CC CC CC CC CC CC                        align 10h
.text:00401020 C7 01 08 BB 42 00                           mov     dword ptr [ecx], offset off_42BB08
.text:00401026 E9 26 1C 00 00                              jmp     sub_402C51
.text:00401026                             ; ---------------------------------------------------------------------------
.text:0040102B CC CC CC CC CC                              align 10h
.text:00401030 56                                  push    esi
.text:00401031 8B F1                                   mov     esi, ecx
.text:00401033 C7 06 08 BB 42 00                           mov     dword ptr [esi], offset off_42BB08
.text:00401039 E8 13 1C 00 00                              call    sub_402C51
.text:0040103E F6 44 24 08 01                              test    byte ptr [esp+8], 1
.text:00401043 74 09                                   jz      short loc_40104E
.text:00401045 56                                  push    esi
.text:00401046 E8 6C 1E 00 00                              call    ??3@YAXPAX@Z    ; operator delete(void *)
.text:0040104B 83 C4 04                                add     esp, 4

1 answer:

Answer 0 (score: 4)

You can try pyparsing:

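from pyparsing import Word, hexnums, WordEnd, Optional, alphas, alphanums

# use WordEnd to avoid parsing the leading a-f of a non-hex word as a hex number
hex_integer = Word(hexnums) + WordEnd()
# machine-code bytes are exactly two hex digits; this keeps mnemonics made only of
# hex letters (such as "add") from being consumed as bytes
hex_byte = Word(hexnums, exact=2) + WordEnd()
line = ".text:" + hex_integer + Optional((hex_byte*(1,))("instructions") + Word(alphas, alphanums)("opcode"))

# source is any iterable of listing lines, e.g. an open file object
for source_line in source:
    result = line.parseString(source_line)
    if "opcode" in result:
        print(result.opcode, result.instructions.asList())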

Prints:

push ['56']
lea ['8D', '44', '24', '08']
push ['50']
mov ['8B', 'F1']
call ['E8', '1C', '1B', '00', '00']
mov ['C7', '06', '08', 'BB', '42', '00']
mov ['8B', 'C6']
pop ['5E']
retn ['C2', '04', '00']
align ['CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC']
mov ['C7', '01', '08', 'BB', '42', '00']
jmp ['E9', '26', '1C', '00', '00']

You didn't say that you also wanted the instruction bytes, but it was easy to include them.
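
Since the question also asks about a regular expression, here is a rough sketch of the same idea using only the standard `re` module. It assumes every instruction line looks like the ones in the listing above: a `.text:` prefix and address, then uppercase two-digit hex bytes, then a lowercase mnemonic. The names `LINE_RE` and `extract_opcodes` are made up for this example, and the pattern would likely need adjusting for listings formatted differently.

```python
import re

# Assumed line shape (taken from the listing in the question):
#   .text:<address> <uppercase hex bytes> <lowercase mnemonic> <operands>
LINE_RE = re.compile(
    r"^\.text:[0-9A-F]+\s+"      # section prefix and address
    r"((?:[0-9A-F]{2}\s+)+)"     # one or more two-digit machine-code bytes
    r"([a-z][a-z0-9]*)"          # the mnemonic, e.g. push, mov, call
)

def extract_opcodes(lines):
    """Return (mnemonic, [bytes]) pairs for the lines that hold an instruction."""
    found = []
    for text in lines:
        match = LINE_RE.match(text)
        if match:
            found.append((match.group(2), match.group(1).split()))
    return found

# A few lines copied from the listing; comment-only lines are skipped.
sample = [
    ".text:00401000                             ; Segment type: Pure code",
    ".text:00401000 56                                  push    esi",
    ".text:00401001 8D 44 24 08                             lea     eax, [esp+8]",
    ".text:00401016 C2 04 00                                retn    4",
    ".text:00401019 CC CC CC CC CC CC CC                        align 10h",
    ".text:0040104B 83 C4 04                                add     esp, 4",
]

pairs = extract_opcodes(sample)
opcodes = [mnemonic for mnemonic, _ in pairs]
print(opcodes)
```

Note that `align` is an assembler directive rather than a CPU instruction, so you may want to filter it out of the resulting list if you only need real opcodes.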