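I have an asm file like the one shown below. How can I parse its contents with Python 3 and get the opcodes, i.e. ["push", "mov", ..., "call"]?
Is there a third-party parser for this, or can anyone help me write a regular expression for it?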
.text:00401000 ; Segment type: Pure code
.text:00401000 ; Segment permissions: Read/Execute
.text:00401000 _text segment para public 'CODE' use32
.text:00401000 assume cs:_text
.text:00401000 ;org 401000h
.text:00401000 assume es:nothing, ss:nothing, ds:_data, fs:nothing, gs:nothing
.text:00401000 56 push esi
.text:00401001 8D 44 24 08 lea eax, [esp+8]
.text:00401005 50 push eax
.text:00401006 8B F1 mov esi, ecx
.text:00401008 E8 1C 1B 00 00 call ??0exception@std@@QAE@ABQBD@Z ; std::exception::exception(char const * const &)
.text:0040100D C7 06 08 BB 42 00 mov dword ptr [esi], offset off_42BB08
.text:00401013 8B C6 mov eax, esi
.text:00401015 5E pop esi
.text:00401016 C2 04 00 retn 4
.text:00401016 ; ---------------------------------------------------------------------------
.text:00401019 CC CC CC CC CC CC CC align 10h
.text:00401020 C7 01 08 BB 42 00 mov dword ptr [ecx], offset off_42BB08
.text:00401026 E9 26 1C 00 00 jmp sub_402C51
.text:00401026 ; ---------------------------------------------------------------------------
.text:0040102B CC CC CC CC CC align 10h
.text:00401030 56 push esi
.text:00401031 8B F1 mov esi, ecx
.text:00401033 C7 06 08 BB 42 00 mov dword ptr [esi], offset off_42BB08
.text:00401039 E8 13 1C 00 00 call sub_402C51
.text:0040103E F6 44 24 08 01 test byte ptr [esp+8], 1
.text:00401043 74 09 jz short loc_40104E
.text:00401045 56 push esi
.text:00401046 E8 6C 1E 00 00 call ??3@YAXPAX@Z ; operator delete(void *)
.text:0040104B 83 C4 04 add esp, 4
Answer 0 (score: 4)
You could try pyparsing:
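from pyparsing import Word, hexnums, WordEnd, Optional, alphas, alphanums

hex_integer = Word(hexnums) + WordEnd()  # use WordEnd to avoid parsing leading a-f of non-hex words as hex
# opcode bytes are exactly two hex digits, so a mnemonic such as "add" is not read as a byte
opcode_byte = Word(hexnums, exact=2) + WordEnd()
line = ".text:" + hex_integer + Optional((opcode_byte*(1,))("instructions") + Word(alphas, alphanums)("opcode"))

# source is any iterable of listing lines, e.g. an open file object
for source_line in source:
    result = line.parseString(source_line)
    if "opcode" in result:
        print((result.opcode, result.instructions.asList()))  # tuple, so Python 3 prints the same form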
This prints:
('push', ['56'])
('lea', ['8D', '44', '24', '08'])
('push', ['50'])
('mov', ['8B', 'F1'])
('call', ['E8', '1C', '1B', '00', '00'])
('mov', ['C7', '06', '08', 'BB', '42', '00'])
('mov', ['8B', 'C6'])
('pop', ['5E'])
('retn', ['C2', '04', '00'])
('align', ['CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC'])
('mov', ['C7', '01', '08', 'BB', '42', '00'])
('jmp', ['E9', '26', '1C', '00', '00'])
You didn't say you also wanted the instruction bytes, but they were easy to include.
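Since the question also asks about a regular expression, here is a minimal sketch that uses only the standard `re` module and collects the mnemonics into a list. It assumes every instruction line has the layout seen in the listing: `.text:` plus a hex address, then one or more two-digit uppercase opcode bytes, then a lowercase mnemonic. The names `LINE_RE`, `extract_opcodes` and `sample` are made up for illustration.

```python
import re

# Assumed line layout: ".text:<address> <BYTE> [<BYTE> ...] <mnemonic> ..."
# Bytes are matched as uppercase hex only, so a lowercase mnemonic made of
# hex letters (e.g. "add") is not mistaken for an opcode byte.
LINE_RE = re.compile(
    r"^\.text:[0-9A-Fa-f]+\s+"   # segment name and address
    r"((?:[0-9A-F]{2}\s+)+)"     # one or more opcode bytes
    r"([a-z][a-z0-9]*)\b"        # mnemonic
)

def extract_opcodes(lines):
    """Return the mnemonic of every line that carries opcode bytes."""
    opcodes = []
    for text in lines:
        match = LINE_RE.match(text)
        if match:
            opcodes.append(match.group(2))
    return opcodes

# A few lines copied from the listing in the question.
sample = [
    ".text:00401000 assume cs:_text",
    ".text:00401000 56 push esi",
    ".text:00401001 8D 44 24 08 lea eax, [esp+8]",
    ".text:00401016 ; ---------------------------------------",
    ".text:0040104B 83 C4 04 add esp, 4",
]
print(extract_opcodes(sample))  # ['push', 'lea', 'add']
```

To run it on a whole file, pass an open file object instead of `sample`. Directives that are preceded by bytes, such as `align`, are returned as well and can be filtered out afterwards if only real instructions are wanted.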