Question

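I am trying to search for a series of hex values in a binary file, but I've run into a few issues that I can't solve. (1) I'm not sure how to search the entire file and return all the matches. Currently my f.seek only goes as far as where I think the value might be, which is no good. (2) I'd like to return the offset, in either decimal or hex, of each match, but I get 0 every time, so I'm not sure what I did wrong.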

example.bin

AA BB CC DD EE FF AB AC AD AE AF BA BB BC BD BE
BF CA CB CC CD CE CF DA DB DC DD DE DF EA EB EC

Code

# coding: utf-8
import struct
import re

with open("example.bin", "rb") as f:
    f.seek(30)
    num, = struct.unpack(">H", f.read(2))
hexaPattern = re.compile(r'(0xebec)?')
m = re.search(hexaPattern, hex(num))
if m:
   print "found a match:", m.group(1)
   print " match offset:", m.start()

Perhaps there is a better way to do all of this?

Answer 1

"I'm not sure how to search the entire file and return all the matches."

"I'd like to return the offset in either decimal or hex."

import re

# Build a small test file: the 4-byte record AA BB EB EC, written seven times.
f = open('data.txt', 'wb')
for _ in range(7):
    f.write('\xAA\xBB\xEB\xEC')
f.close()

# Read the whole file into memory as a byte string.
f = open('data.txt', 'rb')
data = f.read()
f.close()

# The two bytes to search for.
pattern = "\xEB\xEC"
regex = re.compile(pattern)

# finditer() yields one match object per non-overlapping match.
for match_obj in regex.finditer(data):
    offset = match_obj.start()  # 0-based position of the match in the file
    print "decimal: {}".format(offset)
    print "hex(): " + hex(offset)
    print 'formatted hex: {:02X} \n'.format(offset)

--output:--
decimal: 2
hex(): 0x2
formatted hex: 02 

decimal: 6
hex(): 0x6
formatted hex: 06 

decimal: 10
hex(): 0xa
formatted hex: 0A 

decimal: 14
hex(): 0xe
formatted hex: 0E 

decimal: 18
hex(): 0x12
formatted hex: 12 

decimal: 22
hex(): 0x16
formatted hex: 16 

decimal: 26
hex(): 0x1a
formatted hex: 1A

Positions in a file use 0-based indexing, just like a list.
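As for why the code in the question always prints 0: m.start() there is a position inside the string returned by hex(num), not a position in the file, so it cannot tell you where the bytes are. Here is a minimal sketch of the same finditer idea applied to the example.bin bytes from the question. It assumes Python 3 (print as a function, bytes literals), and find_all is just a name I made up for illustration. I also added re.escape, which the code above does not use, so that a byte such as 0x2E ('.') is not treated as regex syntax.

```python
import re

# The 32 bytes of example.bin from the question, rebuilt in memory.
data = bytes.fromhex(
    "AABBCCDDEEFFABACADAEAFBABBBCBDBE"
    "BFCACBCCCDCECFDADBDCDDDEDFEAEBEC"
)

def find_all(needle, haystack):
    """Return the 0-based offset of every non-overlapping match of needle."""
    # re.escape keeps bytes that happen to be regex metacharacters literal.
    pattern = re.compile(re.escape(needle))
    return [m.start() for m in pattern.finditer(haystack)]

offsets = find_all(b"\xEB\xEC", data)
for offset in offsets:
    # Prints: decimal: 30  hex: 0x1e
    print("decimal: {}  hex: {:#x}".format(offset, offset))
```

To run it against the real file, replace the in-memory data with the result of open("example.bin", "rb").read().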

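re.finditer(pattern, string, flags=0)
  Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found.

Match objects support the following methods and attributes:
  start([group])
  end([group])
  Return the indices of the start and end of the substring matched by group; group defaults to zero (meaning the whole matched substring).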

https://docs.python.org/2/library/re.html
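Note the words "non-overlapping" in the documentation quoted above. If the byte sequence you are looking for can overlap with itself, finditer skips the overlapping hits. One common workaround is to wrap the pattern in a zero-width lookahead. The following is a sketch under the assumption of Python 3; overlapping_offsets is a hypothetical helper name, not something from the re module.

```python
import re

def overlapping_offsets(needle, haystack):
    """Return the offset of every occurrence of needle, overlaps included."""
    # The lookahead (?=...) matches without consuming bytes, so the scan
    # moves forward one byte at a time and can report overlapping hits.
    pattern = re.compile(b"(?=" + re.escape(needle) + b")")
    return [m.start() for m in pattern.finditer(haystack)]

data = b"\xAA\xAA\xAA"

# Plain finditer consumes the first two bytes and finds only one match.
plain = [m.start() for m in re.finditer(re.escape(b"\xAA\xAA"), data)]

# The lookahead version also reports the occurrence starting at offset 1.
overlapping = overlapping_offsets(b"\xAA\xAA", data)

print(plain, overlapping)  # [0] [0, 1]
```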

Answer 2

Try:

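import re

with open("example.bin", "rb") as f:
    match = re.search(b'\xEB\xEC', f.read())

# re.search returns None when the bytes are not in the file.
if match:
    print "found a match:", match.group()
    print " match offset:", match.start()
else:
    print "no match"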
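Keep in mind that re.search returns only the first match, and f.read() loads the whole file into memory. If the file is large and you only need a literal byte sequence, a regex is not strictly required. Below is a rough sketch, assuming Python 3, that memory-maps the file and loops over find(). The name find_offsets_mmap and the throwaway demo file are my own for illustration. Unlike finditer, this loop also reports overlapping occurrences because it restarts one byte after each hit.

```python
import mmap
import os
import tempfile

def find_offsets_mmap(path, needle):
    """Return every 0-based offset of needle in the file at path."""
    offsets = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                offsets.append(pos)
                # Resume one byte later so overlapping hits are not skipped.
                pos = mm.find(needle, pos + 1)
    return offsets

# Demo on a throwaway file with the same contents as data.txt in Answer 1.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xAA\xBB\xEB\xEC" * 7)
demo_offsets = find_offsets_mmap(tmp.name, b"\xEB\xEC")
os.remove(tmp.name)

print([hex(o) for o in demo_offsets])
```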

Python regex search for hex bytes
