Question

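strings is a GNU/Linux application that prints the sequences of printable characters found in a file.

Is there a way to do what strings does, but in Python?

In my case, calling strings and capturing its output is not an option.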

Answer 1

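Check the file byte by byte to see whether each byte falls between 0x20 and 0x7E (0x7F is DEL, a control character, so it should not count as printable). If the byte is a readable ASCII character, print it.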
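A minimal sketch of that byte-by-byte scan, assuming Python 3 and a minimum run length of 4 to match the default of strings. The function name `iter_strings` and the `min_len` parameter are made up for illustration and are not part of the original answer.

```python
def iter_strings(data, min_len=4):
    """Yield runs of printable ASCII bytes (0x20-0x7E) at least min_len long."""
    run = bytearray()
    for byte in data:  # iterating over bytes yields ints in Python 3
        if 0x20 <= byte <= 0x7E:
            run.append(byte)
        else:
            if len(run) >= min_len:
                yield run.decode("ascii")
            run.clear()
    if len(run) >= min_len:  # flush a run that reaches the end of the data
        yield run.decode("ascii")


sample = b"\x00\x01hello\xffab\x00world!\x7f"
print(list(iter_strings(sample)))  # ['hello', 'world!']
```

Collecting bytes into runs rather than printing each printable byte on its own keeps short accidental matches such as "ab" out of the output.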

Answer 2

If you don't care much about what ends up in the output, this is easy to do by simply ignoring all decoding errors:

In Python 2:

with open('file', 'rb') as fd:  # binary mode, so the bytes are read unmodified on every platform
    print fd.read().decode('ascii', errors='ignore')

In Python 3:

import codecs
with open('file', 'rb') as fd:  # must be binary mode: codecs.decode needs bytes, not str
    print(codecs.decode(fd.read(), 'ascii', errors='ignore'))

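Either way, errors='ignore' simply ignores all errors during decoding. Note that only bytes that are not valid ASCII (0x80 and above) are dropped this way; control characters are valid ASCII and stay in the output.

Further reference: https://docs.python.org/2/library/codecs.html

Python 3: https://docs.python.org/3.5/library/codecs.html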
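To make the effect of errors='ignore' concrete, here is a small Python 3 sketch on a made-up byte string (the sample data is an assumption chosen for illustration):

```python
raw = b"\x00\x01hello\xff\xfe world\x7f"

# Bytes >= 0x80 are not valid ASCII, so errors="ignore" silently drops them.
# Control characters below 0x20 and DEL (0x7F) are valid ASCII and are kept.
decoded = raw.decode("ascii", errors="ignore")

print(repr(decoded))  # '\x00\x01hello world\x7f'
```

The two pieces of text are also joined together once the invalid bytes between them disappear, so this approach does not separate individual strings the way strings does.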

Answer 3

The following prints a list of all words of length 4 or more:

import re

with open(r"my_binary_file", "rb") as f_binary:
    # The file is read as bytes, so under Python 3 the pattern must be bytes as well
    print(re.findall(br"[a-zA-Z]{4,}", f_binary.read()))

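Restricting the match to letters cuts down on some non-text matches, but of course it may also miss what you are looking for. strings also defaults to a minimum length of 4.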
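If letters only turns out to be too strict, one possible variant matches any run of printable ASCII (space through tilde) instead. This is a sketch assuming Python 3.5 or later; the helper name `find_strings` and its `min_len` parameter are hypothetical.

```python
import re


def find_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes, decoded to str."""
    # [\x20-\x7e] covers space through tilde; the pattern is a bytes object
    # because the data being searched is bytes.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


sample = b"\x00ELF\x02/lib64/ld-linux.so.2\x00ab\x00GCC: 9.3\xff"
print(find_strings(sample))  # ['/lib64/ld-linux.so.2', 'GCC: 9.3']
```

Unlike the letters-only pattern, this keeps paths, version numbers and other strings containing digits or punctuation, at the cost of more noise.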

Answer 4

The following should find all strings of length 4 or more (the default for strings) in a bytes array:

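def strings(data):
    # bytearray() yields ints on both Python 2 and 3; non-printable bytes become NUL separators
    cleansed = "".join(chr(b) if 0x20 <= b <= 0x7E else chr(0) for b in bytearray(data))
    # keep only runs of at least 4 characters (returns a list rather than a lazy filter object)
    return [s for s in cleansed.split(chr(0)) if len(s) >= 4]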

Python: detect all strings in a binary file?
