Question

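I'm trying to read a database's binary file and parse it with Python. I've never done anything like this in Python before, and I'm having trouble with the "messy" data. The data contains a lot of NULL values, and I don't know how to read the file byte by byte without checking for NULL terminators.

How can I read a file that contains all of these messy values?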

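I'm using this method to get a variable number of bytes from the open file buffer (I don't know if that's the right name for it, but before calling this function I have already called file = open(file_path, "rb") on the file).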

    def getBytes(self, file, numBytes):

      bArray = file.read(numBytes)
      x=0
      while x < numBytes:

        if (bArray[x] < 32) or (bArray[x] > 126):
          bArray[x] = 32
        x+=1

      charArray = bArray.decode("utf-8")

      self.buffer += numBytes

      return charArray

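I get this error even when testing on a string of UTF-8 characters with no special characters in it, so this is definitely not a good implementation.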

    Traceback (most recent call last):
      File "D:\projects\git\pgdump_parser\src\python\PG_Dump_Parser\Source_Code\main.py", line 3, in <module>
        Sp = Parser.Parser("./PG_Dump_Parser/Data/small_data.txt")
      File "D:\projects\git\pgdump_parser\src\python\PG_Dump_Parser\Source_Code\Parser.py", line 17, in __init__
        self.inData = self.getEntities()
      File "D:\projects\git\pgdump_parser\src\python\PG_Dump_Parser\Source_Code\Parser.py", line 66, in getEntities
        found = self.findNextCREATE(file)
      File "D:\projects\git\pgdump_parser\src\python\PG_Dump_Parser\Source_Code\Parser.py", line 34, in findNextCREATE
        byte = self.getBytes(file, 1)
      File "D:\projects\git\pgdump_parser\src\python\PG_Dump_Parser\Source_Code\Parser.py", line 97, in getBytes
        print("bArrayOld: %s \nx: %s" % (bArray[x], x))
    IndexError: bytearray index out of range

Answer 1

If you want to replace certain characters with spaces, it is easier to use the translate method.

(Note that self.buffer should be updated with the number of bytes actually read, not the number of bytes you tried to read.)

    # Every byte outside printable ASCII (32-126) gets mapped to a space.
    not_printable_ascii = bytes(range(32)) + bytes(range(127, 256))
    spaces = b' ' * len(not_printable_ascii)
    trans_table = bytes.maketrans(not_printable_ascii, spaces)

    def getBytes(self, file, numBytes):
        bArray = file.read(numBytes)  # may return fewer than numBytes near EOF
        self.buffer += len(bArray)    # count the bytes actually read
        return bArray.translate(trans_table).decode("utf-8")
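
To see how this behaves at the end of the file, here is a small self-contained sketch of the same idea. It is only an illustration under a few assumptions: the `Reader` class and `get_bytes` name are hypothetical stand-ins for your parser, `io.BytesIO` stands in for `open(file_path, "rb")`, and the sample bytes are made up. It decodes with "ascii" because after the translation only printable ASCII is left.

```python
import io

# Map every byte outside printable ASCII (32..126) to a space.
NOT_PRINTABLE = bytes(range(32)) + bytes(range(127, 256))
TRANS_TABLE = bytes.maketrans(NOT_PRINTABLE, b" " * len(NOT_PRINTABLE))


class Reader:
    """Hypothetical stand-in for the Parser class in the question."""

    def __init__(self):
        self.buffer = 0  # running count of bytes consumed so far

    def get_bytes(self, file, num_bytes):
        raw = file.read(num_bytes)   # may return fewer bytes near EOF
        self.buffer += len(raw)      # count what was actually read
        return raw.translate(TRANS_TABLE).decode("ascii")


# In-memory stand-in for a file opened in "rb" mode (made-up sample data).
sample = io.BytesIO(b"CREATE\x00TABLE\x00\xff t1")
reader = Reader()

first = reader.get_bytes(sample, 8)
rest = reader.get_bytes(sample, 100)  # asks for more than is left

print(repr(first))     # 'CREATE T'
print(repr(rest))      # 'ABLE   t1'
print(reader.buffer)   # 17
```

The second call asks for 100 bytes but only 9 remain. Because nothing indexes past the end of what `read` returned, there is no IndexError, and the counter still ends up at the real file size.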
