Question

I have a file made up of three parts:

an XML header (Unicode);
ASCII character 29 (group separator);
a stream of numbers running to the end of the file.

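I want to get an XML string from the first part, and the numbers from the stream (to be parsed with struct.unpack or array.fromfile).

Should I create an empty string and append to it while reading the file byte by byte until I find the separator, as shown here?

Or is there a way to read everything at once and use something like xmlstring = open('file.dat', 'rb').read().split(chr(29))[0] (which, by the way, does not work)?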

Edit: this is what I see in a hex editor; the separator is there (the selected byte):

[image: hex editor screenshot]

Answer 1

Make sure you actually read the file before trying to split it. Your code has no .read() call.

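SEP = b'\x1d'  # ASCII 29 (group separator) as a single byte

with open('file.dat', 'rb') as f:
    data = f.read()  # the whole file, as bytes

# hex(29) is the text '0x1d', not the byte, so only the byte itself is searched for
if SEP in data:
    xmlstring = data.split(SEP, 1)[0]  # everything before the first separator
else:
    xmlstring = None  # \x1d not found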

Make sure the ASCII 29 character (\x1d) actually exists in the file.
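To go from the split to usable values, something along these lines might work. It is only a sketch: it assumes the XML header is UTF-8 and that the number stream consists of little-endian 32-bit signed integers, neither of which is stated in the question, and `split_header` is a hypothetical helper name.

```python
import struct

SEP = b'\x1d'  # ASCII 29, group separator


def split_header(data, sep=SEP):
    """Split raw bytes into (xml_text, numbers) at the first separator byte.

    Assumptions (not from the question): the header is UTF-8 and the
    payload is a stream of little-endian 32-bit signed integers.
    """
    header, found, payload = data.partition(sep)
    if not found:
        raise ValueError('separator byte not found')
    count = len(payload) // 4  # whole 4-byte integers only
    numbers = struct.unpack('<%di' % count, payload[:count * 4])
    return header.decode('utf-8'), list(numbers)
```

Because partition stops at the first occurrence, a 0x1d byte inside the number stream does no harm. If the header were UTF-16 instead, a 0x1d byte could in principle occur inside the header itself, so the sketch would need adjusting.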

Answer 2

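Your attempt to search for chr(29) did not work because the 29 in that expression is a value in decimal notation. The value you read off the hex editor, however, is displayed in hexadecimal, so it is 0x29 (or 41 in decimal).

You can do the conversion right in Python; 0xnn is just another notation for entering integer literals: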

>>> 0x29
41

You can then use str.partition to split the data into the respective parts:

SEP = b'\x29'  # the separator byte, written in hex

with open('file.dat', 'rb') as infile:
    data = infile.read()

xml, sep, binary_data = data.partition(SEP)

Demonstration:

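import random

SEP = b'\x29'  # the separator byte, written in hex


# Write a test file: XML header, separator, then 1024 random bytes
with open('file.dat', 'wb') as outfile:
    outfile.write(b"<doc></doc>")
    outfile.write(SEP)
    data = bytearray(random.randint(0, 255) for i in range(1024))
    outfile.write(data)


with open('file.dat', 'rb') as infile:
    data = infile.read()

# partition splits at the first occurrence of SEP only
xml, sep, binary_data = data.partition(SEP)

print(xml.decode('ascii'))
print(len(binary_data))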

Output:

<doc></doc>
1024
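Once the binary part has been separated, it can be turned into numbers without an explicit loop. The following is a sketch only (Python 3), assuming the stream holds 8-byte floats in the machine's native byte order; the question does not say what the numbers are, and `parse_numbers` is a hypothetical name.

```python
from array import array


def parse_numbers(binary_data, typecode='d'):
    """Interpret raw bytes as a flat array of numbers.

    The typecode 'd' (C double) is an assumption; pick the one that
    matches the real file format.
    """
    numbers = array(typecode)
    # Drop a trailing partial item, since frombytes needs whole items
    usable = len(binary_data) - len(binary_data) % numbers.itemsize
    numbers.frombytes(binary_data[:usable])
    return numbers
```

With the names from the demonstration above, this would be called as parse_numbers(binary_data).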

Answer 3

mmap the file, search for 29, create a buffer or memoryview from the first part to feed to the parser, and pass the rest through struct.
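A rough sketch of that idea in Python 3 could look like the following. The separator value, the `fmt` format string (little-endian 32-bit integers) and the name `read_with_mmap` are all assumptions made for illustration, not details from the question.

```python
import mmap
import struct


def read_with_mmap(path, sep=b'\x1d', fmt='<i'):
    """Return (xml_bytes, numbers) using a memory-mapped file.

    Assumptions: the separator is byte 29 and the payload is a sequence
    of items described by `fmt`; adjust both to the real format.
    """
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(sep)
            if pos == -1:
                raise ValueError('separator byte not found')
            # Slicing copies only the header; a memoryview slice would
            # avoid the copy but must be released before the map closes.
            xml_bytes = mm[:pos]
            size = struct.calcsize(fmt)
            start = pos + 1
            usable = (len(mm) - start) // size * size  # whole items only
            numbers = [struct.unpack_from(fmt, mm, off)[0]
                       for off in range(start, start + usable, size)]
    return xml_bytes, numbers
```

The header comes back as bytes so that it can be handed straight to an XML parser, which can then work out the encoding itself.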

Split binary file contents into two parts using a single-byte separator in Python

3 answers: