Question

I have an input file that looks like this:

some data...
some data...
some data...
...
some data...
<binary size="2358" width="32" height="24">
data of size 2358 bytes
</binary>
some data...
some data...

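The binary size value, 2358, can differ from file to file. I want to extract those 2358 bytes (a variable amount) from the file and write them to another file.

I wrote the following code for this, but it raises an error. The problem is that I can't extract the 2358 bytes of binary data and write them to another file.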

c = responseFile.read(1)
ValueError: Mixing iteration and read methods would lose data

The code is:

import re

outputFile = open('output', 'w')    
inputFile = open('input.txt', 'r')
fileSize=0
width=0
height=0

for line in inputFile:
    if "<binary size" in line:
        x = re.findall('\w+', line)
        fileSize = int(x[2])
        width = int(x[4])
        height = int(x[6])
        break

print x
# Here the file will point to the start location of 2358 bytes.
for i in range(0,fileSize,1):
    c = inputFile.read(1)
    outputFile.write(c)


outputFile.close()
inputFile.close()

The final solution to my problem:

#!/usr/local/bin/python

import os
inputFile = open('input', 'r')
outputFile = open('output', 'w')

flag = False

for line in inputFile:
    if line.startswith("<binary size"):
        print 'Start of Data'
        flag = True
    elif line.startswith("</binary>"):
        flag = False
        print 'End of Data'
    elif flag:
        outputFile.write(line) # copy the line unchanged; the extra trailing newline is trimmed below

inputFile.close()
outputFile.close()

# I have to delete the last extra new line character from the output.
size = os.path.getsize('output')
outputFile = open('output', 'ab')
outputFile.truncate(size-1)
outputFile.close()

Answer 1

How about a different approach? In pseudocode:

for each line in input file:
    if line starts with binary tag: set output flag to True
    if line starts with binary-termination tag: set output flag to False
    if output flag is True: copy line to the output file

In actual code:

outputFile = open('./output', 'w')    
inputFile = open('./input.txt', 'r')

flag = False

for line in inputFile:

    if line.startswith("<binary size"):
        flag = True
    elif line.startswith("</binary>"):
        flag = False
    elif flag:
        outputFile.write(line[:-1]) # remove newline


outputFile.close()
inputFile.close()
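
One thing to watch with the code above: stripping the newline from every line also removes newline bytes that belong to the binary data itself. A minimal Python 3 sketch of the same flag approach, assuming the file is opened in binary mode and that only the single newline before `</binary>` is a separator (the function name `copy_between_tags` and the sample data are illustrative, not from the original post):

```python
import io

def copy_between_tags(stream):
    """Collect the lines between the <binary ...> and </binary> lines."""
    chunks = []
    inside = False
    for line in stream:                      # iteration only, no read() calls
        if line.startswith(b'<binary size'):
            inside = True
        elif line.startswith(b'</binary>'):
            inside = False
        elif inside:
            chunks.append(line)              # keep newlines that belong to the data
    data = b''.join(chunks)
    # Drop only the single newline that separates the data from </binary>
    return data[:-1] if data.endswith(b'\n') else data

# In-memory stand-in for the input file (hypothetical contents)
sample = io.BytesIO(
    b'some data...\n'
    b'<binary size="5" width="32" height="24">\n'
    b'ab\ncd\n'
    b'</binary>\n'
    b'some data...\n'
)
print(copy_between_tags(sample))
```

This still breaks if the binary data happens to contain a line that starts with `</binary>`, so reading exactly `size` bytes is likely the more robust route.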

Answer 2

Try changing your first loop to something like this:

while True:
    line = inputFile.readline()
    if not line:  # readline() returns '' at end of file
        break
    # continue the loop as it was

This gets rid of the iteration and leaves only read methods, so the problem should go away.
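
A rough Python 3 sketch of that idea, combined with reading the payload by size. It assumes the file is opened in binary mode (`'rb'`); the function name `extract_binary`, the regular expression, and the sample data are illustrative assumptions rather than part of the original code:

```python
import io
import re

def extract_binary(stream):
    """Return the payload that follows a <binary size="N" ...> line.

    `stream` must be a binary file object (opened with 'rb').
    """
    while True:
        line = stream.readline()      # readline() keeps the file position accurate
        if not line:                  # b'' means end of file: no tag found
            return None
        match = re.search(rb'<binary[^>]*\bsize="(\d+)"', line)
        if match:
            size = int(match.group(1))
            return stream.read(size)  # may be shorter if the file is truncated

# In-memory stand-in for the input file (hypothetical contents)
sample = io.BytesIO(
    b'some data...\n'
    b'<binary size="5" width="32" height="24">\n'
    b'ab\ncd\n'
    b'</binary>\n'
    b'some data...\n'
)
payload = extract_binary(sample)
print(payload)
```

With a real file, `open('input.txt', 'rb')` would take the place of the `io.BytesIO` object, and the returned bytes can be written to an output file opened with `'wb'`.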

Answer 3

Consider this approach:

import re

line = '<binary size="2358" width="32" height="24">'

m = re.search(r'size="(\d*)"', line)

print m.group(1)  # 2358

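It's different from your code, so it isn't a drop-in replacement, but it uses regular expressions in a different way.

This uses Python's regular-expression group capture, and it is much more robust than the string-splitting approach.

For example, consider what happens if the attributes are reordered: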

<binary width="32" size="2358" height="24">
instead of
<binary size="2358" width="32" height="24">

Would your code still work? Mine will. :-)
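
To illustrate the point about attribute order, here is a small Python 3 sketch that captures every `name="number"` pair by name instead of by position. The helper `parse_attributes` is a hypothetical name, and the pattern assumes all attribute values are plain digits:

```python
import re

def parse_attributes(tag_line):
    """Return a dict of the numeric attributes found in a tag line."""
    pairs = re.findall(r'(\w+)="(\d+)"', tag_line)
    return {name: int(value) for name, value in pairs}

original = parse_attributes('<binary size="2358" width="32" height="24">')
reordered = parse_attributes('<binary width="32" size="2358" height="24">')
print(original["size"], reordered["size"])
```

Because the values are looked up by attribute name, both orderings produce the same dictionary.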

Edit: to answer your question:

If you want to read n bytes of data from the beginning of a file, you can do something like:

bytes = ifile.read(n)

Note that you may get fewer than n bytes if the input file isn't long enough.

If you don't want to start at byte 0 but at some other byte, call seek() first, as in:

ifile.seek(9)
bytes = ifile.read(5)

This gives you the bytes at offsets 9 through 13, that is, the 10th through the 14th bytes.
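
A quick Python 3 check of the offsets and of the short-read behaviour, using an in-memory `io.BytesIO` object as a stand-in for a real file opened in binary mode (the 20-byte sample is made up for illustration):

```python
import io

ifile = io.BytesIO(b'0123456789abcdefghij')  # 20 bytes standing in for a real file

ifile.seek(9)
chunk = ifile.read(5)        # offsets 9, 10, 11, 12, 13
print(chunk)

ifile.seek(18)
tail = ifile.read(5)         # only 2 bytes remain, so the result is shorter
print(len(tail))
```

Comparing `len()` of the result with the requested size is a simple way to detect a truncated file.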

Reading file lines and characters
