How do I ignore zlib errors at EOF?

Time: 2018-01-09 22:46:34

Tags: python python-2.7 gzip zlib

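I need my Python script to process gzip-ed files that may still be being written to. Because they have not been properly closed yet, reading them sometimes fails with a CRC error.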

How can I suppress these errors and simply process everything up to the incomplete end?

My code is:

if usegzip:
    opener = gzip.open
else:
    opener = open

...
for line in opener(input_filename,'r'):
    .... process line ....

The exception raised when the script hits a file that is still open is:

    for line in opener(input_filename,'r'):
  File "/opt/lib/python2.7/gzip.py", line 464, in readline
    c = self.read(readsize)
  File "/opt/lib/python2.7/gzip.py", line 268, in read
    self._read(readsize)
  File "/opt/lib/python2.7/gzip.py", line 315, in _read
    self._read_eof()
  File "/opt/lib/python2.7/gzip.py", line 354, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x7248907 != 0x45e82dc4L

Can I suppress this somehow without reimplementing the gzip module?
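
For anyone who wants to reproduce the situation, here is a minimal sketch. It assumes Python 3, where the unfinished tail tends to surface as EOFError rather than the CRC IOError from the Python 2.7 traceback above. The helper name read_while_still_open and the temporary file are purely illustrative and not part of the original script.

```python
import gzip
import os
import tempfile


def read_while_still_open(n_lines=1000):
    """Write n_lines to a gzip file, then read it back before the writer closes it.

    Returns (lines_read, error), where error is the exception raised at the
    unfinished tail, or None if reading ended cleanly.
    """
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    writer = gzip.open(path, "wb")
    lines_read, error = 0, None
    try:
        for i in range(n_lines):
            writer.write(("line %d\n" % i).encode("ascii"))
        # flush() pushes the compressed data to the file, but the CRC/size
        # trailer is only written when the writer is closed.
        writer.flush()
        reader = gzip.open(path, "rb")
        try:
            for _ in reader:
                lines_read += 1
        except (IOError, EOFError) as exc:
            error = exc
        finally:
            reader.close()
    finally:
        writer.close()
        os.remove(path)
    return lines_read, error
```

Calling read_while_still_open() should return the number of lines that could be read together with the exception raised at the unfinished end of the file.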

1 Answer:

Answer 0 (score: 0)

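OK, the solution is to give up the convenience of the for-loop and iterate over the lines explicitly. The explicit iteration can then be wrapped in try/except to handle the error. For example, here is a simple line counter for gzip-ed files: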

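import gzip
import sys

f = sys.argv[-1]  # path of the gzip-ed file, taken from the last argument
count = 0
opener = gzip.open

lines = opener(f) # Creates the iterator normally used by for-loop

while True:
    try:
        line = next(lines)
    except (IOError, StopIteration):
        # IOError: failed CRC check on an unfinished file; StopIteration: normal end
        break
    count += 1

print(count)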

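When the file has been closed properly, the output of the script above matches that of gzcat | wc -l. When the file is still being written to, however, the script manages to read more lines than gzcat does.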
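
The same idea can be packaged as a generator so that the calling code keeps an ordinary for-loop. This is only a sketch under a few assumptions: the name iter_lines_tolerant is made up for illustration, the file is opened in binary mode, and EOFError and zlib.error are caught in addition to IOError because Python 3 reports a missing end-of-stream marker as EOFError.

```python
import gzip
import zlib


def iter_lines_tolerant(path):
    """Yield lines from a gzip file, stopping quietly at a damaged or unfinished tail."""
    f = gzip.open(path, "rb")
    try:
        while True:
            try:
                line = f.readline()
            except (IOError, EOFError, zlib.error):
                # IOError/OSError: CRC or length check failed.
                # EOFError: Python 3 signals a missing end-of-stream marker.
                # zlib.error: the compressed data itself is damaged.
                break
            if not line:  # clean end of file
                break
            yield line
    finally:
        f.close()
```

With this helper, the counter above reduces to sum(1 for _ in iter_lines_tolerant(f)). Note that a partial last line may be dropped when the error is raised, and that swallowing these exceptions also hides genuine corruption, so it is probably best reserved for files known to be in the middle of being written.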