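I need my Python script to work with gzip-ed files that are still being written. Because these files have not yet been properly closed, reading them sometimes results in CRC errors.
How can I suppress these errors and simply process everything up to the incomplete end?
My code is: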
    if usegzip:
        opener = gzip.open
    else:
        opener = open
    ...
    for line in opener(input_filename, 'r'):
        .... process line ....
The exception raised when the script hits a file that is still open is:
        for line in opener(input_filename,'r'):
      File "/opt/lib/python2.7/gzip.py", line 464, in readline
        c = self.read(readsize)
      File "/opt/lib/python2.7/gzip.py", line 268, in read
        self._read(readsize)
      File "/opt/lib/python2.7/gzip.py", line 315, in _read
        self._read_eof()
      File "/opt/lib/python2.7/gzip.py", line 354, in _read_eof
        hex(self.crc)))
    IOError: CRC check failed 0x7248907 != 0x45e82dc4L
Can I somehow suppress this without reimplementing the gzip module?
Answer 0 (score: 0)
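OK, the solution is to give up the convenience of the for loop and iterate over the lines explicitly. The explicit iteration can then be placed inside a try/except block to handle the error. For example, here is a simple line counter for gzip-ed files: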
    import gzip
    import sys

    f = sys.argv[-1]
    count = 0
    opener = gzip.open
    lines = opener(f)  # Creates the iterator normally used by the for loop
    while True:
        try:
            line = next(lines)
        except (IOError, StopIteration):
            # IOError: CRC check failed on a file that is still being written
            break
        count += 1
    print(count)
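Once the file has been properly closed, the output of the above script is identical to that of gzcat | wc -l. However, while the file is still being written, the script can successfully read more lines than gzcat can.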
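If the convenience of the for loop matters, the same try/except idea can be wrapped in a generator. The following is only a minimal sketch, assuming Python 3, where a gzip stream that ends early raises EOFError rather than the IOError seen in the Python 2.7 traceback. The name iter_lines_tolerant and its use_gzip parameter are hypothetical, and the set of exceptions it swallows is an assumption that may need tuning for a given workload.

```python
import gzip
import zlib


def iter_lines_tolerant(path, use_gzip=True):
    """Yield lines (as bytes) from path, stopping quietly at a bad gzip tail.

    Hypothetical helper: not part of the gzip module.
    """
    opener = gzip.open if use_gzip else open
    with opener(path, "rb") as handle:
        lines = iter(handle)
        while True:
            try:
                line = next(lines)
            except StopIteration:
                # Normal end of file.
                return
            except (EOFError, OSError, zlib.error):
                # Assumed failure modes for an unfinished file:
                # EOFError for a stream cut short, OSError for a failed
                # CRC or length check, zlib.error for damaged deflate data.
                return
            yield line
```

The caller can then keep the original loop shape, for example `for line in iter_lines_tolerant(input_filename, usegzip):`. Note that this treats every read error as end of input, so a genuinely corrupt archive is indistinguishable from one that is merely unfinished.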