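My code snippet extracts a file from a GZ archive and saves it as a .txt file, but sometimes the file contains some strange text that makes the extraction module crash.
The method I use: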
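import gzip
import os


def unpackgz(name, path):
    file = os.path.join(path, name)
    outfilename = file[:-3] + ".txt"  # replace the ".gz" suffix with ".txt"
    # Decompress the whole archive and write the raw bytes out unchanged.
    with gzip.open(file, 'rb') as inF, open(outfilename, 'wb') as outF:
        outF.write(inF.read())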
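My question: how can I solve this? Is there something like open(file, errors='ignore') as fil:? With the method above, I can only extract healthy files.
Edit to the original question: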
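Python's `gzip.open` does accept an `errors` argument, but only in text mode (`'rt'`), where it is handed to the text decoder. A minimal sketch, assuming the content is meant to be ASCII (the encoding used in the fixed version further down) and that silently dropping undecodable bytes is acceptable; `unpack_gz_text` is a hypothetical name, not part of the original code:

```python
import gzip
import os


def unpack_gz_text(name, path, encoding="ascii", errors="ignore"):
    """Hypothetical helper: extract <path>/<name> (a .gz file) to a .txt file.

    Bytes that cannot be decoded with `encoding` are dropped (errors="ignore");
    errors="replace" would keep a U+FFFD marker in their place instead.
    """
    gz_path = os.path.join(path, name)
    out_path = gz_path[:-3] + ".txt"  # strip the ".gz" suffix
    # Text mode ("rt") is required: `errors` is only accepted in text mode.
    # newline="" on both sides keeps the original line endings untouched.
    with gzip.open(gz_path, "rt", encoding=encoding, errors=errors,
                   newline="") as in_f, \
            open(out_path, "w", encoding="utf-8", newline="") as out_f:
        for line in in_f:
            out_f.write(line)
    return out_path
```

This only helps with badly encoded text. If the compressed data itself is damaged, reading still raises an exception, which is what the rest of this thread deals with.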
def read_corrupted_file(filename):
    with gzip.open(filename, 'r') as f:
        for line in f:
            try:
                string += line
            except Exception as e:
                print(e)
    return string


newfile = open("corrupted.txt", 'a+')
cwd = os.getcwd()
srtNameb = "service" + str(46) + "b.gz"
localfilename3 = cwd + '\\' + srtNameb
newfile.write(read_corrupted_file(localfilename3))
This results in multiple errors: Like This
Fixed, working version:
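import gzip
import os


def read_corrupted_file(filename):
    # Append to corrupted.txt; the with-block makes sure it gets closed.
    with open("corrupted.txt", 'a+') as newfile:
        try:
            with gzip.open(filename, 'rb') as f:
                for line in f:
                    try:
                        # Lines containing non-ASCII bytes fail to decode
                        # and are skipped; the loop continues.
                        newfile.write(line.decode('ascii'))
                    except Exception as e:
                        print(e)
        except Exception as e:
            # Errors raised by the gzip stream itself end the loop.
            print(e)


cwd = os.getcwd()
srtNameb = "service" + str(46) + "b.gz"
localfilename3 = os.path.join(cwd, srtNameb)
read_corrupted_file(localfilename3)
print('done')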
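The fixed version catches every `Exception`, which also hides unrelated bugs. A sketch that names the two failure modes separately, still assuming ASCII content: a line that fails to decode raises `UnicodeDecodeError` and is skipped, while damaged compressed data raises `EOFError` (truncated file), `gzip.BadGzipFile` (bad header or CRC, Python 3.8+) or `zlib.error` (invalid deflate data), any of which ends the loop because the decompressor cannot continue past that point. The function name and the returned counters are hypothetical:

```python
import gzip
import zlib


def copy_gz_lines(gz_path, out_path):
    """Hypothetical helper: copy decodable ASCII lines from gz_path to out_path.

    Returns (lines_written, lines_skipped, stream_error); stream_error is None
    if the whole archive was read, otherwise the exception that stopped it.
    """
    written = skipped = 0
    stream_error = None
    with open(out_path, "w", encoding="ascii", newline="") as out_f:
        try:
            with gzip.open(gz_path, "rb") as in_f:
                for raw in in_f:
                    try:
                        out_f.write(raw.decode("ascii"))
                        written += 1
                    except UnicodeDecodeError:
                        skipped += 1  # only this line is bad; keep reading
        except (EOFError, gzip.BadGzipFile, zlib.error) as exc:
            # Damaged compressed data: nothing after this point is readable.
            stream_error = exc
    return written, skipped, stream_error
```

Lines written before the stream error stay in the output file, so a truncated download still yields everything up to the damaged point.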
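Answer (score: 0)
In general, if the file is corrupted you will get an error when you try to decompress it, and there is not much you can do to still get the data out. But if you just want to stop the program from crashing, you can use try/except: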
try:
    pass  # the code that may fail goes here
except Exception as error:
    print(error)
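Applying this logic, you can read the file line by line with gzip and wrap the processing of each line in try/except, so that when a line fails (for example, it cannot be decoded) the error is printed and reading continues with the next line.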
import gzip

with gzip.open('input.gz', 'r') as f:
    for line in f:
        print('got line', line)
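On the point that little can be done with corrupted data: one option, swapped in here and not part of the answer above, is to feed the raw bytes to `zlib.decompressobj` yourself and keep whatever was decompressed before the stream broke. This sketch assumes a single-member gzip file. Deflate data has no per-block checksum, so bytes just before a corrupt spot may already be garbled, and the output of the chunk in which an error is raised is lost (a smaller `chunk_size` loses less). `salvage_gz` is a hypothetical name:

```python
import zlib


def salvage_gz(data, chunk_size=64 * 1024):
    """Hypothetical helper: return the bytes decodable from gzip `data`.

    Stops quietly at truncated or corrupt input instead of raising.
    Assumes a single-member gzip stream.
    """
    # 16 + MAX_WBITS tells zlib to expect (and parse) a gzip header.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = []
    for start in range(0, len(data), chunk_size):
        try:
            out.append(decomp.decompress(data[start:start + chunk_size]))
        except zlib.error:
            break  # invalid deflate data or failed CRC: keep earlier chunks
        if decomp.eof:
            break  # end of the (first) gzip member reached
    # Truncated input simply ends here; output so far is already collected.
    return b"".join(out)
```

Usage, decoding the recovered bytes leniently: `text = salvage_gz(open('input.gz', 'rb').read()).decode('ascii', errors='ignore')`.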