IOError when decompressing a gzip file

Time: 2015-06-16 15:20:04

Tags: python

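I am trying to download and decompress a gzip file, and then convert the resulting decompressed file, which is in TSV format, to CSV, which will be easier to parse. I am trying to collect the data from the "Download Table" link at this URL. I am using the same idea as in this post, but I get the error IOError: Not a gzipped file at the line outfile.write(decompressedFile.read()). My code is as follows: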

import os
import urllib2 
import gzip
import StringIO

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?"
filename = "D:\Sidney\irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename[:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format

1 answer:

Answer 0 (score: 3)

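Basically, you are trying to pull the wrong file. If you inspect the response in your code, you will see that you get an HTML error page, not gzip data. You are appending your own local path to the URL, which results in an invalid URL.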
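One way to catch this kind of problem early is to look at the first two bytes of the downloaded payload before handing it to gzip: a gzip stream starts with the magic bytes 0x1f 0x8b, while an HTML error page does not. The following is only a minimal sketch, written for Python 3 (the code in this thread is Python 2). The helper name looks_like_gzip is hypothetical, and both payloads are made up for illustration.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream (RFC 1952)


def looks_like_gzip(payload):
    """Return True if the bytes start with the gzip magic number."""
    return payload[:2] == GZIP_MAGIC


# Illustrative payloads (assumptions, not real server responses):
# a genuine gzip stream and an HTML error page.
good = gzip.compress(b"geo\ttime\nDE\t2015\n")
bad = b"<html><body>Error</body></html>"

print(looks_like_gzip(good))
print(looks_like_gzip(bad))
```

A check like this only says whether the payload could be gzip data; it does not validate the rest of the stream.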

import os
import urllib2 
import gzip
import StringIO

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename.split('/')[1][:-3]
response = urllib2.urlopen(baseURL + filename)
print response
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format
    for row in tsvin:
        csvout.writerow(row) #Copy each tab-separated row out as a comma-separated row
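For readers on Python 3, where urllib2 and the StringIO module no longer exist, the decompress-and-convert step might look roughly like the sketch below. It works entirely in memory and leaves out the download. The function name tsv_gz_to_csv, the UTF-8 encoding, and the sample rows are all assumptions made for illustration, not details from the Eurostat file.

```python
import csv
import gzip
import io


def tsv_gz_to_csv(compressed):
    """Decompress gzip bytes that hold TSV text and return the same rows as CSV text."""
    # Assumption: the TSV payload is UTF-8 encoded.
    tsv_text = gzip.decompress(compressed).decode("utf-8")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Read tab-separated rows and write them back comma-separated.
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data standing in for the downloaded file.
sample = gzip.compress(b"geo\ttime\tvalue\nDE\t2015\t0.5\n")
print(tsv_gz_to_csv(sample))
```

Going through csv.reader and csv.writer, instead of replacing tabs with commas, keeps fields that themselves contain commas correctly quoted.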

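The differences from your code are the filename line and a small part of baseURL, which now ends with file=. filename = "data/irt_euryld_d.tsv.gz" is the correct file name according to the link you specified.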

Another change is this line:

outFilePath = filename.split('/')[1][:-3]

which could be better written as

outFilePath = os.path.join('D:\\', 'Sidney', filename.split('/')[1][:-3])
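The path handling can be sketched a little more defensively, again in Python 3 and only as an illustration. The helper name output_path is hypothetical. It uses posixpath for the URL-style name and ntpath for the Windows target directory, so the result is the same on any platform.

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # forward-slash rules, suited to the URL-style file name


def output_path(filename, *directory_parts):
    """Drop the directory and the '.gz' suffix, then join onto a Windows directory."""
    base = posixpath.basename(filename)
    if base.endswith(".gz"):
        base = base[:-len(".gz")]
    return ntpath.join(*directory_parts, base)


print(output_path("data/irt_euryld_d.tsv.gz", "D:\\", "Sidney"))
```

Note the trailing backslash on the drive: joining onto a bare 'D:' yields a path relative to the current directory on that drive, not to its root.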