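I'm trying to download and decompress a gzip file, and then convert the resulting TSV file to CSV, which will be easier to parse. I'm trying to collect the data from the "Download Table" link at this URL. I'm using the same idea as in this post, but I get the error IOError: Not a gzipped file on the line outfile.write(decompressedFile.read()). My code is as follows: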
import os
import urllib2
import gzip
import StringIO
baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?"
filename = "D:\Sidney\irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename[:-3]
response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
compressedFile.seek(0)
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())
#Now have to deal with tsv file
import csv
with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format
Answer 0 (score: 3)
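Basically, you are trying to fetch the wrong file. If you inspect the response in your code, you will see that you get back an HTML error page rather than a gzip file. You are appending your own local path to the base URL, which produces an invalid URL.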
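One way to confirm this diagnosis is to look at the first two bytes of whatever the server returned: a gzip stream always starts with the bytes 0x1f 0x8b, while an HTML error page starts with text. The following is only a minimal sketch; the helper name looks_like_gzip and the sample payloads are hypothetical and stand in for a real download.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

def looks_like_gzip(payload):
    """Return True if the byte string starts with the gzip magic number."""
    return payload[:2] == GZIP_MAGIC

# Build a small gzip payload in memory to stand in for a successful download.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"a\tb\n")
good = buf.getvalue()

# Stand-in for the HTML error page returned for a wrong URL.
bad = b"<html><body>Error</body></html>"
```

Applied to the bytes from response.read(), a check like this would flag the HTML page before gzip raises "Not a gzipped file".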
import os
import urllib2
import gzip
import StringIO
baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz" #Edited after heinst's comment below
outFilePath = filename.split('/')[1][:-3]
response = urllib2.urlopen(baseURL + filename)
print response
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
compressedFile.seek(0)
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())
#Now have to deal with tsv file
import csv
with open(outFilePath,'rb') as tsvin, open('ECB.csv', 'wb') as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout) #Converting output into CSV Format
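The differences are the filename line and a small part of baseURL (the added file= parameter). filename = "data/irt_euryld_d.tsv.gz" is the correct file name according to the link you specified. The other change is the line outFilePath = filename.split('/')[1][:-3], which could be better written as: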
outFilePath = os.path.join('D:\\', 'Sidney', filename.split('/')[1][:-3]) # 'D:\\' (with the backslash) makes the path absolute on Windows
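For readers on Python 3, where urllib2 and the StringIO module no longer exist, the same steps can be sketched with io.BytesIO, gzip and csv. This is not the answer's original code: the function names gunzip_bytes and tsv_to_csv are hypothetical, the UTF-8 encoding of the file is an assumption, and the download itself is left as a comment because it needs network access.

```python
import csv
import gzip
import io

def gunzip_bytes(payload):
    """Decompress a gzip-compressed byte string held in memory."""
    with gzip.GzipFile(fileobj=io.BytesIO(payload), mode="rb") as gz:
        return gz.read()

def tsv_to_csv(tsv_text):
    """Convert tab-separated text to comma-separated text."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    writer.writerows(reader)  # copy every row from the TSV reader to the CSV writer
    return out.getvalue()

# Stand-in for the downloaded bytes. With network access one could instead use:
#   from urllib.request import urlopen
#   payload = urlopen(baseURL + filename).read()
payload = gzip.compress(b"geo\t2015\nDE\t0.5\n")

# Assumption: the decompressed file is UTF-8 text.
csv_text = tsv_to_csv(gunzip_bytes(payload).decode("utf-8"))
```

Keeping the decompression and the conversion in separate functions makes it easy to test each step on in-memory data before pointing it at the real download.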