I am trying to download and unzip a Kaggle dataset from a Python script (Python 3.5), but I get an error.
import io
from zipfile import ZipFile
import csv
import urllib.request
url = 'https://www.kaggle.com/c/quora-question-pairs/download/test.csv.zip'
response = urllib.request.urlopen(url)
c = ZipFile(io.BytesIO(response.read()))
After running this code, I get the following error:
BadZipFile: File is not a zip file
How can I get rid of this error, and what causes it?
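Answer 0 (score: 0)
The error occurs because the request is rejected when we are not logged in, so the response body is not the zip archive. Using the requests module, with a few small fixes to the solution at http://ramhiser.com/2012/11/23/how-to-download-kaggle-data-with-python-and-requests-dot-py/, this works: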
import io
from zipfile import ZipFile
import csv
import requests
# The direct link to the Kaggle data set
data_url = 'https://www.kaggle.com/c/quora-question-pairs/download/test.csv.zip'
# The local path where the data set is saved.
local_filename = "test.csv.zip"
# Kaggle Username and Password
kaggle_info = {'UserName': "my_username", 'Password': "my_password"}
# Attempts to download the CSV file. Gets rejected because we are not logged in.
r = requests.get(data_url)
# Login to Kaggle and retrieve the data.
r = requests.post(r.url, data = kaggle_info)
# Writes the data to a local file one chunk at a time.
with open(local_filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=512 * 1024):  # Reads 512KB at a time into memory
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
c = ZipFile(local_filename)
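If the login step fails, ZipFile will raise the same BadZipFile error, so it can help to check what was downloaded before opening it. Below is a minimal sketch of such a check. The helper name open_zip_or_explain is hypothetical (not part of zipfile or requests), and the demo uses in-memory bytes rather than a real Kaggle download.

```python
import io
import zipfile


def open_zip_or_explain(payload):
    """Return a ZipFile for the given bytes, or raise with a preview of what was received.

    Hypothetical helper for illustration only.
    """
    buffer = io.BytesIO(payload)
    if not zipfile.is_zipfile(buffer):
        # Show the first few bytes; an HTML login page usually starts with "<!DOCTYPE" or "<html".
        preview = payload[:60].decode("utf-8", errors="replace")
        raise ValueError("Response is not a zip archive; it starts with: " + repr(preview))
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


# Demo with in-memory data instead of a real download.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("test.csv", "id,question1,question2\n")

names = open_zip_or_explain(demo.getvalue()).namelist()

try:
    open_zip_or_explain(b"<!DOCTYPE html><html>login page</html>")
    html_rejected = False
except ValueError:
    html_rejected = True
```

With the answer's code, the same check could be applied to r.content before writing the file, assuming the whole response fits in memory.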