BadZipFile when downloading from Kaggle

Time: 2017-04-02 16:12:08

Tags: python

I am trying to download and unzip a Kaggle dataset from a Python script (Python 3.5), but I get an error.

import io
from zipfile import ZipFile
import csv
import urllib.request

url = 'https://www.kaggle.com/c/quora-question-pairs/download/test.csv.zip'
response = urllib.request.urlopen(url)
c = ZipFile(io.BytesIO(response.read()))

After running this code, I get the following error.

BadZipFile: File is not a zip file

How can I get rid of this error, and what causes it?

1 Answer:

Answer 0: (score: 0)

Using the requests module, with a few small fixes to the approach described at http://ramhiser.com/2012/11/23/how-to-download-kaggle-data-with-python-and-requests-dot-py/, the solution is:

import io
from zipfile import ZipFile
import csv
import requests

# The direct link to the Kaggle data set
data_url = 'https://www.kaggle.com/c/quora-question-pairs/download/test.csv.zip'

# The local path where the data set is saved.
local_filename = "test.csv.zip"

# Kaggle Username and Password
kaggle_info = {'UserName': "my_username", 'Password': "my_password"}

# Attempts to download the CSV file. Gets rejected because we are not logged in.
r = requests.get(data_url)

# Log in to Kaggle and retrieve the data.
# stream=True defers the body download so it can be read chunk by chunk below.
r = requests.post(r.url, data=kaggle_info, stream=True)

# Write the data to a local file one chunk at a time.
# The with-block closes the file even if a write fails.
with open(local_filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=512 * 1024):  # 512 KB per chunk
        if chunk:  # filter out keep-alive chunks
            f.write(chunk)

c = ZipFile(local_filename)
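
As for the cause: the comment in the code above notes that the first request gets rejected because we are not logged in, so the bytes passed to ZipFile in the question were most likely a login page rather than the archive. A quick way to confirm this before opening the file is to inspect the payload. The following is only a minimal sketch; `looks_like_zip` and `describe_payload` are hypothetical helper names, and the two payloads are stand-ins built in memory rather than real Kaggle responses.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if the bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(data))


def describe_payload(data: bytes) -> str:
    """Hypothetical helper: give a rough description of downloaded bytes."""
    if looks_like_zip(data):
        return "zip archive"
    # Assumption: an HTML page starts with '<' after optional whitespace.
    if data.lstrip()[:1] == b"<":
        return "HTML page (probably a login or error page)"
    return "unknown content"


# Build a small zip in memory to stand in for a successful download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("test.csv", "id,question\n1,hello\n")
good_payload = buffer.getvalue()

# Stand-in for what an unauthenticated request might return.
bad_payload = b"<!DOCTYPE html><html><body>Sign in</body></html>"

print(describe_payload(good_payload))
print(describe_payload(bad_payload))
```

In the scripts above, the same check could be applied to response.read() or r.content before calling ZipFile; if it reports an HTML page, the login step did not succeed.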