Question

Here is the link to the site I am collecting the data from:

https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data

Basically, I want to fetch the train dataset and read it directly into my Data Science Experience notebook, because my local system cannot handle its size. I can download the zip file using !wget, but when I try to unzip it, all I get is the following message:

Archive:  train.csv.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of train.csv.zip or
        train.csv.zip.zip, and cannot find train.csv.zip.ZIP, period.

Any help is greatly appreciated.

Answer 1

I assume you are running something like this:

!wget https://www.kaggle.com/c/8540/download/test_supplement.csv.zip

Once the download finishes, you will see that the file is only 8 KB in size:

!ls -l test_supplement.csv.zip

The downloaded file is not actually a valid zip file; it is an HTML page asking you to log in to Kaggle. Running !cat test_supplement.csv.zip will print that HTML content.
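To confirm this from inside the notebook, a minimal sketch along these lines may help. It relies only on Python's standard zipfile module; the helper name describe_download is hypothetical, chosen here for illustration.

```python
import zipfile


def describe_download(path):
    """Return a rough guess at what a downloaded file really contains."""
    # zipfile.is_zipfile looks for the zip end-of-central-directory record,
    # the same structure that unzip complained about.
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    # A login page begins with HTML markup instead of zip data.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

If describe_download("test_supplement.csv.zip") returns "html", the download was most likely the login page rather than the dataset.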

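Kaggle datasets can only be downloaded after you have authenticated, so wget or curl will not work without auth.

One option is simply to download the dataset from the web page once you are logged in, then upload it to whichever system you are working on and use it there. (Check Kaggle's policy on using this dataset before distributing it.)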
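Once a valid archive is on the target system, it does not have to be unpacked to be read. The sketch below is one possible way to stream rows straight out of the zip using only the standard library; the function name iter_csv_rows and the assumption that the CSV is UTF-8 with a header row are illustrative choices, not something stated in the question.

```python
import csv
import io
import zipfile


def iter_csv_rows(zip_path, member=None, limit=None):
    """Yield CSV rows as dicts from a file inside a zip, without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        # Fall back to the first file in the archive when no name is given.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Wrap the binary stream so the csv module receives text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for index, row in enumerate(csv.DictReader(text)):
                if limit is not None and index >= limit:
                    break
                yield row
```

Because rows are produced one at a time, something like list(iter_csv_rows("train.csv.zip", limit=5)) can peek at a large file without loading all of it into memory.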

Or

Try using https://github.com/Kaggle/kaggle-api

Here is a notebook I put together that shows how to install and use the API from the link above.
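As a rough sketch of the API route, the snippet below shells out to the kaggle command-line tool from Python. It assumes the tool has been installed with pip install kaggle, that an API token is available (in ~/.kaggle/kaggle.json or the KAGGLE_USERNAME and KAGGLE_KEY environment variables), and that the competition rules have been accepted on the website. The competition slug is taken from the data URL in the question and the file name from the unzip output; the helper names are hypothetical.

```python
import subprocess

# Assumed from the URL in the question.
COMPETITION = "talkingdata-adtracking-fraud-detection"


def build_download_command(competition, file_name, dest="."):
    """Build the kaggle CLI call that downloads one competition file."""
    return [
        "kaggle", "competitions", "download",
        "-c", competition,  # competition slug
        "-f", file_name,    # single file to fetch
        "-p", dest,         # destination directory
    ]


def download(competition, file_name, dest="."):
    """Run the download; raises CalledProcessError if the CLI reports failure."""
    subprocess.run(build_download_command(competition, file_name, dest), check=True)
```

Calling download(COMPETITION, "train.csv.zip") from a notebook cell should then fetch the authenticated file instead of the login page.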

Thanks, Charles.

How to read a zip directly from a website into a Jupyter notebook
