How to download the youtube-8m dataset using curl

Time: 2017-11-02 14:58:50

Tags: windows curl download windows-10 command-prompt

The YouTube-8M download page gives the following curl instructions:

mkdir -p ~/data/yt8m_video_level; cd ~/data/yt8m_video_level 

curl data.yt8m.org/download.py | partition=1/video_level/train mirror=us python 
curl data.yt8m.org/download.py | partition=1/video_level/validate mirror=us python 
curl data.yt8m.org/download.py | partition=1/video_level/test mirror=us python

I have created the directory and am now trying to download the training data.

When I run:

curl data.yt8m.org/download.py | partition=1/video_level/train mirror=us python

I get the following error message:

  

'partition' is not recognized as an internal or external command, operable program or batch file.

If I escape the | with a caret, like this:

curl data.yt8m.org/download.py ^| partition=1/video_level/train mirror=us python

then the command prompt prints the entire contents of http://data.yt8m.org/download.py, followed by:

  

curl: (6) Could not resolve host: |
curl: (6) Could not resolve host: partition=1
curl: (6) Could not resolve host: mirror=eu
curl: (6) Could not resolve host: python

How can I use curl to download this dataset on Windows 10?

1 Answer:

Answer 0: (score: 1)

That script is intended to run in a *nix (Unix, Linux, ...) environment.

Do you have Bash for Windows installed? If so, that is the quick solution: just run the commands in that environment (and make sure that running "which python" there returns the correct /path/to/preferred/version_of/python).

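To explain what that code does: *nix shells let you set environment variables for a single command by putting name=value assignments in front of it, here the python at the end of the line. An alternate way to "say" roughly the same thing in *nix is the following (the difference being that export keeps the variables set for the rest of the shell session)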

export partition=1/video_level/test
export mirror=us 
curl data.yt8m.org/download.py | python

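So you want the | to act as a pipe, and you don't want to escape it. With ^| the pipe character and everything after it are handed to curl as extra arguments, which is why curl tried to resolve them as hosts.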
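For what it's worth, the same idea (variables visible only to the one child process, with the script arriving on its stdin) can be sketched in Python, which behaves the same way from the Windows command prompt. This is only an illustration: run_with_env is a made-up helper name, and the demo string is a stand-in for download.py, not the real script.

```python
import os
import subprocess
import sys


def run_with_env(script_source: bytes, **overrides: str) -> subprocess.CompletedProcess:
    """Run Python source on a child interpreter's stdin with extra env vars.

    The overrides apply only to the child process, mirroring the shell form
    `name=value ... python`; the parent's environment is left untouched.
    """
    child_env = dict(os.environ, **overrides)  # copy, then add the overrides
    return subprocess.run(
        [sys.executable, "-"],   # "-" tells Python to read the program from stdin
        input=script_source,     # plays the role of the `|` pipe
        env=child_env,
        stdout=subprocess.PIPE,
        check=True,              # raise if the child exits with an error
    )


# Stand-in for download.py (hypothetical): it only echoes the two variables.
demo = b"import os; print(os.environ['partition'], os.environ['mirror'])"
result = run_with_env(demo, partition="1/video_level/train", mirror="us")
print(result.stdout.decode().strip())  # prints: 1/video_level/train us
```

To try it with the real script you would pass the downloaded bytes instead of demo. Whether download.py runs under your installed Python version is not something this sketch checks.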

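Cmd does not support the name=value prefix form (it tried to run partition as a command, hence your first error). The equivalent in an old DOS .bat file would be the following. Note that there are no spaces around the =, because set would make them part of the variable name and value.

set partition=1/video_level/test
set mirror=us
curl data.yt8m.org/download.py | python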

But older versions of DOS used to have a limit on how much could be "stored" in a | (pipe). I don't know what the current limits in the Windows command prompt are, so you may need to create your own temp file and then feed it in, i.e.

set partition=1/video_level/test
set mirror=us
curl data.yt8m.org/download.py > %TEMP%\mytempFile
python < %TEMP%\mytempFile
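The temp-file route can also be done from Python itself, which sidesteps both cmd's set syntax and any pipe limits. A rough sketch, assuming the URL from the question is still reachable and that download.py reads partition and mirror from the environment, as the commands above imply. fetch and run_from_temp_file are hypothetical helper names.

```python
import os
import subprocess
import sys
import tempfile
import urllib.request


def fetch(url: str) -> bytes:
    """Download the script; stands in for `curl URL > file`."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def run_from_temp_file(script_source: bytes, **overrides: str) -> int:
    """Save the source to a temp file, run it, and return the exit code."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(script_source)
        # The overrides are visible only to the child process.
        child_env = dict(os.environ, **overrides)
        # The child inherits the current working directory.
        return subprocess.run([sys.executable, path], env=child_env).returncode
    finally:
        os.remove(path)  # clean up the temp file even if the run fails
```

A call would then look like run_from_temp_file(fetch("http://data.yt8m.org/download.py"), partition="1/video_level/train", mirror="us"), run from the directory where the data should end up.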

I'm not a Python programmer, so I may be missing something completely obvious to Pythonistas.


I just looked at the source code of download.py. Did you notice this?

print ('Starting fresh download in this directory. Please make sure you '
    'have >2TB of free disk space!')
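If you want to check that before starting, something like the following would do. It is a rough sketch: has_free_space is a hypothetical helper, and reading the 2TB in the message above as decimal terabytes is my assumption.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


# Assumption: decimal terabytes; the script's message does not say which unit.
TWO_TB = 2 * 10**12

if not has_free_space(".", TWO_TB):
    print("Less than 2 TB free here; the full download may not fit.")
```

Run it from the directory you created for the data, since that is the filesystem the check looks at.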

IHTH