Script downloads data files, but I can't stop the script

Asked: 2015-03-15 02:15:49

Tags: python loops curl terminate

Description

My script below runs fine. It basically just finds all the data files I'm interested in on a given website, checks whether they are already on my computer (and skips them if they are), and finally downloads them to my computer using cURL.

Problem

The problem I run into is that sometimes there are 400+ very large files and I can't download them all in one session. I press Ctrl-C, but it seems to cancel the current cURL download rather than the script, so I end up having to cancel every download one by one. Is there a way around this? Maybe some kind of key command that lets me stop the script at the end of the current download?

#!/usr/bin/python
import os
import urllib2
import re
import timeit

filenames = []
savedir = "/Users/someguy/Documents/Research/VLF_Hissler/Data/"

#connect to a URL
website = urllib2.urlopen("http://somewebsite")

#read html code
html = website.read()

#use re.findall to get all the data files
filenames = re.findall(r'SP.*?\.mat', html)

#The following chunk of code checks whether each file is already downloaded
#and, if so, removes it from the download queue.
count = 0
countpass = 0
for files in os.listdir(savedir):
   if files.endswith(".mat"):
      try:
         filenames.remove(files)
         count += 1
      except ValueError:
         countpass += 1

print "counted number of removes", count
print "counted number of failed removes", countpass
print "number files less removed:", len(filenames)

#saves the file names into an array of html link
links=len(filenames)*[0]

for j in range(len(filenames)):
   links[j] = 'http://somewebsite.edu/public_web_junk/southpole/2014/'+filenames[j]

for i in range(len(links)):
   os.system("curl -o "+ filenames[i] + " " + str(links[i]))

print "links downloaded:",len(links)
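One way to let a single Ctrl-C end the whole run, rather than just the current cURL process, is to launch curl with `subprocess.call` instead of `os.system` and stop the loop when the call is interrupted or fails. A minimal sketch (`run_downloads` is an illustrative helper, not part of the original script):

```python
import subprocess
import sys

def run_downloads(commands):
    """Run each download command in order; stop the whole loop on
    Ctrl-C or on a failing command instead of continuing."""
    completed = 0
    for cmd in commands:
        try:
            # Unlike os.system, which ignores SIGINT in the parent while
            # the child runs, subprocess.call lets Ctrl-C surface here
            # as KeyboardInterrupt.
            rc = subprocess.call(cmd)
        except KeyboardInterrupt:
            break
        if rc != 0:
            # curl exits non-zero when it errors out or is killed.
            break
        completed += 1
    return completed

# e.g. run_downloads([["curl", "-o", f, l] for f, l in zip(filenames, links)])
```

Passing the command as a list also sidesteps shell-quoting problems with the string concatenation used in the original loop.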

1 Answer:

Answer 0 (score: 0)

You can always check the file size with curl before downloading:

import subprocess, sys

def get_file_size(url):
    """
    Gets the file size of a URL using curl.

    @param url: The URL to obtain information about.

    @return: The file size, as an integer, in bytes.
    """

    # Get the file size in bytes from a HEAD request
    file_size = 0
    p = subprocess.Popen(('curl', '-sI', url), stdout=subprocess.PIPE)
    for s in p.stdout.readlines():
        if 'Content-Length' in s:
            file_size = int(s.strip().split()[-1])
    return file_size

# Your configuration parameters
url      = ... # URL that you want to download
max_size = ... # Max file size in bytes

# Now you can do a simple check to see if the file size is too big
if get_file_size(url) > max_size:
    sys.exit()

# Or you could do something more advanced
bytes = get_file_size(url)
if bytes > max_size:
    s = raw_input('File is {0} bytes. Do you wish to download? '
        '(yes, no) '.format(bytes))
    if s.lower() == 'yes':
        pass  # Add download code here....
    else:
        sys.exit()