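My script below works fine. It basically just finds all the data files I'm interested in on a given website, checks whether they are already on my computer (skipping them if they are), and finally downloads them to my computer with cURL.
The problem is that sometimes there are more than 400 very large files and I can't download them all in one go. I press Ctrl-C, but that seems to cancel the cURL download rather than the script, so I end up having to cancel every download one by one. Is there a way around this? Maybe some kind of key command that lets me stop once the current download finishes?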
#!/usr/bin/python
import os
import urllib2
import re
import timeit
filenames = []
savedir = "/Users/someguy/Documents/Research/VLF_Hissler/Data/"
#connect to a URL
website = urllib2.urlopen("http://somewebsite")
#read html code
html = website.read()
#use re.findall to get all the data files
filenames = re.findall('SP.*?\.mat', html)
#The following chunk of code checks to see if the files are already downloaded and deletes them from the download queue if they are.
count = 0
countpass = 0
for files in os.listdir(savedir):
    if files.endswith(".mat"):
        try:
            filenames.remove(files)
            count += 1
        except ValueError:
            countpass += 1
print "counted number of removes", count
print "counted number of failed removes", countpass
print "number files less removed:", len(filenames)
#saves the file names into an array of html link
links=len(filenames)*[0]
for j in range(len(filenames)):
links[j] = 'http://somewebsite.edu/public_web_junk/southpole/2014/'+filenames[j]
for i in range(len(links)):
    os.system("curl -o "+ filenames[i] + " " + str(links[i]))
print "links downloaded:",len(links)
Answer 0 (score: 0)
You can always use curl to check the file size before downloading:
import subprocess, sys
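def get_file_size(url):
    """
    Gets the file size of a URL using curl.
    @param url: The URL to obtain information about.
    @return: The file size, as an integer, in bytes, or None if the
             response has no Content-Length header.
    """
    # Stays None when the server does not report a size
    file_size = None
    # Ask curl for the response headers only (-I), silently (-s)
    p = subprocess.Popen(('curl', '-sI', url), stdout=subprocess.PIPE)
    for s in p.stdout.readlines():
        # Header names are case-insensitive, so compare in lower case
        if 'content-length' in s.lower():
            file_size = int(s.strip().split()[-1])
    return file_size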
# Your configuration parameters
url = ... # URL that you want to download
max_size = ... # Max file size in bytes
# Now you can do a simple check to see if the file size is too big
if get_file_size(url) > max_size:
    sys.exit()
# Or you could do something more advanced
size = get_file_size(url)  # named size so the builtin bytes is not shadowed
if size > max_size:
    s = raw_input('File is {0} bytes. Do you wish to download? '
                  '(yes, no) '.format(size))
    if s.lower() == 'yes':
        # Add download code here....
        pass
    else:
        sys.exit()
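As for the Ctrl-C problem itself: os.system() goes through the C system() call, which ignores SIGINT in the calling process while the command runs, so only curl sees the interrupt and the loop moves on to the next file. A minimal sketch of one way around that, assuming you are willing to replace os.system with subprocess.call. The download_all function, its (url, path) pairs, and the run parameter are hypothetical names for illustration, not part of the original script:

```python
import subprocess


def download_all(jobs, run=subprocess.call):
    """Download each (url, path) pair with curl, stopping on Ctrl-C.

    `run` is injectable only so the loop can be exercised without curl;
    by default it is subprocess.call.
    Returns the number of downloads that exited with status 0.
    """
    done = 0
    for url, path in jobs:
        try:
            # An argument list avoids the shell, so odd characters in
            # file names are passed to curl unchanged.
            status = run(["curl", "-o", path, url])
        except KeyboardInterrupt:
            # subprocess.call does not ignore SIGINT, so Ctrl-C surfaces
            # here as KeyboardInterrupt and we abandon the whole queue.
            break
        if status == 0:
            done += 1
    return done
```

In the script above this would be called as something like download_all(zip(links, filenames)). Note that the file being fetched when you press Ctrl-C is left partially written, and the "already downloaded" check in the question would skip it on the next run, so you may want to delete it by hand.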