I have a bunch of photos in S3, along with corresponding object-detection records, and I want to download them and sort them into folders by category. My script burns through the first few hundred in seconds and then slows to a crawl, with no sign of a memory leak, bandwidth throttling (that I can detect), or any other obvious cause. It downloads about 600 photos in 6-8 seconds, and about 1,500 within a few minutes, but after 24 hours it has only reached about 1,700 and is still running. I can't think of why this would happen, and I've done similar things before without this problem. I'd like to post the download portion of my code and see if anyone can spot anything obvious. Thanks!
from utils import record_list
from ast import literal_eval
import os
import urllib2
import multiprocessing

photo_path = '/home/ben/Desktop/neuralnet_photos/photos/'
service = 'source_name_here/'

# Download each photo into its corresponding category folder(s)
def downloader(record):
    tags = literal_eval(record[-2])
    for tag in tags:
        try:
            # Check whether the category directory exists; if not, create it.
            if not os.path.isdir(photo_path + tag):
                os.mkdir(photo_path + tag)
            filename = record[1] + '.jpg'
            url = 'https://s3.amazonaws.com/myphotosfolder/' + service + filename
            photo = urllib2.urlopen(url)
            with open(photo_path + tag + "/" + filename, 'wb') as f:
                f.write(photo.read())
        except Exception as e:
            print e

if __name__ == '__main__':
    p = multiprocessing.Pool(8)
    p.map(downloader, record_list)
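
One thing that seems worth ruling out: urllib2.urlopen has no timeout by default, so a handful of stalled connections could silently pin all eight workers, and pool.map pre-assigns chunks of the iterable, so the records behind a stuck worker never get redistributed. Below is a minimal diagnostic sketch of the same downloader with an explicit socket timeout, a narrower except, and imap_unordered so progress is visible per record. The timeout value (30 s), chunksize=1, and the progress-print interval are assumptions for illustration, not tested settings.

    from utils import record_list
    from ast import literal_eval
    import os
    import socket
    import urllib2
    import multiprocessing

    photo_path = '/home/ben/Desktop/neuralnet_photos/photos/'
    service = 'source_name_here/'

    def downloader(record):
        tags = literal_eval(record[-2])
        for tag in tags:
            try:
                if not os.path.isdir(photo_path + tag):
                    try:
                        os.mkdir(photo_path + tag)
                    except OSError:
                        # Another worker may have created it between the
                        # isdir check and the mkdir; that's fine.
                        pass
                filename = record[1] + '.jpg'
                url = 'https://s3.amazonaws.com/myphotosfolder/' + service + filename
                # Assumed value: give up on a stalled socket after 30 seconds
                # instead of blocking this worker forever.
                photo = urllib2.urlopen(url, timeout=30)
                with open(photo_path + tag + "/" + filename, 'wb') as f:
                    f.write(photo.read())
            except (urllib2.URLError, socket.timeout) as e:
                print record[1], e

    if __name__ == '__main__':
        p = multiprocessing.Pool(8)
        # imap_unordered with chunksize=1 hands records out one at a time,
        # so a slow record can't hold back a pre-assigned chunk, and the
        # counter below shows whether progress is steady or stalling.
        done = 0
        for _ in p.imap_unordered(downloader, record_list, chunksize=1):
            done += 1
            if done % 100 == 0:
                print done, 'records processed'
        p.close()
        p.join()

If the counter keeps ticking but slowly, the stalls are most likely network-side (hung connections timing out one by one); if it stops entirely, the hang is somewhere else. Separately, opening a fresh HTTPS connection per photo is its own fixed cost, so a client with connection reuse (e.g. boto or a requests session) might also be worth trying, though that wouldn't by itself explain a fast start followed by a crawl.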