Starting around 300 threads takes roughly 15 minutes (measured by the 'initiating with proxy' prints); however, whenever I remove all of the code inside the while loop in the thread's run function, it starts every thread within 10 seconds (probably less). Any idea why the contents of the while loop make all the threads start up so slowly?
#!/usr/bin/env python
import requests
import sys
import string
import os.path
import urllib.request
import threading
import mimetypes
from time import gmtime, strftime, sleep
from random import choice
#list of our proxies
proxies = []
working = []
downloads = 1
initiated = 0
#the number of files we want to download
target = int(sys.argv[1])
#argument 2 - proxies
try:
    sys.argv[2]
except:
    pass
else:
    param = sys.argv[2]
    if param.find('.txt') != -1:
        print('Loading specified proxy list ('+ param +').')
        f = open(param, 'r+')
        print('Opening '+ f.name)
        proxylist = f.read()
        f.close()
        #split retrieved list by new line
        proxies = proxylist.split('\n')
    else:
        print('Single proxy specified.')
        proxies.append(param)
class thread(threading.Thread):
    def __init__(self, ID, name, proxy):
        threading.Thread.__init__(self)
        self.id = ID
        self.name = name
        self.downloads = 0
        self.proxy = proxy
        self.running = True
        self.fails = 0
    def run(self):
        global downloads
        global working
        global initiated
        initiated += 1
        if self.proxy != False:
            #id is always above one, so make the ID -1
            self.proxy = proxies[(self.id-1)]
            print(self.name +' initiating with proxy: '+self.proxy)
        else:
            print(self.name +' initiating without a proxy.')
        #start actual downloads
        while downloads <= target and self.running:
            #wait for all threads to be loaded before starting requests
            if (initiated-1) == len(proxies):
                rstr = ''.join(choice(string.ascii_letters + string.digits) for x in range(5))
                url = 'http://puu.sh/'+rstr
                filename = 'downloaded/'+ strftime('%Y %m %d %H-%M-%S ['+ rstr +']', gmtime()) +'.png'
                try:
                    if self.proxy != False:
                        #make our requests go through proxy
                        r = requests.get(url, None, {'http' : self.proxy})
                    else:
                        r = requests.get(url)
                except IOError:
                    if self.fails >= 10:
                        #print(self.name +': Proxy is not working. Stopping thread.')
                        self.running = False
                    self.fails += 1
                    pass
                except:
                    pass
                else:
                    if r.status_code == 200 and r.headers['Content-Type'] != 'text/html':
                        with open(filename, 'wb') as f:
                            for chunk in r.iter_content(1024):
                                f.write(chunk)
                        print(self.name +': '+ filename+' downloaded...' + str(downloads))
                        downloads += 1
                        self.downloads += 1
                        if not self.proxy in working and self.proxy != False:
                            working.append(self.proxy)
                sleep(5)
#lets create the "downloaded" folder if it does not exist
if not os.path.isdir('downloaded'):
    try:
        os.mkdir('downloaded')
    except:
        pass
#thread count
thread_count = 1
#create threads, and initiate them
try:
    thread(0, 'Thread-main', False).start()
    for x in proxies:
        thread(thread_count, 'Thread-'+str(thread_count), proxies[(thread_count-1)]).start()
        thread_count += 1
except:
    print('Couldn\'t start threads.')
Answer 0 (score: 2)
First of all, I completely agree with Matteo Italia's comment. I also think there is a problem with how you use the variable initiated.
Your code looks like this:
while downloads <= target and self.running:
    #wait for all threads to be loaded before starting requests
    if (initiated-1) == len(proxies):
        ...
    # nothing in else
So all of the threads are busy-waiting! That means they are just spinning while waiting for the other threads and burning a lot of CPU... You should block on an Event instead:
initiated = 0
all_ready = Event()
and in run():
if initiated-1 == len(proxies):
    all_ready.set()
else:
    all_ready.wait()

while downloads <= target and self.running:
    ...
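To make that concrete, here is a minimal, self-contained sketch of that Event-based barrier. The proxy list, thread count, worker body and the extra Lock around the shared counter are placeholders of mine, not the poster's actual download code:

import threading
from threading import Event, Lock

proxies = ['p1', 'p2', 'p3']        # placeholder proxy list
all_ready = Event()
init_lock = Lock()                  # protects the shared counter
initiated = 0

class Worker(threading.Thread):
    def run(self):
        global initiated
        with init_lock:
            initiated += 1
            if initiated - 1 == len(proxies):
                all_ready.set()     # the last thread to start wakes everyone up
        all_ready.wait()            # blocks without burning CPU until set() is called
        print(self.name, 'released, would start downloading now')

threads = [Worker() for _ in range(len(proxies) + 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Each worker now sleeps inside wait() instead of spinning, so the CPU stays free for the threads that still have to start.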
Answer 1 (score: 0)
This is happening because you are effectively running each thread sequentially: you create a thread, start it, it spins up, prints its message, goes off to do all the web work, and then sleeps; only during that sleep, if no other already-created thread gets scheduled first, can the next thread be started.
Of course, as already mentioned, Python is quite bad at threading. Try to avoid it where you can.
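If you really want to sidestep Python threads for this kind of downloader, one rough sketch (not the poster's code; the URL, proxy values and timeout below are made-up placeholders) is to hand each download to a worker process with multiprocessing:

from multiprocessing import Pool

import requests

def fetch(proxy):
    # hypothetical single-download worker
    url = 'http://puu.sh/abcde'
    try:
        r = requests.get(url, proxies={'http': proxy} if proxy else None, timeout=10)
        return proxy, r.status_code
    except requests.RequestException:
        return proxy, None

if __name__ == '__main__':
    proxy_list = ['10.0.0.1:8080', '10.0.0.2:3128', None]  # made-up examples
    with Pool(processes=len(proxy_list)) as pool:
        for proxy, status in pool.map(fetch, proxy_list):
            print(proxy, status)

Each process starts independently, so there is no startup serialization and no shared counters to protect.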