I have implemented a multi-threaded queue for a web-scraping project in Python. However, I've noticed that it occasionally finishes before every item in the queue has been processed.
I tried adding checks to catch these unprocessed items, but that only works around the symptom instead of fixing it. I want to solve the root of the problem.
def checkCards(goodData, proxyList):
    threads = 20
    work_queue = queue.Queue()  # a Queue instance; previously the queue *module* itself was passed around

    class ThreadUrl(threading.Thread):
        """Threaded Url Grab"""
        def __init__(self, queue):
            threading.Thread.__init__(self)
            self.queue = queue

        def run(self):
            while True:
                # grabs an item from the queue (blocks until one is available,
                # so a queue.Empty handler here was dead code)
                checkData = self.queue.get()
                try:
                    core(checkData, proxyList)  # main code to run via threads + queue
                except Exception as e:
                    print(e)
                finally:
                    # signals to the queue that the job is done,
                    # even if core() raised
                    self.queue.task_done()

    def multiMain(passData):
        # spawn a pool of threads, and pass them the queue instance
        for i in range(threads):
            t = ThreadUrl(work_queue)
            t.daemon = True  # setDaemon() is deprecated
            t.start()

        # populate the queue with data (passData[x] was a NameError: x is undefined)
        for item in passData:
            work_queue.put(item)

        # wait on the queue until everything has been processed
        work_queue.join()

    multiMain(goodData)
    print("All done! Please see outputted file :)")
My expected result is for the queue to process everything it is given, but I find that it stops early and misses 10-20 of the websites in the supplied list.
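One common cause of this symptom is that `Queue.join()` returns as soon as every `put()` has been matched by a `task_done()`; if a worker dies between `get()` and `task_done()`, or the daemon workers are killed while the main thread exits, items can be dropped silently. A minimal self-contained sketch of the pattern with `task_done()` in a `finally` block is shown below; `process()` is a hypothetical stand-in for `core(checkData, proxyList)`, and it simply records each item so the behaviour can be checked:

```python
import queue
import threading

def process(item, results):
    # hypothetical stand-in for core(); just record the item
    results.append(item)

def run_workers(items, num_threads=4):
    work_queue = queue.Queue()
    results = []

    def worker():
        while True:
            item = work_queue.get()  # blocks until an item is available
            try:
                process(item, results)
            except Exception as e:
                print(e)
            finally:
                # task_done() in finally guarantees join() is never
                # left waiting after an exception in process()
                work_queue.task_done()

    for _ in range(num_threads):
        t = threading.Thread(target=worker, daemon=True)
        t.start()

    for item in items:
        work_queue.put(item)

    # blocks until every put() has a matching task_done()
    work_queue.join()
    return results

if __name__ == "__main__":
    out = run_workers(list(range(100)))
    print(len(out))
```

Because `list.append` is atomic in CPython, `results` needs no extra lock here; if `process()` did more complex shared-state updates, a `threading.Lock` would be required.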