I'm writing a threaded Python script that takes a list of files, puts them into a queue, and then starts an unknown number of threads (default 3) to download them. As each thread finishes an item, it updates stdout with the queue status and a percentage. All of the files are downloading, but the status information from the third thread is wrong and I can't figure out why. I've been considering adding a work_completed queue just for the calculation, but I don't think I should have to / that it should matter. Can someone point me in the right direction?
import os
import queue
import sys
import threading

# local_file and options are assumed to be defined elsewhere in the full script
download_queue = queue.Queue()

class Downloader(threading.Thread):
    def __init__(self, work_queue):
        super().__init__()
        self.current_job = 0
        self.work_queue = work_queue
        self.queue_size = work_queue.qsize()

    def run(self):
        while self.work_queue.qsize() > 0:
            try:
                url = self.work_queue.get(True)
                system_call = "wget -nc -q {0} -O {1}".format(url, local_file)
                os.system(system_call)
                self.current_job = int(self.queue_size) - int(self.work_queue.qsize())
                self.percent = (self.current_job / self.queue_size) * 100
                sys.stdout.flush()
                status = ("\rDownloading " + url.split('/')[-1] +
                          " [status: " + str(self.current_job) + "/" + str(self.queue_size) +
                          ", " + str(round(self.percent, 2)) + "%]")
            finally:
                self.work_queue.task_done()

def main():
    if download_queue.qsize() > 0:
        if options.active_downloads:
            active_downloads = options.active_downloads
        else:
            active_downloads = 3
        for x in range(active_downloads):
            downloader = Downloader(download_queue)
            downloader.start()
        download_queue.join()
Answer 0 (score: 4)
You can't check the queue size in one statement and then .get() in the next; between the two calls, the whole world may have changed. The .get() call itself is the single atomic operation you need. If it raises Empty, or blocks, the queue is empty.
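For illustration, a minimal sketch of that pattern; the worker name and the download_one() helper are hypothetical stand-ins, not part of the original code:

def worker(work_queue):
    while True:
        try:
            # the non-blocking get is the only check we make;
            # queue.Empty tells us the queue has been drained
            url = work_queue.get(False)
        except queue.Empty:
            break
        try:
            download_one(url)   # hypothetical download helper
        finally:
            work_queue.task_done()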
Your threads can also overwrite each other's output. I would add one more thread, with its own input queue, whose only job is to print the items it receives to stdout. It can also count the number of completed items and produce the status information, as in the sketch below.
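A minimal sketch of such a printer thread, assuming the download workers put their finished status strings onto a separate status_queue; the names here are illustrative, not from the original script:

status_queue = queue.Queue()

def printer(status_queue, total):
    done = 0
    while done < total:
        message = status_queue.get()   # blocks until a worker reports a finished item
        done += 1
        sys.stdout.write("\r%s [%d/%d, %.2f%%]" % (message, done, total, 100.0 * done / total))
        sys.stdout.flush()
        status_queue.task_done()

Because only this one thread writes to stdout, the counters and the output can never be garbled by concurrent writes.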
I also tend not to subclass Thread, and instead just build a plain Thread instance with a target= argument and .start() it.
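A quick sketch of that style, reusing the worker function sketched above (again, the names are only illustrative):

threads = [threading.Thread(target=worker, args=(download_queue,))
           for _ in range(active_downloads)]
for t in threads:
    t.start()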
Based on your reply, try this:
download_queue = queue.Queue()

class Downloader(threading.Thread):
    def __init__(self, work_queue, original_size):
        super().__init__()
        self.current_job = 0
        self.work_queue = work_queue
        self.queue_size = original_size

    def run(self):
        while True:
            try:
                url = self.work_queue.get(False)
            except queue.Empty:
                # queue is drained; let the thread exit
                break
            try:
                system_call = "wget -nc -q {0} -O {1}".format(url, local_file)
                os.system(system_call)
                # the following code is questionable. By the time we get here,
                # many other items may have been taken off the queue.
                self.current_job = int(self.queue_size) - int(self.work_queue.qsize())
                self.percent = (self.current_job / self.queue_size) * 100
                sys.stdout.flush()
                status = ("\rDownloading " + url.split('/')[-1] +
                          " [status: " + str(self.current_job) +
                          "/" + str(self.queue_size) + ", " +
                          str(round(self.percent, 2)) + "%]")
            finally:
                self.work_queue.task_done()

def main():
    if download_queue.qsize() > 0:
        original_size = download_queue.qsize()
        if options.active_downloads:
            active_downloads = options.active_downloads
        else:
            active_downloads = 3
        for x in range(active_downloads):
            downloader = Downloader(download_queue, original_size)
            downloader.start()
        download_queue.join()
Answer 1 (score: 2)
If you're willing to use the multiprocessing module, it includes a very nice parallel imap_unordered, which reduces your problem to something quite elegant:
import multiprocessing
import os
import sys

class ParallelDownload:
    def __init__(self, urls, processcount=3):
        self.total_items = len(urls)
        self.pool = multiprocessing.Pool(processcount)
        # imap_unordered yields each result as soon as any worker finishes,
        # so the status lines come back in completion order
        for n, status in enumerate(self.pool.imap_unordered(self.download, urls), 1):
            stats = (n, self.total_items, 100.0 * n / self.total_items)
            sys.stdout.write(status + " [%d/%d = %0.2f %%]\n" % stats)
        self.pool.close()
        self.pool.join()

    @staticmethod
    def download(url):
        # a staticmethod keeps the worker function picklable (a bound method
        # would drag the Pool along with it); local_file is assumed to be
        # defined elsewhere, as in the question
        system_call = "wget -nc -q {0} -O {1}".format(url, local_file)
        os.system(system_call)
        status = "\rDownloaded " + url.split('/')[-1]
        return status
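A hedged usage sketch, assuming a list of URLs and that local_file is defined as in the question; the URLs are placeholders, and the __main__ guard matters because multiprocessing may re-import the module in its worker processes:

if __name__ == "__main__":
    urls = ["http://example.com/a.iso", "http://example.com/b.iso"]   # placeholder URLs
    ParallelDownload(urls, processcount=3)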