I have been working on I/O-bound tasks: I want to build a monitor that checks a web page for new changes, and whenever a change occurs on the page it should be handed off to another thread function for filtering. The point is to avoid blocking and waiting for each task to finish before continuing (it should be concurrent).
# System modules
import time
from queue import Queue
from threading import Thread

from loguru import logger

# Set up some global variables
num_fetch_threads = 5
queue_exploring = Queue()
queue_monitoring = Queue()

# A real app wouldn't use hard-coded data...
feed_urls = ['http://www.foxnews.com/',
             'http://www.cnn.com/',
             'http://europe.wsj.com/',
             ]


def filtering(i, q):
    """
    Mocked data for filtering
    """
    while True:
        url = q.get()
        logger.info(f'{i} Filtering: {url}')
        time.sleep(i + 2)
        logger.info(f"{i}: Finished filtering: {url}")
        q.task_done()


def explore_links(i, q):
    """This is the worker thread function.
    It processes items in the queue one after
    another. These daemon threads go into an
    infinite loop, and only exit when
    the main thread ends.
    """
    while True:
        url = q.get()
        logger.info(f'{i}: Downloading: {url}')
        # instead of really downloading the URL,
        # we just pretend and sleep
        if "cnn" in url:
            logger.info(f"{i}: Found new url {url}, lets sleep first!")
            time.sleep(5)
            queue_monitoring.put(url)
        q.put(url)
        time.sleep(i + 2)
        q.task_done()


for i in range(num_fetch_threads):
    worker = Thread(target=explore_links, args=(i, queue_exploring,))
    worker.setDaemon(True)
    worker.start()
    worker = Thread(target=filtering, args=(i, queue_monitoring,))
    worker.setDaemon(True)
    worker.start()


def main():
    logger.info('*** Main thread waiting')
    for url in feed_urls:
        queue_exploring.put(url)
    queue_exploring.join()
    queue_monitoring.join()
    logger.info('*** Done')


if __name__ == '__main__':
    main()
This is the code I have so far. As you can see, I start 2 different threads on each iteration of the range loop. And as you can see in explore_links, I use while True to keep checking the queue, and I put the URL back into the queue when I'm done with it (this is the only way I could think of to get an infinite loop?).

My questions are:

Doing this:
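For comparison, the usual worker-loop shape does not need to re-queue items to stay alive: q.get() already blocks until something arrives, so the while True loop idles on its own. A minimal sketch of that pattern (the None sentinel, and the worker/results names, are my own assumptions, not part of the code above):

```python
from queue import Queue
from threading import Thread

results = []


def worker(q):
    # The loop lives forever on its own: q.get() blocks until an item
    # arrives, so nothing has to be put back just to keep it spinning.
    while True:
        item = q.get()
        if item is None:      # sentinel: tells this worker to exit cleanly
            q.task_done()
            break
        results.append(item)  # stand-in for real processing
        q.task_done()


q = Queue()
t = Thread(target=worker, args=(q,))
t.start()

for item in ['a', 'b', 'c']:
    q.put(item)
q.put(None)   # one sentinel per worker thread
q.join()      # returns once every put() has a matching task_done()
t.join()
print(results)   # ['a', 'b', 'c']
```

With a sentinel per worker, the threads don't need to be daemons at all, and queue.join() can actually return because nothing is ever re-queued.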
for i in range(num_fetch_threads):
    worker = Thread(target=explore_links, args=(i, queue_exploring,))
    worker.setDaemon(True)
    worker.start()
    worker = Thread(target=filtering, args=(i, queue_monitoring,))
    worker.setDaemon(True)
    worker.start()
Is this the correct way to handle the threads?
And is this:

q.put(url)

the right way to make the worker cycle over the URLs again?
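One mechanical detail worth noting about that q.put(url): Queue.join() waits for the queue's unfinished-task counter to reach zero, every put() increments that counter, and every task_done() decrements it. So a get → put → task_done cycle leaves the counter unchanged, which means queue_exploring.join() can never return while the worker keeps re-queueing. A tiny single-threaded sketch of the counting (unfinished_tasks is an undocumented CPython attribute, used here only to make the counter visible):

```python
from queue import Queue

q = Queue()
q.put('url')
print(q.unfinished_tasks)   # 1: put() incremented the counter

url = q.get()               # get() does NOT touch the counter
q.put(url)                  # counter goes up to 2
q.task_done()               # back down to 1 -- it never reaches 0
print(q.unfinished_tasks)   # 1
```

This is why re-queueing works only with daemon threads that are killed when the main thread exits; with join()-based shutdown, the re-queue keeps the queue permanently "unfinished".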