I built myself a scraper. There are multiple targets on the same page, so I want to build a list of all the URLs and then scrape them. Scraping takes some time, and I want to scrape the URLs concurrently. Since I don't want to maintain x scripts for x URLs, I want to use multiprocessing and spawn one worker per URL in the list. After some duckduckgo-ing and reading, for example,
https://keyboardinterrupt.org/multithreading-in-python-2-7/ and
"When should we call multiprocessing.Pool.join?", I came up with the code below.
Run from the command line, the code executes the main loop but never enters the scrape() function (which prints some messages internally; none of them show up). No error message is given, and the script exits normally.
What am I missing?
I'm using Python 2.7 on Win64.
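Here is a stripped-down repro of the same silent behavior, with a dummy worker() standing in for my real scrape() and a placeholder URL. Nothing is printed and no error appears; only calling get() on the AsyncResult finally surfaces an exception:

from multiprocessing.pool import ThreadPool

def worker(url):
    print 'scraping', url  # never printed

pool = ThreadPool(processes=2)
# args=('...') is just the string, not a one-element tuple, so
# apply_async unpacks it character by character behind the scenes
res = pool.apply_async(worker, args=('http://example.com'))
pool.close()
pool.join()
res.get()  # re-raises the TypeError that the worker swallowed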
I have already read:
Threading pool similar to the multiprocessing Pool?
https://docs.python.org/2/library/threading.html
https://keyboardinterrupt.org/multithreading-in-python-2-7/
but it didn't help.
def main():
    try:
        from multiprocessing.pool import ThreadPool  # the other multiprocessing imports were unused

        thread_count = 10  # Limit of concurrently running worker threads
        thread_pool = ThreadPool(processes=thread_count)  # Pool that keeps track of the workers
        known_threads = {}

        url_list = def_list()  # Just assigns the URLs to the list ('list' shadowed the builtin before)
        for entry in range(len(url_list)):
            print 'starting to scrape'
            print url_list[entry]
            # args must be a one-element tuple: (x,) and not (x).
            # Without the trailing comma, the URL string is unpacked
            # character by character, and the resulting TypeError is
            # swallowed silently by apply_async.
            known_threads[entry] = thread_pool.apply_async(scrape, args=(url_list[entry],))
        thread_pool.close()  # After all workers are started we close the pool
        thread_pool.join()   # And wait until all workers are done
        for entry in known_threads:
            known_threads[entry].get()  # Re-raises any exception from a worker
    except Exception, err:
        print Exception, err, 'Failed in main loop'
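For what it's worth, ThreadPool.map would sidestep the args-tuple pitfall entirely, since it takes the iterable directly and re-raises worker exceptions in the caller. A minimal sketch with a stub scrape() and placeholder URLs:

from multiprocessing.pool import ThreadPool

def scrape(url):
    # Stub standing in for the real scraper
    print 'scraping', url

url_list = ['http://example.com/a', 'http://example.com/b']  # placeholders

pool = ThreadPool(processes=10)
pool.map(scrape, url_list)  # blocks until every URL has been handled
pool.close()
pool.join()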