Thread stops too early

Time: 2015-10-18 15:12:40

Tags: python multithreading python-2.7 parallel-processing

I am trying to create a script that performs two operations in parallel:

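  1. One thread: requests the server and appends each response to the RESPONSES variable (a list), using a Pool
  2. Second thread: processes the responses from the RESPONSES list

It seems to work, but it stops before all products have been processed. For example, 70 products are done while 30 still remain.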

    # -*- coding: utf-8 -*-
    
    from datetime import datetime
    from multiprocessing.pool import ThreadPool as Pool
    from threading import Thread
    import requests
    
    RESPONSES = []
    POOL_IS_ALIVE = True
    
    with open('products.txt') as f:
        LINES = f.readlines()[:100]
    
    def post_request(url):
        html = requests.get(url).content
        RESPONSES.append(html)
    
    
    def parse_product(html, url):
        # long code which returns instance of class product
    
    def start_requesting(): # Creates a pool with 100 workers
        pool = Pool(100)
    
        for n,line in enumerate(LINES):
            pool.apply_async(post_request, args=(line[:-1],))
    
        pool.close()
        pool.join()
    
    t1 = Thread(target=start_requesting)
    
    def process_responses():
        i=0
        db = db_manager.db_manager()
    
        while True:
            try:
                response = RESPONSES.pop()
            except IndexError:
                continue
    
            product = parse_product(response,'url')
            db.insert_product(product)
    
            if not t1.is_alive():
                print 'IS_ALIVE NOT'
                break
    
    
    t2 = Thread(target=process_responses)
    
    now = datetime.now()
    
    t1.start()
    t2.start()
    t2.join() # MAYBE HERE IS THE PROBLEM
    t1.join()
    
    
    print now-datetime.now()
    

Where could the problem be?

1 answer:

Answer 0 (score: 4)

First, there are some errors in your code:

    if not t.is_alive():
        print 'IS_ALIVE NOT'
        break

There is no variable "t" at all. Didn't you get an error like "NameError: name 't' is not defined"?

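Second, simply print the steps your program takes and compare them with the steps you expect. Alternatively, you can use the Python debugger, pdb.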
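As a minimal sketch of the print-tracing suggestion, the snippet below mimics the shape of the question's consumer loop in isolation. The names drain_like_original and producer_alive are hypothetical stand-ins (for process_responses and t1.is_alive), and the producer is assumed to have already finished. The trace makes it visible that the loop exits after a single item while others are still waiting in the list.

```python
def drain_like_original(responses, producer_alive):
    """Hypothetical stand-in for the question's loop: pop one item,
    handle it, then stop as soon as the producer is no longer alive."""
    processed = []
    while True:
        try:
            item = responses.pop()
        except IndexError:
            continue  # note: this also skips the liveness check below
        processed.append(item)
        # Trace each step so the actual flow can be compared with the expected one.
        print("processed=%d remaining=%d producer_alive=%s"
              % (len(processed), len(responses), producer_alive()))
        if not producer_alive():
            break
    return processed


pending = ["a", "b", "c"]
# Simulate a producer thread that has already finished.
done = drain_like_original(pending, lambda: False)
```

With these inputs the loop handles one item and leaves two in pending, which matches the symptom of some products never being processed.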

Third,

    RESPONSES = []

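RESPONSES is thread-safe (appending to and popping from a list are atomic operations in CPython), but as @mguijarr mentioned, it would be better to use a Queue.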
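A minimal sketch of the Queue-based approach might look like the following. It is not the asker's code: fetch is a stand-in for requests.get(url).content so the example runs offline, and SENTINEL, fetch_and_enqueue, producer, consumer, and run are hypothetical names. The module is called queue in Python 3 and Queue in Python 2.7. The consumer stops only when it sees the sentinel, which the producer enqueues after the pool has finished, so every response queued before it gets processed.

```python
import queue  # this module is named `Queue` in Python 2.7
import threading
from multiprocessing.pool import ThreadPool

SENTINEL = object()  # hypothetical marker meaning "no more responses"


def fetch(url):
    # Stand-in for requests.get(url).content (assumption, keeps the sketch offline).
    return "response for " + url


def fetch_and_enqueue(url, out_queue):
    out_queue.put(fetch(url))


def producer(urls, out_queue):
    pool = ThreadPool(4)
    for url in urls:
        pool.apply_async(fetch_and_enqueue, args=(url, out_queue))
    pool.close()
    pool.join()              # every response is in the queue at this point
    out_queue.put(SENTINEL)  # tell the consumer that nothing else is coming


def consumer(in_queue, results):
    while True:
        item = in_queue.get()  # blocks until an item is available, no busy-waiting
        if item is SENTINEL:
            break
        results.append(item.upper())  # stand-in for parse_product + db insert


def run(urls):
    q = queue.Queue()
    results = []
    t1 = threading.Thread(target=producer, args=(urls, q))
    t2 = threading.Thread(target=consumer, args=(q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

Calling run with 100 URLs should yield 100 processed results, because the consumer's exit condition no longer depends on whether the producer thread happens to be alive at a given moment.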