Question

I wrote a Python script that: 1. submits search queries 2. waits for the results 3. parses the returned results (XML)

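I used the threading and Queue modules to do this in parallel (5 workers). It works great for the querying portion, because I can submit multiple search jobs and deal with the results as they come in. However, it appears that all my threads are bound to the same core. This becomes obvious when the script gets to the part where it processes the XML (CPU intensive).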

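Has anyone else encountered this problem? Am I missing something conceptually?

Also, I was pondering the idea of having two separate work queues, one for making the queries and one for parsing the XML. As it is now, one worker does both, one after the other. I'm not sure what that would buy me, if anything. Any help is greatly appreciated.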

Here is the code (proprietary data removed):

import sys
import Queue  # Python 2 module name (renamed to "queue" in Python 3)
from threading import Thread

work_queue = Queue.Queue()  # shared by the producer and the consumers
thread_list = []
# "sources" (the list of search items) was removed with the proprietary data

def addWork(source_list):
    for item in source_list:
        #print "adding: '%s'"%(item)
        work_queue.put(item)

def doWork(thread_id):
    while 1:
        try:
            gw = work_queue.get(block=False)
        except Queue.Empty:
            #print "thread '%d' is terminating..."%(thread_id)
            sys.exit() # no more work in the queue for this thread, die quietly

        ##Here is where I make the call to the REST API
        ##Here is where I wait for the results
        ##Here is where I parse the XML results and dump the data into a "global" dict

#MAIN
producer_thread = Thread(target=addWork, args=(sources,))
producer_thread.start() # start the thread (ie call the target/function)
producer_thread.join() # wait for thread/target function to terminate(block)

#start the consumers
for i in range(5):
    consumer_thread = Thread(target=doWork, args=(i,))
    consumer_thread.start()
    thread_list.append(consumer_thread)

# wait for every consumer to finish
for thread in thread_list:
    thread.join()

Answer 1

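This is a byproduct of how CPython handles threads. There are endless discussions about it around the internet (search for GIL), but the solution is to use the multiprocessing module instead of threading. Multiprocessing is built with pretty much the same interface as threading (and the same synchronization structures, so you can still use queues). It just gives every worker its own entire process instead of a thread, which avoids the GIL and the forced serialization of parallel workloads.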
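As a rough illustration of that swap, here is a minimal sketch of the worker loop from the question rebuilt on multiprocessing. It is written for Python 3, and the names `count_items`, `do_work`, `run_workers` and the tiny XML strings are made up for the example; `count_items` only stands in for the real CPU-heavy parsing. It also uses a sentinel value instead of a non-blocking `get`, which is an assumption about what fits here, not something from the original script.

```python
import multiprocessing as mp
import xml.etree.ElementTree as ET

SENTINEL = None  # tells a worker that no more work is coming


def count_items(xml_text):
    """Stand-in for the CPU-heavy parsing step: count <item> elements."""
    return len(ET.fromstring(xml_text).findall("item"))


def do_work(work_queue, result_queue):
    """Consume (name, xml_text) jobs until the sentinel arrives."""
    while True:
        job = work_queue.get()  # blocks until a job or sentinel is available
        if job is SENTINEL:
            break
        name, xml_text = job
        result_queue.put((name, count_items(xml_text)))


def run_workers(jobs, num_workers=5):
    """Parse every (name, xml_text) pair in a list using separate processes."""
    work_queue = mp.Queue()
    result_queue = mp.Queue()
    workers = [mp.Process(target=do_work, args=(work_queue, result_queue))
               for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    for job in jobs:
        work_queue.put(job)
    for _ in workers:
        work_queue.put(SENTINEL)  # one sentinel per worker
    # Collect results before joining so no worker blocks on a full pipe.
    results = dict(result_queue.get() for _ in jobs)
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    demo_jobs = [("a", "<r><item/><item/></r>"), ("b", "<r><item/></r>")]
    print(run_workers(demo_jobs, num_workers=2))
```

Results come back through a second queue because a "global" dict filled in by a child process is not visible to the parent.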

Answer 2

With CPython, your threads will never run Python code in parallel on two different cores. Look up information on the Global Interpreter Lock (GIL).

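Basically, there is a mutex protecting the part of the interpreter that actually executes code, so no two threads can compute in parallel. Threads still work fine for I/O tasks, because a thread that blocks on I/O releases the lock while it waits.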
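The difference is easy to see with a small timing experiment. The sketch below assumes a standard (GIL) build of CPython 3; `fake_io`, `busy_loop` and `timed_threads` are hypothetical helper names, and `time.sleep` only stands in for a blocking network call. The exact numbers depend on the machine, so none are claimed here.

```python
import threading
import time


def fake_io(seconds):
    """Stand-in for a blocking network call; sleeping releases the GIL."""
    time.sleep(seconds)


def busy_loop(n):
    """Pure-Python arithmetic; the thread holds the GIL while it computes."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_threads(target, arg, count):
    """Run `count` threads of target(arg) and return the elapsed wall time."""
    threads = [threading.Thread(target=target, args=(arg,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Four waits overlap, so this takes about as long as one wait.
    print("4 x blocking wait:", round(timed_threads(fake_io, 0.3, 4), 2), "s")
    # Four computations take turns holding the GIL, so expect roughly
    # four times the single-thread figure rather than the same figure.
    print("1 x busy loop    :", round(timed_threads(busy_loop, 2000000, 1), 2), "s")
    print("4 x busy loop    :", round(timed_threads(busy_loop, 2000000, 4), 2), "s")
```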

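Edit: If you want to take full advantage of multiple cores, you need to use multiple processes. There are a lot of articles about this topic. I tried to find one for you that I remember being great, but I couldn't find it =/.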
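One possible way to apply this to the script in the question is to keep threads for the waiting part and hand only the parsing to processes. The following is a sketch for Python 3 using `concurrent.futures`. `fetch`, `parse` and `search_all` are hypothetical names, and `fetch` fakes the REST call with a short sleep and a made-up XML string.

```python
import time
import xml.etree.ElementTree as ET
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch(query):
    """Hypothetical stand-in for the REST call: wait, then return XML text."""
    time.sleep(0.05)  # pretend to wait on the network
    return "<results query='%s'><hit/><hit/></results>" % query


def parse(xml_text):
    """CPU-bound step: parse the XML and return (query, number of hits)."""
    root = ET.fromstring(xml_text)
    return root.get("query"), len(root.findall("hit"))


def search_all(queries, io_threads=5, cpu_procs=2):
    """Fetch with threads (I/O bound), parse with processes (CPU bound)."""
    with ThreadPoolExecutor(max_workers=io_threads) as threads:
        with ProcessPoolExecutor(max_workers=cpu_procs) as procs:
            xml_docs = threads.map(fetch, queries)    # the waits overlap
            return dict(procs.map(parse, xml_docs))   # parsing can use several cores


if __name__ == "__main__":
    print(search_all(["alpha", "beta", "gamma"]))
```

This is close to the two-queue idea from the question: the split pays off because only the parsing stage needs separate processes.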

As Nathon suggested, you can use the multiprocessing module. There are tools to help you share objects between processes (see POSH, Python Object Sharing).
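The standard library has its own option for sharing as well: a `multiprocessing.Manager` can hand out a dict proxy that several processes write to, which is one way to keep something like the "global" dict from the question. A minimal sketch for Python 3, with `parse_into` and `parse_all` as hypothetical names and storing the root tag as a stand-in for the real parsing:

```python
import multiprocessing as mp
import xml.etree.ElementTree as ET


def parse_into(shared, name, xml_text):
    """Parse one document and store its root tag in the shared dict."""
    shared[name] = ET.fromstring(xml_text).tag


def parse_all(docs):
    """Parse each document in its own process; return a plain dict."""
    with mp.Manager() as manager:
        shared = manager.dict()  # proxy object usable from every process
        procs = [mp.Process(target=parse_into, args=(shared, name, text))
                 for name, text in docs.items()]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return dict(shared)  # copy the data out before the manager shuts down


if __name__ == "__main__":
    print(parse_all({"a": "<x/>", "b": "<y><z/></y>"}))
```

Every access to the proxy is a round trip to the manager process, so for large results it is usually cheaper to return values from the workers and build the dict in the parent.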

Python Queue - Threads bound to only one core
