Python - Converting threads to a thread pool?

Time: 2015-01-14 18:17:49

Tags: python multithreading pool

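I have an existing script that works fine using threads, but my list of items keeps growing, and I need to limit the number of threads actually in use because I'm killing my server. So I'd like to add a Pool(100) to this script, but everything I've tried so far has failed with an error. Can anyone help me add a simple pool? I've been looking around, and a lot of the pool examples are complicated; I'd rather keep this as simple as possible. Note that I removed the actual body of "def work(item)" because the script is fairly large.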

import time, os, re, threading, subprocess, sys

mylist = open('list.txt', 'r')

class working (threading.Thread):
        def __init__(self, item):
                threading.Thread.__init__(self)
                self.item = item
        def run(self):
                work(self.item)

def work(item):
        <actual work that needs to be threaded>

threads = []
for l in mylist:
        work1 = l.strip()
        thread = working(work1)
        threads.append(thread)
        thread.start()
for t in threads: t.join()
mylist.close()

Error when adding the pool:

Process PoolWorker-10:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 71, in worker
    put((job, i, result))
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 366, in put
    return send(obj)
UnpickleableError: Cannot pickle <type 'thread.lock'> objects

New code, just cleaned up:

import time, os, re, threading, subprocess, sys
from multiprocessing.dummy import Pool as ThreadPool 

mylist = open('list.txt', 'r')

class working (threading.Thread):
        def __init__(self, item):
                threading.Thread.__init__(self)
                self.item = item
        def run(self):
                work(self.item)

def work(item):
        <actual work that needs to be threaded>

threads = []
for l in mylist:
        work1 = l.strip()
        pool = ThreadPool(10)
        pool.map(working, work1)
        pool.close()

1 Answer:

Answer 0 (score: 1)

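multiprocessing is a high-level, process-based parallelism package. To use processes, you need to be able to send data between them, and the error message is telling you that some of your data cannot be sent (pickleable = transferable). However, if you read the module documentation at: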

https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.dummy

you will find something called multiprocessing.dummy. Import it and you get the same interface, but backed by threads instead of processes. That is what you want.
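To make the pickling point concrete, here is a small sketch written for Python 3 (the exact exception type differs between Python versions). `Job`, `describe`, and `can_pickle` are hypothetical names for illustration, not part of the question's script. A thread lock cannot be pickled, so an object carrying one cannot be sent to a worker process, whereas a thread pool from multiprocessing.dummy shares memory with the caller and never needs to pickle the items.

```python
import pickle
import threading
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:  # TypeError on Python 3; other types on older versions
        return False


class Job(object):
    """Hypothetical work item that carries a lock, so it cannot be pickled."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


def describe(job):
    # Runs in a worker thread; the Job object is shared, not copied.
    with job.lock:
        return job.name.upper()


jobs = [Job("a"), Job("b"), Job("c")]
lock_is_picklable = can_pickle(threading.Lock())
job_is_picklable = can_pickle(jobs[0])

pool = ThreadPool(2)              # two worker threads
names = pool.map(describe, jobs)  # works even though Job is unpicklable
pool.close()
pool.join()
```

The same `map` call on a process-based `multiprocessing.Pool` would have to pickle each `Job` to hand it to a worker process, which is where an error like the one in the question typically comes from.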

Edit

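Take some time to read the documentation of the multiprocessing module. What you are doing is submitting the creation of individual thread objects to the pool. What you want is to submit the work to be done and the items to perform it on. The (conceptually) correct solution looks like this: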

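from multiprocessing.dummy import Pool as ThreadPool

def work(item):
    item = item.strip()
    <actual work that needs to be threaded>

pool = ThreadPool(10)             # at most 10 worker threads run at once
results = pool.map(work, mylist)  # mylist is the open file from the question; blocks until every line is done
pool.close()  # probably not strictly necessary after map(), but tells the pool no more work is coming
pool.join()   # wait for the worker threads to exit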

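You don't submit threads to the pool; you hand work to the threads the pool already contains. It is a higher level of abstraction. Hope this clears things up.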