Am I going about multicore programming the right way here?

Asked: 2013-04-26 13:37:15

Tags: python multithreading process multicore

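I want a daemon that finds images I need to convert into web and thumb versions. I think Python could be useful here, but I'm not sure I'm doing things right. I want to convert 8 photos at a time, and the queue of images to convert can be very long. We have several cores on the server, and spawning each conversion in a new process should let the OS use the available cores so things go faster, right? That is the key point: calling ImageMagick's convert from Python in separate processes, and hoping this is faster than running the conversions one at a time from the Python main thread.

So far I have only started testing, so here is my test code. It creates 20 tasks (each sleeping for 1 to 4 seconds) and feeds them to a pool with a total of 5 threads.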

from multiprocessing import Process
from subprocess import call
from random import randrange
from threading import Thread
from Queue import Queue

class Worker(Thread):
    def __init__(self, tid, queue):
        Thread.__init__(self)
        self.tid = tid
        self.queue = queue
        self.daemon = True
        self.start()

    def run(self):
        while True:
            sec = self.queue.get()
            print "Thread %d sleeping for %d seconds\n\n" % (self.tid, sec)
            p = Process(target=work, args=(sec,))
            p.start()
            p.join()
            self.queue.task_done()

class WorkerPool:
    def __init__(self, num_workers):
        self.queue = Queue()
        for tid in range(num_workers):
            Worker(tid, self.queue)

    def add_task(self, sec):
        self.queue.put(sec)

    def complete_work(self):
        self.queue.join()

def work(sec):
    call(["sleep", str(sec)])

def main():
    seconds = [randrange(1, 5) for i in range(20)]
    pool = WorkerPool(5)
    for sec in seconds:
        pool.add_task(sec)
    pool.complete_work()

if __name__ == '__main__':
    main()

So I run this script on the server:

johanhar@mamadev:~$ python pythonprocesstest.py

Then I check the processes on the server:

johanhar@mamadev:~$ ps -fux

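The ps output looks wrong to me. It looks as if everything is happening under Python but within a single process, so with more conversions (or sleeps, as in this test case) it would only get slower, even though we have several cores on the server...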

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
johanhar 24246  0.0  0.0  81688  1608 ?        S    13:44   0:00 sshd: johanhar@pts/28
johanhar 24247  0.0  0.0 108336  1832 pts/28   Ss   13:44   0:00  \_ -bash
johanhar 49753  0.6  0.0 530620  7512 pts/28   Sl+  15:14   0:00      \_ python pythonprocesstest.py
johanhar 49822  0.0  0.0 530620  6252 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49824  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 4
johanhar 49823  0.0  0.0 530620  6256 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49826  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 3
johanhar 49837  0.0  0.0 530620  6264 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49838  0.0  0.0 100904   564 pts/28   S+   15:14   0:00          |   \_ sleep 3
johanhar 49846  0.0  0.0 530620  6264 pts/28   S+   15:14   0:00          \_ python pythonprocesstest.py
johanhar 49847  0.0  0.0 100904   564 pts/28   S+   15:14   0:00              \_ sleep 3

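So, in case you still haven't got the question or what I'm asking: can this approach be called "multicore programming"?

3 answers:

Answer 0 (score: 2)

I think you are misreading the ps output. I count 4 distinct Python instances, each of which could in principle be assigned to its own core. Whether they actually get their own cores is one of the harder parts of multiprocessing.

Yes, there is a top-level Python process (PID 49753) that is the parent of the subprocesses, but there is also a bash that is its parent in the same way.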
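One way to see the same parent/child picture from inside Python, rather than from ps, is to start a few children at once and compare PIDs. This is only a minimal sketch in Python 3 (the question's code is Python 2); the helper name `spawn_pid_reporters` is made up for illustration, and the children are plain Python interpreters standing in for the real workers. It shows that the children are separate OS processes, not which core each one lands on, which is up to the scheduler.

```python
import os
import subprocess
import sys

def spawn_pid_reporters(n):
    """Start n child Python processes at once; each prints its own PID."""
    child_code = "import os; print(os.getpid())"
    procs = [
        subprocess.Popen([sys.executable, "-c", child_code],
                         stdout=subprocess.PIPE)
        for _ in range(n)
    ]
    # communicate() waits for the child to exit and returns (stdout, stderr).
    return [int(p.communicate()[0]) for p in procs]

if __name__ == "__main__":
    print("parent PID:", os.getpid())
    print("child PIDs:", spawn_pid_reporters(4))
    print("cores visible to Python:", os.cpu_count())
```

Each child PID differs from the parent's and from the others, which matches the tree that ps prints.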

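Answer 1 (score: 1)

Short & direct: yes, you are running multiple convert processes on multiple cores.

Longer & slightly indirect: I wouldn't call it "multicore programming", even if it actually is, because that wording usually means running multiple threads of your program on multiple cores, and you are not doing that (at least in CPython, Python threads are constrained by the GIL and cannot actually run Python code on multiple cores at the same time). Besides, you don't need to parallelize your Python code, because it is not your bottleneck (you spend the time in convert, not in the Python code).

If all you want is to parallelize convert, you don't even need threads or anything else fancy in your Python code.

The Python script can be serial and loop over the photos, spawning new convert processes until it reaches the number you prefer. Then it sits and waits for one of them to finish and spawns a new one; repeat as needed for all the photos.

(But I agree that threads make for more natural and elegant code than that kind of wait-for-event loop.)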
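The serial loop described in this answer could look roughly like the sketch below. It is written for Python 3, and the function name `run_throttled`, its parameters, and the polling interval are assumptions made for illustration, not part of the answer. The demo uses short-lived Python children as stand-ins for the real convert calls.

```python
import subprocess
import sys
import time

def run_throttled(commands, max_running=8, poll_interval=0.05):
    """Run each command as a child process, at most max_running at a time.

    Returns the exit codes in the same order as the commands.
    """
    pending = list(enumerate(commands))
    pending.reverse()                  # so pop() hands out commands in order
    running = {}                       # index -> Popen object
    exit_codes = [None] * len(commands)

    while pending or running:
        # Top up to the limit.
        while pending and len(running) < max_running:
            index, cmd = pending.pop()
            running[index] = subprocess.Popen(cmd)
        # Collect finished children; poll() returns None while still running.
        for index, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                exit_codes[index] = code
                del running[index]
        if running:
            time.sleep(poll_interval)  # avoid a busy loop while waiting
    return exit_codes

if __name__ == "__main__":
    # Stand-in for the real conversions: children that just sleep briefly.
    cmds = [[sys.executable, "-c", "import time; time.sleep(0.2)"]
            for _ in range(6)]
    print(run_throttled(cmds, max_running=3))
```

Polling with a short sleep keeps the sketch portable; on POSIX systems one could block on child exit instead, at the cost of more platform-specific code.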

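Answer 2 (score: 1)

You could simplify your code. You don't need multiple Python processes if the work is done in subprocesses. You could use multiprocessing.Pool to limit the number of concurrent subprocesses: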

#!/usr/bin/env python
import multiprocessing.dummy as mp # use threads
from random import randrange
from subprocess import check_call
from timeit import default_timer as timer

def info(msg, _print_lock=mp.Lock()): # a poor man's logging.info()
    with _print_lock: # avoid garbled output
        print("%s\t%s" % (mp.current_process().name, msg))

def work(sec):
    start = timer() # set before try so the except branch can always use it
    try: # wrap in try/except to avoid premature exit
        info("Sleeping for %d seconds" % (sec,))
        check_call(["sleep", str(sec)])
    except Exception as e: # error
        return sec, timer() - start, e
    else: # success
        return sec, timer() - start, None

def main():
    work_items = (randrange(1, 5) for i in range(20)) # you can use generator
    pool = mp.Pool(5) # pool of worker threads
    for result in pool.imap_unordered(work, work_items):
        info("expected %s, got %s, error %s" % result)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

Output

Thread-2    Sleeping for 3 seconds
Thread-4    Sleeping for 4 seconds
Thread-3    Sleeping for 3 seconds
Thread-5    Sleeping for 2 seconds
Thread-1    Sleeping for 1 seconds
Thread-1    Sleeping for 2 seconds
MainThread  expected 1, got 1.00222706795, error None
Thread-5    Sleeping for 2 seconds
MainThread  expected 2, got 2.00276088715, error None
Thread-2    Sleeping for 1 seconds
MainThread  expected 3, got 3.00330615044, error None
Thread-1    Sleeping for 3 seconds
MainThread  expected 2, got 2.00289702415, error None
Thread-4    Sleeping for 1 seconds
Thread-3    Sleeping for 2 seconds
MainThread  expected 4, got 4.00349998474, error None
MainThread  expected 3, got 4.00295114517, error None
Thread-2    Sleeping for 2 seconds
MainThread  expected 1, got 1.00295495987, error None
Thread-5    Sleeping for 2 seconds
MainThread  expected 2, got 2.0029540062, error None
Thread-4    Sleeping for 2 seconds
MainThread  expected 1, got 1.00314211845, error None
Thread-3    Sleeping for 4 seconds
MainThread  expected 2, got 2.00298595428, error None
Thread-2    Sleeping for 2 seconds
MainThread  expected 2, got 2.00294113159, error None
Thread-5    Sleeping for 1 seconds
MainThread  expected 2, got 2.00287604332, error None
Thread-1    Sleeping for 4 seconds
MainThread  expected 3, got 3.00323104858, error None
Thread-4    Sleeping for 3 seconds
MainThread  expected 2, got 2.00339794159, error None
Thread-5    Sleeping for 1 seconds
MainThread  expected 1, got 1.00312304497, error None
MainThread  expected 2, got 2.0027179718, error None
MainThread  expected 1, got 1.00284385681, error None
MainThread  expected 4, got 4.00334811211, error None
MainThread  expected 3, got 3.00306892395, error None
MainThread  expected 4, got 4.00330901146, error None
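To move from the sleep test to the actual conversions, the same thread-backed pool can run one external command per work item. The following is a hedged Python 3 sketch, not the answer's code: the helper names, the file names, and the target sizes are all hypothetical, and the only ImageMagick feature assumed is the `-resize` option of `convert`.

```python
import multiprocessing.dummy as mp  # thread-backed Pool, as in the answer
import subprocess
import sys

def convert_command(src, dst, size):
    """Build an ImageMagick command line; nothing is executed here."""
    return ["convert", src, "-resize", size, dst]

def run_command(cmd):
    """Run one command and report (cmd, exit_code, error)."""
    try:
        return cmd, subprocess.call(cmd), None
    except OSError as e:  # e.g. the executable is not installed
        return cmd, None, e

def run_all(commands, workers=8):
    """Run the commands with at most `workers` of them active at once."""
    pool = mp.Pool(workers)
    try:
        return list(pool.imap_unordered(run_command, commands))
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    photos = ["a.jpg", "b.jpg"]  # hypothetical file names
    jobs = []
    for photo in photos:
        jobs.append(convert_command(photo, "web_" + photo, "1024x768"))
        jobs.append(convert_command(photo, "thumb_" + photo, "150x150"))
    for job in jobs:
        print(" ".join(job))  # only prints; pass jobs to run_all() to execute
```

The demo only prints the command lines so it can run without ImageMagick or the image files present. Because each worker thread spends its time blocked on a child process, the GIL is not a limiting factor here and the conversions themselves can spread over the available cores.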