Question

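I have a method that uses pandas to run a lot of read-only computation on an 800MB DataFrame loaded with read_pickle.

The method takes about 500 ms of CPU time to execute. Let's call this method search_with_pandas.

I keep this method in a module, which I'll call "search_with_pandas_module", and it looks like this: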

import pandas as pd

# load 800MB dataframe
core = pd.read_pickle(r"C:\test\core.bz2")

def search_with_pandas(inputs):
    scores = core[core.input.isin(inputs)].groupby(["item_id"]).item_id.count().sort_values(ascending=False)
    return scores

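Multiple users need access to this method, so I started parallelizing the code with multiple processes. I have a 12-core machine, and I'm starting 10 processes like this: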

import multiprocessing

if __name__=='__main__':
    PORT_START = 12000
    cpu_count = 10
    for x in range(cpu_count):
        proc = multiprocessing.Process(target=worker,args=(PORT_START+x,))
        proc.start()

Each of the 10 worker processes exposes a simple WSGI interface, each on its own port. This means every process stays alive, waiting for requests.

def worker(port):
    from bottle import route, run, request
    from time import time

    import search_with_pandas_module

    @route('/search',method='GET')
    def search():
        time_start = time()
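        # assumption: inputs arrive as repeated ?input=... query parameters
        inputs = request.query.getall('input')
        response = search_with_pandas_module.search_with_pandas(inputs)
        # this returns in 500ms when run from one process at a time
        print("time took:", time() - time_start)

        return response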

    run(host="127.0.0.1",port=port)

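Every worker process imports search_with_pandas_module when it starts, which means every worker has its own copy of the 800MB DataFrame that search_with_pandas uses. There is no IPC/synchronization overhead between the processes, because each worker owns its own independent copy.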

I now have 10 processes on 10 different ports, each addressable via:

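Now the strange part. If I hit one process at a time, the search_with_pandas method takes 500 ms. But when I hit several of them concurrently, the search_with_pandas method gets slower and slower, even though each call is handled by an independent process.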
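A minimal sketch of how such a concurrent test could be driven from a client. The URL pattern is an assumption derived from the host, port range, and route in the worker code; the helper names (worker_urls, http_fetch, time_concurrent_calls) are hypothetical and not part of the original setup.

```python
import threading
from time import time
from urllib.request import urlopen

PORT_START = 12000  # same starting port as the worker launcher


def worker_urls(n):
    """Assumed URL of each of the first n workers (host/port/route taken from the worker code)."""
    return ["http://127.0.0.1:%d/search" % (PORT_START + i) for i in range(n)]


def http_fetch(url):
    """Issue one blocking GET and discard the body."""
    with urlopen(url) as resp:
        resp.read()


def time_concurrent_calls(urls, fetch=http_fetch):
    """Call every URL at the same moment, one thread per URL.

    Returns the elapsed seconds of each call, in the order of `urls`.
    """
    elapsed = [None] * len(urls)
    barrier = threading.Barrier(len(urls))  # release all threads together

    def call(i, url):
        barrier.wait()
        start = time()
        fetch(url)
        elapsed[i] = time() - start

    threads = [threading.Thread(target=call, args=(i, u))
               for i, u in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return elapsed
```

With the workers running, calling time_concurrent_calls(worker_urls(n)) for n from 1 to 10 would give one timing per worker at each concurrency level. Threads are enough on the client side because each one only waits on network I/O.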

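No concurrency: 500 ms
2 concurrent calls: 684 ms
3 concurrent calls: 959 ms
4 concurrent calls: 988 ms
5 concurrent calls: 1193 ms
6 concurrent calls: 1423 ms
7 concurrent calls: 1567 ms
8 concurrent calls: 1812 ms
9 concurrent calls: 2096 ms
10 concurrent calls: 2253 ms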
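To put these timings in perspective, here is a small sketch that turns them into total throughput and scaling efficiency. The function names are made up for illustration; the numbers are copied from the measurements above.

```python
# Wall-clock time per call (ms) at each concurrency level, copied from the timings above.
latency_ms = {1: 500, 2: 684, 3: 959, 4: 988, 5: 1193,
              6: 1423, 7: 1567, 8: 1812, 9: 2096, 10: 2253}


def throughput_per_second(concurrency, latency):
    """Calls completed per second when `concurrency` calls each take `latency` ms."""
    return concurrency * 1000.0 / latency


def scaling_efficiency(latency, baseline=500):
    """Actual throughput divided by ideal throughput; 1.0 means no slowdown."""
    return float(baseline) / latency


for n in sorted(latency_ms):
    ms = latency_ms[n]
    print("%2d concurrent: %5.2f calls/s, efficiency %3.0f%%"
          % (n, throughput_per_second(n, ms), 100 * scaling_efficiency(ms)))
```

By this arithmetic, total throughput rises from 2.0 calls/s with a single call to about 4.4 calls/s with 10 concurrent calls, far short of the 20 calls/s that perfect scaling would give.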

Here is a screenshot of my CPU graph while running these tests. Note that with more concurrency you can see more CPU utilization, as each worker operates concurrently.

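I can't figure out why the method takes so much longer to execute when it runs concurrently.

It's almost as if pandas is locking something internally, forcing the other processes to wait until it is released.

Setting up/tearing down each process is not the issue, since each process is started only once and stays alive.

Each process has its own independent copy of the 800MB DataFrame, so why would a CPU-intensive read-only operation in one process slow down the others?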

I'm running Windows 7 64-bit, 24GB RAM, SSD drive.

pandas performs exponentially slower when running concurrently in multiple processes

0 answers: