Question

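I'm trying to see a performance improvement with pymongo, but I'm not observing any.

My sample database has 400,000 records. Essentially, threaded and single-threaded performance are the same, and the only performance gain comes from running multiple processes.

Does pymongo not release the GIL during a query?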

Single Perf: real 0m0.618s

Multiproc: real 0m0.144s

Multithreaded: real 0m0.656s

Regular code:

from pymongo import MongoClient

choices = ['foo','bar','baz']


def regular_read(db, sample_choice):
    rows = db.test_samples.find({'choice':sample_choice})
    return 42  # done to remove calculations from the picture

def main():
    client = MongoClient('localhost', 27017)
    db = client['test-async']
    for sample_choice in choices:
        regular_read(db, sample_choice)

if __name__ == '__main__':
    main()

$ time python3 mongotest_read.py 

real    0m0.618s
user    0m0.085s
sys     0m0.018s

Now, when I use multiprocessing, I can see some improvement.

from random import randint, choice

import functools
from pymongo import MongoClient
from concurrent import futures

choices = ['foo','bar','baz']
MAX_WORKERS = 4

def regular_read(sample_choice):
    client = MongoClient('localhost', 27017,connect=False)
    db = client['test-async']
    rows = db.test_samples.find({'choice':sample_choice})
    #return sum(r['data'] for r in rows)
    return 42

def main():
    f = functools.partial(regular_read)
    with futures.ProcessPoolExecutor(MAX_WORKERS) as executor:
        # executor.map returns a one-shot iterator, so materialize it once
        res = list(executor.map(f, choices))

    print(res)
    return len(res)

if __name__ == '__main__':
    main()

$ time python3 mongotest_proc_read.py 
[42, 42, 42]

real    0m0.144s
user    0m0.106s
sys     0m0.041s

But when you switch from ProcessPoolExecutor to ThreadPoolExecutor, the speed drops back to that of single-threaded mode.

...

def main():
    client = MongoClient('localhost', 27017,connect=False)
    f = functools.partial(regular_read, client)
    with futures.ThreadPoolExecutor(MAX_WORKERS) as executor:
        # executor.map returns a one-shot iterator, so materialize it once
        res = list(executor.map(f, choices))

    print(res)
    return len(res)

$ time python3 mongotest_thread_read.py 
[42, 42, 42]

real    0m0.656s
user    0m0.111s
sys     0m0.024s

...

Answer 1

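PyMongo uses the standard Python socket module, which does drop the GIL while sending and receiving data over the network. However, it's not MongoDB or the network that's your bottleneck: it's Python.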

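CPU-intensive Python processes do not scale by adding threads; in fact, they slow down slightly due to context switching and other inefficiencies. To use more than one CPU in Python, start subprocesses.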
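To make the point above concrete, here is a minimal sketch that uses only the standard library. The names `cpu_bound` and `timed_map` are hypothetical helpers, not part of PyMongo, and the busy loop is only a stand-in for the Python-side work of a query. Run as a script, it times the same CPU-bound job on a thread pool and on a process pool; the exact numbers will depend on the machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python busy loop; it holds the GIL for as long as it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(executor_cls, inputs, workers=4):
    """Map cpu_bound over inputs on the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(cpu_bound, inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [2_000_000] * 4
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = timed_map(cls, inputs)
        print(f"{cls.__name__}: {seconds:.2f}s")
```

On a multi-core machine the process pool would typically finish noticeably sooner than the thread pool, since only one thread at a time can execute Python bytecode.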

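I know it doesn't seem intuitive that a "find" should be CPU-intensive, but the Python interpreter is slow enough to contradict our intuition. If the query is fast and there is no latency to MongoDB on localhost, MongoDB can easily outperform the Python client. The experiment you just ran, substituting subprocesses for threads, confirms that Python performance is the bottleneck.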

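To ensure maximum throughput, make sure you have the C extensions enabled: pymongo.has_c() == True. With that in place, PyMongo runs about as fast as a Python client library can; to get more throughput, move to multiprocessing.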
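A quick way to run that check is sketched below. The wrapper `c_extension_status` is a hypothetical helper written for this example; the only PyMongo call it relies on is `pymongo.has_c()`, and it falls back gracefully if pymongo is not installed.

```python
def c_extension_status():
    """Return a short description of whether PyMongo's C extensions are loaded."""
    try:
        import pymongo
    except ImportError:
        return "pymongo is not installed"
    # pymongo.has_c() reports whether the compiled extension was imported.
    if pymongo.has_c():
        return "C extensions enabled"
    return "C extensions missing (pure-Python fallback)"


if __name__ == "__main__":
    print(c_extension_status())
```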

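If your expected real-world scenario involves more time-consuming queries, or a remote MongoDB with some network latency, multithreading may give you some performance increase.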
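The latency case can be simulated without a database. In the sketch below, `slow_query` is a hypothetical stand-in for a query against a remote server: it just calls `time.sleep`, which releases the GIL in the same way a blocking socket read does. The 0.2 second latency is an assumed figure chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(sample_choice, latency=0.2):
    """Stand-in for a high-latency query; sleeping releases the GIL."""
    time.sleep(latency)
    return 42


def run_serial(choices):
    """Issue the simulated queries one after another."""
    return [slow_query(c) for c in choices]


def run_threaded(choices, workers=4):
    """Issue the simulated queries concurrently on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(slow_query, choices))


def elapsed(fn, *args):
    """Call fn(*args) and return (result, seconds taken)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    choices = ['foo', 'bar', 'baz']
    _, serial_s = elapsed(run_serial, choices)
    _, threaded_s = elapsed(run_threaded, choices)
    print(f"serial: {serial_s:.2f}s, threaded: {threaded_s:.2f}s")
```

Here the waits overlap, so the threaded run should take roughly one latency period rather than three. That is the situation in which threads help, in contrast to the localhost benchmark in the question.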

How can I improve pymongo performance using threads?
