Question

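I have several MongoDB collections, each containing 100,000 documents, and each document has 10,000 fields (columns). A Python script runs aggregation queries from multiple threads, with each thread calling aggregate on a separate collection. Here is the code: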

import pymongo
import time
import threading
import sys

mongocli = pymongo.MongoClient(host="192.168.99.100", username="admin", password="admin123",
                               authSource="analyse_db")
db = mongocli['analyse_db']
collection = sys.argv[1]
column = sys.argv[2]
threaded = int(sys.argv[3])
# Start times of time column of every collection.
# There are 7 collections for each day in a week.
start_times = [1587288313, 1587374713, 1587461113, 1587547513, 1587633514, 1587719914, 1587806314]
interval = 86000
time_column = "timestamp_EP"

col = db[collection]
group_query = {"$group": {"_id": "", "result": {"$avg": "$" + column}}}

def aggregate_thread(local_col, start_time, end_time, thread_id):
    # Each thread opens its own client rather than sharing the global one.
    local_mongocli = pymongo.MongoClient(host="192.168.99.100", username="admin", password="admin123",
                                         authSource="analyse_db")
    local_db = local_mongocli['analyse_db']
    local_col_obj = local_db[local_col]
    # Time-window filter, currently disabled:
    #cursor = local_col_obj.aggregate([{"$match": {time_column:
    #    {"$gte": start_time, "$lt": end_time}}}, group_query])
    thr_start_time = time.time()
    cursor = local_col_obj.aggregate([group_query])
    try:
        print(cursor.next())
    except StopIteration:
        # The aggregation returned no documents.
        print("no data")
    print("thread " + str(thread_id) + " completion time: ", time.time() - thr_start_time)

def aggregate_threaded():
    thr_list = []
    nthreads = len(start_times)
    for i in range(0, nthreads):
        ftime = start_times[i]
        ttime = ftime + interval
        local_col = collection + "_day" + str(i)
        thr = threading.Thread(target=aggregate_thread, args=[local_col, ftime, ttime, i], daemon=True)
        thr_list.append(thr)
        thr.start()
    # Wait for every thread to finish; join() returns None, so nothing is kept.
    for thr in thr_list:
        thr.join()

proc_start = time.time()
aggregate_threaded()
print(time.time() - proc_start)
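
For reference, the pipeline the script sends can be sketched as a small standalone helper. This is only an illustration: `build_pipeline` is a hypothetical name that does not appear in the script above, and the optional `$match` stage mirrors the filter that is commented out in `aggregate_thread`. It builds plain Python lists and dicts and does not connect to MongoDB.

```python
def build_pipeline(column, time_column=None, start_time=None, end_time=None):
    """Return the aggregation pipeline as a plain list of stage dicts.

    Hypothetical helper for illustration; not part of the original script.
    """
    pipeline = []
    if time_column is not None:
        # Optional half-open time window [start_time, end_time).
        pipeline.append(
            {"$match": {time_column: {"$gte": start_time, "$lt": end_time}}}
        )
    # Average the requested column over all documents that reach this stage.
    pipeline.append({"$group": {"_id": "", "result": {"$avg": "$" + column}}})
    return pipeline


# The pipeline the script currently runs: a full scan with no filter.
full_scan = build_pipeline("tag1")
# The variant with the time filter enabled, using the first day's window.
windowed = build_pipeline("tag1", "timestamp_EP", 1587288313, 1587288313 + 86000)

print(full_scan)
print(windowed)
```

Without the `$match` stage, each thread averages one field over every document in its collection.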

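Now, when the script is executed, the time taken to complete the aggregation is directly proportional to the number of threads; that is, latency increases linearly with the number of concurrently executing queries. Here is the output of the script (7 threads running aggregation queries on different collections):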

C:\Users\AJINKYA\itanta\live_data>python perf_test3.py Archive-LiveDataLog5 tag1 1
{'_id': '', 'result': 50.054371460796965}
thread 3 completion time:  140.73034977912903
{'_id': '', 'result': 50.20921849745933}
thread 5 completion time:  146.46782159805298
{'_id': '', 'result': 50.064871338705366}
thread 4 completion time:  147.66157269477844
{'_id': '', 'result': 50.17267241078592}
thread 1 completion time:  151.08023619651794
{'_id': '', 'result': 49.85328077580493}
thread 2 completion time:  151.3344430923462
{'_id': '', 'result': 49.993023336937945}
thread 0 completion time:  151.4148395061493
{'_id': '', 'result': 49.89189660585342}
thread 6 completion time:  151.54917550086975
151.57819509506226

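So the 7 threads took 151 seconds to complete the aggregation. On the other hand, if only a single thread runs the aggregation on a single collection, it takes far less time. If start_times in the script above has only one element, the result is: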

C:\Users\AJINKYA\itanta\live_data>python perf_test3.py Archive-LiveDataLog5 tag1 1
{'_id': '', 'result': 49.993023336937945}
thread 0 completion time:  43.36866021156311
43.37059950828552

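With only one aggregation query, it takes just 43 seconds. My expectation was that the multithreaded queries would take roughly the same time as the single-threaded query. This suggests that MongoDB is not executing the queries concurrently. Is this a known limitation of MongoDB? Is there any configuration parameter in MongoDB that controls concurrency?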
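
As a rough sanity check on the two measurements, the arithmetic can be written out as below. The two wall-clock totals are copied from the outputs above; the serial estimate is an assumption, since it supposes each of the 7 collections would take about as long on its own as the one that was measured alone.

```python
# Wall-clock totals copied from the two runs shown above (seconds).
single_query_s = 43.37059950828552    # one thread, one collection
threaded_run_s = 151.57819509506226   # 7 threads, 7 collections
n_queries = 7

# How much longer the 7-thread run took than a single query.
slowdown = threaded_run_s / single_query_s

# Assumed cost of running the 7 queries one after another (not measured).
serial_estimate_s = n_queries * single_query_s

print("slowdown vs. single query: %.2fx" % slowdown)
print("serial estimate: %.1f s" % serial_estimate_s)
print("threaded vs. serial estimate: %.2fx faster" % (serial_estimate_s / threaded_run_s))
```

On these figures the 7-thread run is well short of the expected 43 seconds, yet it also finishes sooner than the assumed serial total, so the queries seem to overlap partially rather than not at all.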

MongoDB concurrent queries on different collections are slow

0 answers: