Just like the title says: when I use multiple threads to read data from MongoDB, it is no faster than a single process. Is there something wrong with how I am doing it?
My multi-threaded code is as follows:
import thread

# "client" is assumed to be a pymongo database handle created elsewhere.
def multi_thread_flush(logger):
    n_loops = 20
    locks = []
    # Pre-acquire one lock per worker; each worker releases its lock when done.
    for i in range(0, n_loops):
        lock = thread.allocate_lock()
        lock.acquire()
        locks.append(lock)
    try:
        for i in range(0, n_loops):
            thread.start_new_thread(get_node_entry_id,
                                    (logger, 0 + 400000 * i, 400000, locks[i],))
        # Busy-wait until every worker has released its lock.
        for i in range(0, n_loops):
            while locks[i].locked():
                pass
        logger.info("[all done] all done")
    except Exception as e:
        logger.error("exception: %s" % e)

def get_node_entry_id(logger, num1, num2, lock):
    # Each worker skips to its own offset and reads num2 documents.
    cursor = client.mongo_collection.find({}, no_cursor_timeout=True).skip(num1).batch_size(30)
    count = 0
    for item in cursor:
        if count >= num2:
            break
        logger.info("%s" % item["_id"])
        count = count + 1
    cursor.close()  # no_cursor_timeout cursors must be closed explicitly
    lock.release()
My single-process code is as follows:
def get_node_entry_id():
    cursor = client.NodeEntry.find({}, no_cursor_timeout=True).batch_size(30)
    for item in cursor:
        print item["_id"]
    cursor.close()  # close the no_cursor_timeout cursor explicitly
I tried changing batch_size from 300 to 3000, but the improvement was small.
Answer 0 (score: 0):
It is probably because of skip(), since your skip offsets are multiples of 400,000: every time the query executes, the server has to walk from the beginning of the collection to the specified offset. See this doc.
skip() gets slower as the offset increases.
It also suggests using an index to do the slicing instead, e.g.:
db.col.find({_id: {$gt: offset}}).limit(batch_size)
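As an illustration, here is a minimal pymongo sketch of that idea (the MongoClient settings and the database/collection names are assumptions chosen to match the code above). Instead of skip(), each pass remembers the last _id it saw and requests the next slice with $gt, so the server can seek through the _id index rather than rescanning from the start of the collection:

from pymongo import MongoClient

client = MongoClient()              # assumed connection settings
collection = client.test.NodeEntry  # assumed database/collection names

def iterate_by_id(batch_size=3000):
    last_id = None
    while True:
        # Resume after the last _id seen; the first slice has no lower bound.
        query = {"_id": {"$gt": last_id}} if last_id is not None else {}
        docs = list(collection.find(query).sort("_id", 1).limit(batch_size))
        if not docs:
            break
        for doc in docs:
            print doc["_id"]
        last_id = docs[-1]["_id"]

The same idea carries over to the multi-threaded version: compute disjoint _id boundaries once up front and hand each thread its own {"_id": {"$gt": low, "$lte": high}} range, so no thread ever pays the skip() cost.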