Just like the title says: when I use multiple threads to read data from MongoDB, it is no faster than a single process. Is there something wrong with how I am doing it?
My multi-threaded code is as follows:
import thread

# "client" is assumed to be a pymongo database handle created elsewhere.
def multi_thread_flush(logger):
    n_loops = 20
    locks = []
    # Pre-acquire one lock per worker; each worker releases its lock when done.
    for i in range(0, n_loops):
        lock = thread.allocate_lock()
        lock.acquire()
        locks.append(lock)
    try:
        for i in range(0, n_loops):
            thread.start_new_thread(get_node_entry_id,
                                    (logger, 0 + 400000 * i, 400000, locks[i],))
        # Busy-wait until every worker has released its lock.
        for i in range(0, n_loops):
            while locks[i].locked():
                pass
        logger.info("[all done] all done")
    except Exception as e:
        logger.error("exception: %s" % e)

def get_node_entry_id(logger, num1, num2, lock):
    # Each worker skips to its own offset and reads num2 documents.
    cursor = client.mongo_collection.find({}, no_cursor_timeout=True).skip(num1).batch_size(30)
    count = 0
    for item in cursor:
        if count >= num2:
            break
        logger.info("%s" % item["_id"])
        count = count + 1
    cursor.close()  # no_cursor_timeout cursors must be closed explicitly
    lock.release()
My single-process code is as follows:
def get_node_entry_id():
    cursor = client.NodeEntry.find({}, no_cursor_timeout=True).batch_size(30)
    for item in cursor:
        print item["_id"]
    cursor.close()  # close the no_cursor_timeout cursor explicitly
I tried changing batch_size from 300 to 3000, but the improvement was small.
Answer 0 (score: 0):
It is probably because of skip(), since your skip offsets are multiples of 400,000: every time the query executes, the server has to walk from the beginning of the collection to the specified offset. See this doc.
skip() gets slower as the offset increases.
It also suggests using an index to do the slicing instead, e.g.:
db.col.find({_id: {$gt: offset}}).limit(batch_size)
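As an illustration, here is a minimal pymongo sketch of that idea (the MongoClient settings and the database/collection names are assumptions chosen to match the code above). Instead of skip(), each pass remembers the last _id it saw and requests the next slice with $gt, so the server can seek through the _id index rather than rescanning from the start of the collection:

from pymongo import MongoClient

client = MongoClient()              # assumed connection settings
collection = client.test.NodeEntry  # assumed database/collection names

def iterate_by_id(batch_size=3000):
    last_id = None
    while True:
        # Resume after the last _id seen; the first slice has no lower bound.
        query = {"_id": {"$gt": last_id}} if last_id is not None else {}
        docs = list(collection.find(query).sort("_id", 1).limit(batch_size))
        if not docs:
            break
        for doc in docs:
            print doc["_id"]
        last_id = docs[-1]["_id"]

The same idea carries over to the multi-threaded version: compute disjoint _id boundaries once up front and hand each thread its own {"_id": {"$gt": low, "$lte": high}} range, so no thread ever pays the skip() cost.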