Question

I currently have a Mongo document that looks like this:

{'_id': id, 'title': title, 'date': date}

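What I am trying to do is look this document up by its title. The database holds only about 5k items, which is not many, but my file has 1 million titles to search for.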

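I have made sure that title is indexed in the collection, but performance is still very slow (about 40 seconds per 1000 titles; obviously, I am running one query per title). Here is my code so far: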

Creating the work repository:

class WorkRepository(GenericRepository, Repository):
    def __init__(self, url_root):
        super(WorkRepository, self).__init__(url_root, 'works')
        self._db[self.collection].ensure_index('title')

The program's entry point (it is a REST API):

start = time.clock()
for work in json_works: #1000 titles per request
    result = work_repository.find_works_by_title(work['title'])

    if result:
        works[work['id']] = result

end = time.clock()
print end-start

return json_encoder(request, works)

And the find_works_by_title code:

def find_works_by_title(self, work_title):
    works = list(self._db[self.collection].find({'title': work_title}))

    return works

I'm new to Mongo and have probably made some mistake. Any suggestions?

Answer 1

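You are making one database call per title. The round trips will slow the process down significantly (the program and the database will spend most of their time on network communication rather than doing actual work).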

Try the following (adapting it to your program's structure, of course):

# Build a list of the 1000 titles you're searching for.
titles = [w["title"] for w in json_works]

# Make exactly one call to the DB, asking for all of the matching documents.
return collection.find({"title": {"$in": titles}})

For further reference on how the $in operator works: http://docs.mongodb.org/manual/reference/operator/query/in/
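Since your loop keys the results by work['id'], you will probably want to regroup the documents after the single query. A minimal sketch of one way to do that, assuming `collection` is the PyMongo collection (anything with a compatible `find` works) and that matching on the exact title string is what you want. The function name `find_works_by_titles` is made up for this example:

```python
from collections import defaultdict


def find_works_by_titles(collection, json_works):
    """Return {work id: [matching documents]} using a single $in query.

    `collection` is assumed to behave like a PyMongo collection;
    this helper name is hypothetical, not part of the original code.
    """
    # De-duplicate the titles so the $in list is no larger than necessary.
    titles = list({w["title"] for w in json_works})

    # One round trip: fetch every document whose title is in the list.
    by_title = defaultdict(list)
    for doc in collection.find({"title": {"$in": titles}}):
        by_title[doc["title"]].append(doc)

    # Map each requested work back to its matches, skipping works with none,
    # which mirrors the `if result:` check in the original loop.
    return {
        w["id"]: by_title[w["title"]]
        for w in json_works
        if w["title"] in by_title
    }
```

In the request handler this would replace the whole for loop, e.g. `works = find_works_by_titles(self._db[self.collection], json_works)`.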

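If your queries are still slow after that, call explain on the return value of the find call (more information here: http://docs.mongodb.org/manual/reference/method/cursor.explain/) and check that the query is actually using an index. If it is not, find out why.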
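As a rough illustration of that check: in PyMongo, `cursor.explain()` returns a plain dict. The sketch below assumes the explain layout used by MongoDB 3.0 and later, where the chosen plan sits under `queryPlanner.winningPlan` and each node carries a `stage` such as `IXSCAN` or `COLLSCAN`. Older servers report this differently, so treat the helpers (`plan_stages` and `uses_index` are names invented here) as a starting point rather than a definitive test:

```python
def plan_stages(node):
    """Collect every "stage" value found anywhere inside an explain plan."""
    stages = []
    if isinstance(node, dict):
        if "stage" in node:
            stages.append(node["stage"])
        # Child plans live under keys such as inputStage / inputStages.
        for value in node.values():
            stages.extend(plan_stages(value))
    elif isinstance(node, list):
        for item in node:
            stages.extend(plan_stages(item))
    return stages


def uses_index(explain_output):
    """True if the winning plan has an index scan and no collection scan.

    Assumes the queryPlanner.winningPlan layout (MongoDB 3.0+).
    """
    plan = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    stages = plan_stages(plan)
    return "IXSCAN" in stages and "COLLSCAN" not in stages
```

You would call it as `uses_index(collection.find({"title": {"$in": titles}}).explain())`. If it comes back False, printing the full explain dict is the next step.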

Querying for multiple values at once with pymongo
