MongoDB - how to optimize this find/update

Asked: 2019-11-29 16:39:03

Tags: python mongodb asynchronous pymongo

I am new to MongoDB and Python and have to write a script using pymongo. There is a website where users run searches; the backend is MongoDB, with one collection storing every user's search history and another collection storing all users.

I need to iterate over all users, fetch each one's search history for the past 30 days, take the total, and set that total in one of the user's fields. Below is what I wrote. Is there a way to speed this up, e.g. by switching to an aggregation, or by using multithreading or making it asynchronous?

import pymongo
from datetime import datetime, timedelta
from bson.objectid import ObjectId


def lambda_handler(event, context):
    mongohost = '10.0.0.1'
    mongoport = 27017

    mongoclient = pymongo.MongoClient(mongohost, mongoport)
    mongodb = mongoclient["maindb"]
    mongo_search_logs_collection = mongodb["searchlogs"]
    mongo_users_collection = mongodb["users"]

    days_to_subtract_from_today = 30
    search_count_start_date = (datetime.today() - timedelta(days_to_subtract_from_today)).date()

    count = 0

    # Iterate over all users and update searchCount value
    for x in mongo_users_collection.find():

        # Get total searches last X days
        total_search_count = mongo_search_logs_collection.count_documents({
            'createdBy': ObjectId(x['_id']),
            'created': {'$gte': datetime(search_count_start_date.year, search_count_start_date.month, search_count_start_date.day)}
        })

        # Update searchCount value
        mongo_users_collection.update_one({
            '_id': ObjectId(x['_id'])
        }, {
            '$set': {
                'searchCount': total_search_count
            }
        }, upsert=False)

        # Increment counter
        count += 1

    print("Processed " + str(count) + " records")
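As an aside, the midnight cutoff that the script rebuilds from `year`/`month`/`day` can be computed more directly with `datetime.combine`. A small stdlib-only sketch of an equivalent computation:

```python
from datetime import datetime, timedelta

# Equivalent of the script's cutoff: today minus 30 days, truncated to midnight.
days_to_subtract_from_today = 30
cutoff_date = (datetime.today() - timedelta(days=days_to_subtract_from_today)).date()
cutoff = datetime.combine(cutoff_date, datetime.min.time())
```

This yields the same value as `datetime(d.year, d.month, d.day)` in the original, just without unpacking the date by hand.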

2 answers:

Answer 0 (score: 1)

This could be one way to do it, using an aggregation plus a bulk operation:

import pymongo
from pymongo import UpdateOne
from datetime import datetime, timedelta
from bson.objectid import ObjectId


def lambda_handler(event, context):
    mongohost = '10.0.0.1'
    mongoport = 27017

    mongoclient = pymongo.MongoClient(mongohost, mongoport)
    mongodb = mongoclient["maindb"]
    mongo_search_logs_collection = mongodb["searchlogs"]
    mongo_users_collection = mongodb["users"]

    days_to_subtract_from_today = 30
    search_count_start_date = (datetime.today() - timedelta(days_to_subtract_from_today)).date()

    # One aggregation returns the 30-day search count for every user at once
    cursor = mongo_search_logs_collection.aggregate([
        {
            "$match": {
                "created": {"$gte": datetime(search_count_start_date.year, search_count_start_date.month, search_count_start_date.day)}
            }
        },
        {
            "$group": {
                "_id": "$createdBy", "searchCount": {"$sum": 1}
            }
        }
    ])

    # Send all updates in a single unordered bulk write.
    # (initialize_unordered_bulk_op() is deprecated and was removed in PyMongo 4;
    # bulk_write() with UpdateOne is the current API.)
    requests = [
        UpdateOne({"_id": res["_id"]}, {"$set": {"searchCount": res["searchCount"]}})
        for res in cursor
    ]
    if requests:
        mongo_users_collection.bulk_write(requests, ordered=False)

Let me know if you have any questions or doubts, as I haven't tested it ;)
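One thing that often matters as much as the bulk write is an index covering the fields the pipeline filters and groups on. A sketch of the index spec (field names are taken from the question; whether this index actually helps depends on your data and should be checked with `explain`):

```python
# Compound index spec over the fields the pipeline filters ($match on
# "created") and groups on ("createdBy"). This is a suggestion, not tested
# against a live deployment.
index_spec = [("created", 1), ("createdBy", 1)]

# Against a live collection one would run:
# mongo_search_logs_collection.create_index(index_spec)
```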

Answer 1 (score: 1)

Querying mongo_search_logs_collection repeatedly inside the loop is what slows the processing down. Instead, you can fetch the searchCount for every user in one shot and then update them. That will be much faster. Check the statement below, which fetches all users' search counts in a single query.

mongo_search_logs_collection.aggregate(
    [
      {
        "$match": {
          "created": {
            "$gte": datetime(search_count_start_date.year, search_count_start_date.month, search_count_start_date.day)
          }
        }
      },
      {
        "$group": {
          "_id": "$createdBy",
          "total_search_count": {
            "$sum": 1
          }
        }
      }
    ]
)
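For intuition, here is a client-side sketch of what that $match + $group pipeline computes, run over toy in-memory documents (field names come from the question):

```python
from collections import Counter
from datetime import datetime

# Toy stand-ins for searchlogs documents (field names from the question).
logs = [
    {"createdBy": "u1", "created": datetime(2019, 11, 20)},
    {"createdBy": "u1", "created": datetime(2019, 11, 25)},
    {"createdBy": "u2", "created": datetime(2019, 10, 1)},
]
cutoff = datetime(2019, 10, 30)

# $match: keep documents with created >= cutoff; $group: count per createdBy.
counts = Counter(doc["createdBy"] for doc in logs if doc["created"] >= cutoff)
# counts == Counter({"u1": 2}) -- "u2" falls outside the window
```

The point of the approach: one aggregation plus one round of updates replaces N `count_documents` calls and N `update_one` calls, which is where the speedup comes from.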