Question

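I'm using Python 3 with PyMongo against MongoDB 4.0, and IfxPy to query an Informix database. My MongoDB database has 4 collections:

User
Office
Pet
Car

A user has one office, one pet and one car, so the User collection holds 3 references, one per field.

I need something like this: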

I want to find out whether there is a user named John with a pet named Mickey and a car with model Tesla and status active. After that, I will update the user's status to inactive. I also have to query Office, but it is not used in this example.

I created an index on each field:

import pymongo
office.create_index([("code", pymongo.DESCENDING)], unique=True)
pet.create_index([("name", pymongo.DESCENDING)], unique=True)
car.create_index([("model", pymongo.DESCENDING)], unique=True)
user.create_index([("username", pymongo.DESCENDING)], unique=True)
user.create_index([("pet", pymongo.DESCENDING)])
user.create_index([("car", pymongo.DESCENDING)])
user.create_index([("status", pymongo.DESCENDING)])

This is my code:

import time
import IfxPy
office_id = None
car_id = None
pet_id = None
ifx_connection = IfxPy.connect(ifx_param, "", "")
stmt = IfxPy.exec_immediate(ifx_connection, sql)
dictionary = IfxPy.fetch_assoc(stmt)  # Get data key / value
start = time.time()
# Loop on informix data (20 000 items)
while dictionary != False:
    # Trim all string in dict
    dictionary = {k: v.strip() if isinstance(v, str) else v for k, v in dictionary.items()}
    # Get office
    office_code = dictionary['office_code']
    existing_office = office.find_one({"code": office_code})
    if bool(existing_office):
        office_id = existing_office['_id']
    # Get pet
    existing_pet = pet.find_one({"name": dictionary['pet_name']})
    if bool(existing_pet):
        pet_id = existing_pet['_id']
    # Get car
    existing_car = car.find_one({"model": dictionary['car_model']})
    if bool(existing_car):
        car_id = existing_car['_id']
    # Get user
    existing_user = user.find_one({
        "username": dictionary['username'],
        "car": car_id,
        "pet": pet_id,
        "status": "inactive"
    })
    if bool(existing_user):
        # Change user status
        user.update_one({'_id': existing_user['_id']}, {"$set": {"status": "active"}}, upsert=False)
    # Next row
    dictionary = IfxPy.fetch_assoc(stmt)

If I remove the MongoDB code from the loop, it takes 1.33 seconds. With the MongoDB queries, it takes 47 seconds. I have 20,000 items. I think that is really slow.

I tried to measure the time of each find_one by removing all of them and leaving only one. If I leave only the Office find_one, it takes about 12 seconds, and the same goes for the others. If I leave only the user find_one, it also takes about 12 seconds. So ~12 * 4 is why all the find_one calls together take ~47 seconds.

Can you tell me what I'm doing wrong?

Answer 1

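To speed up this algorithm you need to reduce the number of MongoDB queries, which you can do by leveraging what you know about your data. So if you know that you only have a few distinct offices, or that you may end up querying all of them at some point (or over and over again), you might want to load all offices in an initial step outside the loop (!!!) and cache them in a dictionary, for fast lookups inside the loop without further database round trips. The same goes for pets and cars.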
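A minimal sketch of the caching idea, under stated assumptions: `build_lookup` is a hypothetical helper (not part of PyMongo) that accepts any iterable of documents. With PyMongo it could be fed from a single query per collection, such as `office.find({}, {"code": 1})`, which returns `code` plus `_id`. Plain dicts stand in for the documents here so the sketch runs without a server.

```python
def build_lookup(documents, key_field):
    """Map each document's key_field value to its _id.

    `documents` is any iterable of dict-like documents, for example the
    cursor returned by office.find({}, {"code": 1}).
    """
    return {doc[key_field]: doc["_id"] for doc in documents if key_field in doc}


# Stand-in data (assumption): in the real script these would come from MongoDB.
sample_offices = [{"_id": 1, "code": "NYC"}, {"_id": 2, "code": "PAR"}]

office_ids = build_lookup(sample_offices, "code")

# Inside the loop, a dictionary lookup replaces one find_one round trip.
print(office_ids.get("PAR"))  # 2
print(office_ids.get("XXX"))  # None (unknown office code)
```

For 20,000 rows this replaces roughly 20,000 `find_one` calls per collection with one query and in-memory lookups.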

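So, more precisely, I would:

Run the Informix query the way you already do
Run three up-front queries against MongoDB that retrieve only name, model, code and _id for all pets, cars and offices
Put the returned values into three dictionaries (name -> _id, model -> _id, code -> _id)
Loop through the Informix result set the way you already do
For every user in the Informix result set, append an update model to a list, with selection criteria made up of all the previously gathered details and a static update part
Outside the loop (afterwards), update all users with a bulk update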
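The steps above could be sketched roughly as follows. This is an illustration, not a drop-in replacement: `build_user_updates` and `apply_user_updates` are hypothetical helper names, the rows are assumed to be already trimmed dicts as produced by `IfxPy.fetch_assoc`, and rows whose pet or car is unknown are skipped (an assumption; the original loop behaves differently there). The filter and update documents mirror the ones in the question's code. `UpdateOne` and `Collection.bulk_write` are PyMongo calls; the import is deferred so the first function can be tried without a MongoDB installation.

```python
def build_user_updates(rows, pet_ids, car_ids):
    """Return one (filter, update) pair per row whose pet and car are cached."""
    specs = []
    for row in rows:
        pet_id = pet_ids.get(row["pet_name"])
        car_id = car_ids.get(row["car_model"])
        if pet_id is None or car_id is None:
            continue  # assumption: without both references no user can match
        specs.append((
            {
                "username": row["username"],
                "pet": pet_id,
                "car": car_id,
                "status": "inactive",
            },
            {"$set": {"status": "active"}},  # static update part
        ))
    return specs


def apply_user_updates(user_collection, specs):
    """Send all updates to MongoDB in one bulk operation."""
    from pymongo import UpdateOne  # deferred import, needs PyMongo installed

    if not specs:
        return None  # bulk_write rejects an empty list of operations
    operations = [UpdateOne(flt, upd) for flt, upd in specs]
    # ordered=False lets the server continue past individual failures.
    return user_collection.bulk_write(operations, ordered=False)


# Stand-in data (assumption) to show the shape of the inputs.
pet_ids = {"Mickey": 10}
car_ids = {"Tesla": 20}
rows = [
    {"username": "John", "pet_name": "Mickey", "car_model": "Tesla"},
    {"username": "Jane", "pet_name": "Rex", "car_model": "Tesla"},
]
specs = build_user_updates(rows, pet_ids, car_ids)
print(len(specs))  # 1 (Jane's pet is not in the cache)
```

In the real script, `apply_user_updates(user, specs)` would be called once after the Informix loop, so the per-row `find_one` and `update_one` calls collapse into a single bulk request.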
