How to automatically run multiple MongoDB queries from Python lists

Date: 2018-08-23 06:36:53

Tags: python mongodb pandas dataframe

Here is my query:

%%time
from pymongo import MongoClient
import datetime as dt
import pandas as pd  # needed for pd.DataFrame below
mongo_client = MongoClient(...credential...)
db_score = mongo_client['at-device-info']
cvsms = db_score['flat_sms']
test = cvsms.find({'customer_id': {'$in': list1}},{ 'customer_id': 1,'timestamp': 1})
df1 = pd.DataFrame(list(test))

What I did was copy the last two lines and change list1 to list2 and df1 to df2, so that it becomes:

test = cvsms.find({'customer_id': {'$in': list2}},{ 'customer_id': 1,'timestamp': 1})
df2 = pd.DataFrame(list(test))

I then keep doing the same for list3 and df3, and so on. How can I automate this for 48 lists? A single query takes 4 minutes to run in my Jupyter notebook.

1 Answer:

Answer 0: (score: 1)

You can always loop over all the queries, build a DataFrame for each one, and append it to a list, like this:

from pymongo import MongoClient
import datetime as dt
import pandas as pd

mongo_client = MongoClient(...credential...)
db_score = mongo_client['at-device-info']
cvsms = db_score['flat_sms']

list1 = [1,2,3,4,5] # list of values to search
list2 = [6,7,8,9,10] # list of values to search
lists = [list1,list2]
df_list = []

for lst in lists:
    test = cvsms.find({'customer_id': {'$in': lst}}, {'customer_id': 1, 'timestamp': 1})
    df = pd.DataFrame(list(test))
    df_list.append(df)


# To access each DataFrame separately, index into the list
df1 = df_list[0]
df2 = df_list[1]


full_df = pd.concat(df_list)
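
If the 48 results need to stay addressable by name rather than by position, a dictionary keyed by a label is one option. The following is only a minimal sketch: `fetch_rows` is a hypothetical stand-in for the `cvsms.find(...)` call so that the snippet runs without a MongoDB server, and `id_lists` and the `df1`, `df2`, ... labels are assumed names, not part of the original code.

```python
# Hypothetical stand-in for list(cvsms.find({'customer_id': {'$in': ids}}, ...)),
# so that this sketch runs without a MongoDB server.
def fetch_rows(ids):
    return [{'customer_id': i, 'timestamp': None} for i in ids]

# Stands in for list1 ... list48.
id_lists = [[1, 2, 3], [6, 7], [11]]

# Key each result by a label instead of creating df1, df2, ... by hand.
results = {}
for n, ids in enumerate(id_lists, start=1):
    results['df{}'.format(n)] = fetch_rows(ids)

print(sorted(results))
print(len(results['df2']))
```

In the notebook, wrapping each value in `pd.DataFrame(...)` would presumably give one DataFrame per label, and `results['df2']` would then play the role of `df2`.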

If you want to speed things up, you can try the concurrent.futures module with ThreadPoolExecutor or ProcessPoolExecutor:

from concurrent import futures

def query_df(lst):
    test = cvsms.find({'customer_id': {'$in': lst}}, {'customer_id': 1, 'timestamp': 1})
    df = pd.DataFrame(list(test))
    return df

with futures.ThreadPoolExecutor(max_workers=4) as f:
    # map() returns a lazy iterator; list() collects the DataFrames in input order
    df_list = list(f.map(query_df, lists))

full_df = pd.concat(df_list)
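
One property worth knowing about the threaded version is that `map()` on an executor returns results in the order of its inputs, not in the order the queries finish, so the n-th result still belongs to the n-th list. The sketch below illustrates this under stated assumptions: `slow_query` is a hypothetical stand-in for `query_df` that sleeps instead of querying MongoDB, and `id_lists` is made-up sample data.

```python
import time
from concurrent import futures

# Hypothetical stand-in for query_df: sleeps to imitate a slow, I/O-bound query.
# Longer lists sleep longer, so the first list finishes last.
def slow_query(ids):
    time.sleep(0.02 * len(ids))
    return [{'customer_id': i} for i in ids]

id_lists = [[1, 2, 3], [4, 5], [6]]

with futures.ThreadPoolExecutor(max_workers=3) as executor:
    # list() collects the lazy iterator returned by map()
    results = list(executor.map(slow_query, id_lists))

# Results line up with id_lists even though the queries finished in reverse order.
first_ids = [rows[0]['customer_id'] for rows in results]
print(first_ids)  # [1, 4, 6]
```

Threads tend to suit this kind of workload because the time is spent waiting on the database rather than computing. PyMongo documents MongoClient as thread-safe but not fork-safe, so with ProcessPoolExecutor each worker process would most likely need to create its own client.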

Finally, you can build one larger DataFrame from the smaller ones by concatenating the list with pd.concat, as full_df does above.