Here is my query:
%%time
from pymongo import MongoClient
import datetime as dt
import pandas as pd  # needed for pd.DataFrame below
mongo_client = MongoClient(...credential...)
db_score = mongo_client['at-device-info']
cvsms = db_score['flat_sms']
test = cvsms.find({'customer_id': {'$in': list1}},{ 'customer_id': 1,'timestamp': 1})
df1 = pd.DataFrame(list(test))
What I do is copy the last two lines, changing list1 to list2 and df1 to df2, so that they become:
test = cvsms.find({'customer_id': {'$in': list2}},{ 'customer_id': 1,'timestamp': 1})
df2 = pd.DataFrame(list(test))
Then I go on to do the same for list3 and df3. How can I automate this for 48 lists? A single query takes 4 minutes to run in my Jupyter notebook.
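Answer 0 (score: 1)
You can always loop over all the lists, run the query for each one, build a DataFrame from the result, and append it to a list, like this: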
from pymongo import MongoClient
import datetime as dt
import pandas as pd
mongo_client = MongoClient(...credential...)
db_score = mongo_client['at-device-info']
cvsms = db_score['flat_sms']
list1 = [1,2,3,4,5] # list of values to search
list2 = [6,7,8,9,10] # list of values to search
lists = [list1,list2]
df_list = []
for lst in lists:
    test = cvsms.find({'customer_id': {'$in': lst}}, {'customer_id': 1, 'timestamp': 1})
    df = pd.DataFrame(list(test))
    df_list.append(df)
# To work with a single DataFrame, index into the list
df1 = df_list[0]
df2 = df_list[1]
full_df = pd.concat(df_list)
If you want to speed things up, you can try the concurrent.futures module with ThreadPoolExecutor or ProcessPoolExecutor:
from concurrent import futures
def query_df(lst):
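    # Run one query and return its result as a DataFrame
    test = cvsms.find({'customer_id': {'$in': lst}}, {'customer_id': 1, 'timestamp': 1})
    df = pd.DataFrame(list(test))
    return df
with futures.ThreadPoolExecutor(max_workers=4) as f:
    # map() returns a lazy iterator; list() collects the results in input order
    df_list = list(f.map(query_df, lists))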
full_df = pd.concat(df_list)
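To see the thread-pool pattern in isolation, here is a minimal sketch that runs without a database. `fake_query` is a hypothetical stand-in for the `cvsms.find(...)` call, and the sleep only imitates time spent waiting on the server. Threads tend to help with this kind of workload because the time goes into waiting on network I/O rather than into Python code. PyMongo documents `MongoClient` as thread-safe but not fork-safe, so the ProcessPoolExecutor route would likely need each worker process to create its own client.

```python
from concurrent import futures
import time

def fake_query(lst):
    """Hypothetical stand-in for cvsms.find(...): waits briefly, then returns rows."""
    time.sleep(0.05)  # imitates waiting on the database
    return [{"customer_id": cid} for cid in lst]

lists = [[1, 2], [3], [4, 5, 6]]

with futures.ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in the same order as the inputs,
    # even if the queries finish in a different order
    results = list(pool.map(fake_query, lists))

row_counts = [len(rows) for rows in results]
print(row_counts)  # [2, 1, 3]
```

Because the results come back in input order, `results[i]` always corresponds to `lists[i]`, which keeps the later concatenation step straightforward.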
Finally, you can build one larger DataFrame from the smaller ones by concatenating the list with pd.concat.
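If you also want to remember which list each row came from after concatenating, one option is the `keys` argument of `pd.concat`. The sketch below uses made-up frames in place of the real query results, and the labels `list1`, `list2` and the index level names `source` and `row` are just illustrative choices.

```python
import pandas as pd

# Made-up stand-ins for the per-list query results
df_list = [
    pd.DataFrame({"customer_id": [1, 2], "timestamp": ["2020-01-01", "2020-01-02"]}),
    pd.DataFrame({"customer_id": [6], "timestamp": ["2020-01-03"]}),
]

# One label per frame: "list1", "list2", ...
labels = [f"list{i + 1}" for i in range(len(df_list))]

# keys= adds an outer index level recording the source of each row
full_df = pd.concat(df_list, keys=labels, names=["source", "row"])

# Rows from a single source list can be pulled back out by label
df2 = full_df.loc["list2"]
```

This avoids keeping 48 separate variables around: the single `full_df` holds everything, and `full_df.loc["list7"]` recovers any one piece.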