I am working with a dataset that gives the inventory status of each product. The collection is stored in MongoDB.
myCollectionData = [
{"product_id":1, "inventory": 100, "date": "2019-12-01"},
{"product_id":2, "inventory": 40, "date": "2019-12-02"},
{"product_id":3, "inventory": 20, "date": "2019-12-05"},
{"product_id":1, "inventory": 70, "date": "2019-12-15"},
{"product_id":2, "inventory": 10, "date": "2019-12-16"},
{"product_id":3, "inventory": 5, "date": "2019-12-17"}
]
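I want to add a key to each record that stores the product's sales, i.e. the drop in inventory since that product's previous record: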
# Output
[
{"product_id":1, "inventory": 100, "date": "2019-12-01", "Sales": 0},
{"product_id":2, "inventory": 40, "date": "2019-12-02", "Sales": 0},
{"product_id":3, "inventory": 20, "date": "2019-12-05", "Sales": 0},
{"product_id":1, "inventory": 70, "date": "2019-12-15", "Sales": 30},
{"product_id":2, "inventory": 10, "date": "2019-12-16", "Sales": 30},
{"product_id":3, "inventory": 5, "date": "2019-12-17", "Sales": 15},
]
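To make the rule behind this output explicit, here is a minimal pure-Python sketch. It assumes that "Sales" means the drop in inventory since the same product's previous record, floored at 0, and 0 for a product's first record. The function name `add_sales` is hypothetical and only serves the illustration.

```python
from typing import Dict, List


def add_sales(records: List[dict]) -> List[dict]:
    """Return copies of the records with a "Sales" key added (assumed rule)."""
    last_inventory: Dict[int, int] = {}  # most recent inventory seen per product
    out = []
    # Visit records in date order; ISO-formatted dates sort correctly as strings
    for rec in sorted(records, key=lambda r: r["date"]):
        prev = last_inventory.get(rec["product_id"])
        # Sales is the inventory drop since the previous record, never negative
        sales = 0 if prev is None else max(prev - rec["inventory"], 0)
        out.append({**rec, "Sales": sales})
        last_inventory[rec["product_id"]] = rec["inventory"]
    return out


sample = [
    {"product_id": 1, "inventory": 100, "date": "2019-12-01"},
    {"product_id": 2, "inventory": 40, "date": "2019-12-02"},
    {"product_id": 3, "inventory": 20, "date": "2019-12-05"},
    {"product_id": 1, "inventory": 70, "date": "2019-12-15"},
    {"product_id": 2, "inventory": 10, "date": "2019-12-16"},
    {"product_id": 3, "inventory": 5, "date": "2019-12-17"},
]
result = add_sales(sample)
```

On the sample data this yields the "Sales" values 0, 0, 0, 30, 30, 15, matching the output listed above.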
So far, I have achieved this using pandas' shift:
import json
import numpy as np
import pandas as pd

df = pd.DataFrame(myCollectionData)
# Sort so that each product's rows are adjacent and in date order
df.sort_values(by=['product_id', 'date'], ascending=[True, True], inplace=True)
# Shift on product_id (the NaN on the first row becomes 0)
df['previous_product_id'] = df['product_id'].shift(1).fillna(0).astype(int)
# Shift on inventory
df['previous_inventory'] = df['inventory'].shift(1)
# Inventory variation is only defined when the previous row is the same product
same_product = df['previous_product_id'] == df['product_id']
df['inventory_variation'] = np.nan
df.loc[same_product, 'inventory_variation'] = df['inventory'] - df['previous_inventory']
## Column sales = 0 if inventory variation is positive (or there is no previous row)
## Column sales > 0 if inventory variation is negative
df['sales'] = (-df['inventory_variation']).clip(lower=0).fillna(0)
## pandas to list of records for MongoDB
records = json.loads(df.to_json(orient='records'))
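This works fine. But I would like to know whether I can do this directly in Mongo, to avoid having to import the data (the dataset has more than 1M records and is growing).
Is there a "shift" operation in Mongo that would let me achieve the same result?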