Equivalent of pandas shift in MongoDB

Time: 2019-02-18 10:01:41

Tags: mongodb

I am working with a collection of data that gives the inventory status of each product over time. The collection is stored in MongoDB.

myCollectionData = [
    {"product_id":1, "inventory": 100, "date": "2019-12-01"},
    {"product_id":2, "inventory": 40, "date": "2019-12-02"},
    {"product_id":3, "inventory": 20, "date": "2019-12-05"},
    {"product_id":1, "inventory": 70, "date": "2019-12-15"},
    {"product_id":2, "inventory": 10, "date": "2019-12-16"},
    {"product_id":3, "inventory": 5, "date": "2019-12-17"}
]

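I would like to add a key that stores the sales of each product, that is, the drop in inventory since that product's previous record: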

//Output
[
    {"product_id":1, "inventory": 100, "date": "2019-12-01", "Sales": 0},
    {"product_id":2, "inventory": 40, "date": "2019-12-02", "Sales": 0},
    {"product_id":3, "inventory": 20, "date": "2019-12-05", "Sales": 0},
    {"product_id":1, "inventory": 70, "date": "2019-12-15", "Sales": 30},
    {"product_id":2, "inventory": 10, "date": "2019-12-16", "Sales": 30},
    {"product_id":3, "inventory": 5, "date": "2019-12-17", "Sales": 15},
]

So far, I have achieved this using pandas' shift:

import json

import numpy as np
import pandas as pd

df = pd.DataFrame(myCollectionData)

# Sort by product, then by date, so each row follows the same product's previous record
df.sort_values(by=['product_id', 'date'], ascending=[True, True], inplace=True)

# Shift on product_id (the first row has no predecessor, so it becomes 0)
df['previous_product_id'] = df['product_id'].shift(1).apply(lambda x: int(x) if x > 0 else 0)
# Shift on inventory
df['previous_inventory'] = df['inventory'].shift(1)

df['inventory_variation'] = df['sales'] = np.nan
# Use .loc instead of chained indexing so the assignment reaches df itself
same_product = df['previous_product_id'] == df['product_id']
df.loc[same_product, 'inventory_variation'] = df['inventory'] - df['previous_inventory']

## Column sales = 0 if inventory variation is zero or positive
## Column sales > 0 if inventory variation is negative
df.loc[df['inventory_variation'] >= 0, 'sales'] = 0
df.loc[df['inventory_variation'] < 0, 'sales'] = -df['inventory_variation']

## Pandas to list for MongoDB
records = list(json.loads(df.T.to_json()).values())
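The same result can probably be reached more compactly by shifting within each product group instead of comparing against a shifted `product_id`. The following is a minimal sketch, not the code used above: the helper name `add_sales` is hypothetical, and it assumes that the first record of each product and any restock should count as 0 sales, as in the desired output.

```python
import pandas as pd

myCollectionData = [
    {"product_id": 1, "inventory": 100, "date": "2019-12-01"},
    {"product_id": 2, "inventory": 40, "date": "2019-12-02"},
    {"product_id": 3, "inventory": 20, "date": "2019-12-05"},
    {"product_id": 1, "inventory": 70, "date": "2019-12-15"},
    {"product_id": 2, "inventory": 10, "date": "2019-12-16"},
    {"product_id": 3, "inventory": 5, "date": "2019-12-17"},
]


def add_sales(records):
    """Return the records ordered by date, each with a "Sales" key added."""
    df = pd.DataFrame(records).sort_values(["product_id", "date"])
    # Previous inventory of the same product (NaN for a product's first record)
    previous = df.groupby("product_id")["inventory"].shift(1)
    # Sales is the drop in inventory; increases and first records become 0
    df["Sales"] = (previous - df["inventory"]).clip(lower=0).fillna(0).astype(int)
    return df.sort_values("date").to_dict("records")


records = add_sales(myCollectionData)
```

Grouping before shifting removes the need for the `previous_product_id` helper column, because the shift never crosses from one product to the next.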

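This works well. However, I would like to know whether the same thing can be done directly in Mongo, to avoid exporting and re-importing the data (the dataset has more than 1M records and is growing).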

Is there a "shift" operation in Mongo that would let me achieve the same result?
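For reference, MongoDB 5.0 and later provide a `$shift` window operator inside the `$setWindowFields` aggregation stage, which looks like the closest counterpart to pandas' shift. The sketch below only builds the pipeline as Python data and has not been run against this dataset. It assumes a 5.0+ server, dates stored as ISO-formatted strings (so they sort chronologically), and that a product's first record should have 0 sales. The helper name `build_sales_pipeline` and the database and collection names in the usage comment are hypothetical.

```python
def build_sales_pipeline():
    """Build an aggregation pipeline that adds a "Sales" field to each document."""
    return [
        {
            "$setWindowFields": {
                # One window per product, ordered by date
                "partitionBy": "$product_id",
                "sortBy": {"date": 1},
                "output": {
                    # Inventory of the previous document in the window (null if none)
                    "previous_inventory": {
                        "$shift": {"output": "$inventory", "by": -1}
                    }
                },
            }
        },
        {
            "$set": {
                # Sales = previous inventory - current inventory, floored at 0;
                # a missing previous value falls back to the current inventory
                "Sales": {
                    "$max": [
                        0,
                        {
                            "$subtract": [
                                {"$ifNull": ["$previous_inventory", "$inventory"]},
                                "$inventory",
                            ]
                        },
                    ]
                }
            }
        },
        # Drop the helper field so only "Sales" is added
        {"$unset": "previous_inventory"},
    ]


pipeline = build_sales_pipeline()

# Hypothetical usage with PyMongo (names are assumptions, not from the question):
# from pymongo import MongoClient
# results = MongoClient()["mydb"]["inventory"].aggregate(pipeline)
```

Because the computation runs inside the aggregation, the documents would not need to be loaded into pandas first; whether this performs well on more than 1M records would have to be measured.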

0 answers:

No answers yet