I have a df, for example:
date | prod_number | prod_count | prod_factor
2018-01-01 | 1 | 5 | 3
2018-02-01 | 1 | 20 | 3
2018-04-01 | 1 | 10 | 3
2019-09-01 | 2 | 8 | 5
2018-09-02 | 2 | 7 | 5
2018-10-03 | 2 | 10 | 5
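For each prod_number, I want the change from the previous date, multiplied by prod_factor.
The first entry of each prod_number has no earlier value to take a difference from, so it can be NONE or 0, whichever is easier.
Like this: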
date | prod_number | prod_count | prod_factor | change | prod_factor*change
2018-01-01 | 1 | 5 | 3 | NONE/0 | NONE/0
2018-02-01 | 1 | 20 | 3 | 15 # 20-5 | 45 # 3*15
2018-04-01 | 1 | 10 | 3 | -10 # 10-20 | -30 # 3*-10
2019-09-01 | 2 | 8 | 5 | NONE/0 | NONE/0
2018-09-02 | 2 | 7 | 5 | -1 # 7-8 | -5 # 5*-1
2018-10-03 | 2 | 10 | 5 | 3 # 10-7 | 15 # 5*3
How can I do this with pandas?
Answer 0 (score: 1)
Use groupby.diff, then multiply the two columns:
df['change'] = df.groupby('prod_number')['prod_count'].diff()
df['prod_factor*change'] = df['change'] * df['prod_factor']
date prod_number prod_count prod_factor change prod_factor*change
0 2018-01-01 1 5 3 NaN NaN
1 2018-02-01 1 20 3 15.0 45.0
2 2018-04-01 1 10 3 -10.0 -30.0
3 2019-09-01 2 8 5 NaN NaN
4 2018-09-02 2 7 5 -1.0 -5.0
5 2018-10-03 2 10 5 3.0 15.0
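Below is a self-contained sketch of the groupby.diff approach, with the sample data rebuilt from the question. Because the question accepts 0 in place of NONE, it adds fillna(0); that call is an assumption on my part and not part of the answer above. diff works in the current row order, not by date, so if rows within a product might not already be in chronological order, sort them first.

```python
import pandas as pd

# Sample data copied from the question (dates kept as plain strings).
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-02-01", "2018-04-01",
             "2019-09-01", "2018-09-02", "2018-10-03"],
    "prod_number": [1, 1, 1, 2, 2, 2],
    "prod_count": [5, 20, 10, 8, 7, 10],
    "prod_factor": [3, 3, 3, 5, 5, 5],
})

# Difference from the previous row of the same prod_number (row order, not date order).
# The first row of each group has no predecessor, so fillna(0) replaces its NaN with 0.
df["change"] = df.groupby("prod_number")["prod_count"].diff().fillna(0)

# Scale the change by the product's factor.
df["prod_factor*change"] = df["change"] * df["prod_factor"]
print(df)
```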
Answer 1 (score: 0)
You can use np.where together with diff():
import pandas as pd
import numpy as np
df = pd.DataFrame([['2018-01-01', 1, 5, 3], ['2018-02-01', 1, 20, 3], ['2018-04-01', 1, 10, 3],
                   ['2019-09-01', 2, 8, 5], ['2018-09-02', 2, 7, 5], ['2018-10-03', 2, 10, 5]],
                  columns=['date', 'prod_number', 'prod_count', 'prod_factor'])
df['change'] = np.where(
    df['prod_number'].diff() == 0,  # same prod_number as the previous row?
    df['prod_count'].diff(),        # if so, take the difference in prod_count
    0                               # otherwise (first row of a product), use 0
)
date prod_number prod_count prod_factor change
0 2018-01-01 1 5 3 0.0
1 2018-02-01 1 20 3 15.0
2 2018-04-01 1 10 3 -10.0
3 2019-09-01 2 8 5 0.0
4 2018-09-02 2 7 5 -1.0
5 2018-10-03 2 10 5 3.0
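One caveat with the np.where version: it compares each row only with the row directly above it, so it assumes that all rows for a given prod_number are contiguous. The sketch below uses small hypothetical interleaved data (not from the question) to show where it diverges from groupby.diff. It also adds the multiplication by prod_factor that this answer leaves out.

```python
import numpy as np
import pandas as pd

# Hypothetical interleaved data: rows for products 1 and 2 alternate.
df = pd.DataFrame({
    "prod_number": [1, 2, 1, 2],
    "prod_count": [5, 8, 20, 7],
    "prod_factor": [3, 5, 3, 5],
})

# np.where approach: only looks at the row immediately above, so here
# prod_number never matches its neighbour and every change becomes 0.
df["change_where"] = np.where(df["prod_number"].diff() == 0,
                              df["prod_count"].diff(), 0)

# groupby approach: compares with the previous row of the same product.
df["change_groupby"] = df.groupby("prod_number")["prod_count"].diff().fillna(0)

# Multiply by prod_factor, as the question asks.
df["prod_factor*change"] = df["change_groupby"] * df["prod_factor"]
print(df)
```

If you prefer np.where, sorting first with df.sort_values('prod_number', kind='stable') makes each product's rows contiguous while keeping their original relative order.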