**Edit at the bottom**
I have a dataframe containing inventory data, as follows:
import pandas as pd

d = {'product': ['a', 'b', 'a', 'b', 'c'],
     'amount': [1, 2, 3, 5, 2],
     'date': ['2020-6-6', '2020-6-6', '2020-6-7', '2020-6-7', '2020-6-7']}
df = pd.DataFrame(data=d)
df
  product  amount      date
0       a       1  2020-6-6
1       b       2  2020-6-6
2       a       3  2020-6-7
3       b       5  2020-6-7
4       c       2  2020-6-7
I want to know the inventory difference for each month. The output would look like this:
df
  product  diff  isnew      date
0       a   nan    nan  2020-6-6
1       b   nan    nan  2020-6-6
2       a     2  False  2020-6-7
3       b     3  False  2020-6-7
4       c     2   True  2020-6-7
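Edit: Sorry if my first example was unclear. I actually have many months of data, so I am not only looking at the difference between one period and another. In the general case, it needs to look at the difference between month n and month n-1, then between n-1 and n-2, and so on.
What is the best way to do this in pandas?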
Answer 0 (score: 2)
You can try groupby on the 'product' column and diff on the 'amount' column to build the 'diff' column. Then use duplicated for the 'isnew' column.
df['diff'] = df.groupby('product')['amount'].diff()
df['isnew'] = ~df['product'].duplicated()
print(df)
  product  amount      date  diff  isnew
0       a       1  2020-6-6   NaN   True
1       b       2  2020-6-6   NaN   True
2       a       3  2020-6-7   2.0  False
3       b       5  2020-6-7   3.0  False
4       c       2  2020-6-7   NaN   True
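Regarding the edit in the question (comparing period n with n-1, then n-1 with n-2), here is a minimal sketch of the same groupby/diff idea on three periods. The frame `df3` and its values are made up for illustration and are not from the question. It assumes the rows are sorted by date before diffing.

```python
import pandas as pd

# Hypothetical three-period data (illustrative values only).
df3 = pd.DataFrame({
    'product': ['a', 'b', 'a', 'b', 'c', 'a', 'c'],
    'amount': [1, 2, 3, 5, 2, 4, 6],
    'date': pd.to_datetime(['2020-06-06', '2020-06-06',
                            '2020-06-07', '2020-06-07', '2020-06-07',
                            '2020-06-08', '2020-06-08']),
})

# Put each product's rows in chronological order before diffing.
df3 = df3.sort_values(['date', 'product']).reset_index(drop=True)

# Each row is compared with the previous row of the same product.
df3['diff'] = df3.groupby('product')['amount'].diff()

# A product is flagged as new on its first appearance only.
df3['isnew'] = ~df3['product'].duplicated()

print(df3)
```

Note that diff compares against the previous row of the same product, not the previous calendar period. In this sample, product 'b' has no row on the last date, so nothing is reported for it there. If gaps like that matter, the data would probably need to be reindexed to a full product-by-period grid first.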
Answer 1 (score: 2)
I think the key here is to find is_new:
# rows after the first date (the first date has nothing earlier to compare with)
new_prods = df['date'] != df['date'].min()
# rows whose product has already appeared in an earlier row
duplicated = df.duplicated('product')
# rows that get an is_new flag: anything after the first date,
# or a repeated product
valids = new_prods | duplicated
df.loc[valids, 'is_new'] = ~duplicated
# then the difference:
df['diff'] = (df.groupby('product')['amount'].diff()  # normal differences
                .fillna(df['amount'])                 # fill the first value for each product
                .where(df['is_new'].notna())          # blank out the first date
             )
Output:
  product  amount      date is_new  diff
0       a       1  2020-6-6    NaN   NaN
1       b       2  2020-6-6    NaN   NaN
2       a       3  2020-6-7  False   2.0
3       b       5  2020-6-7  False   3.0
4       c       2  2020-6-7   True   2.0
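For reuse across many periods, the steps above could be wrapped in a small function. The following is only a sketch under a few assumptions: the helper name `inventory_diff` is hypothetical, the date column is parsed with `pd.to_datetime`, and rows on the earliest date are treated as having no comparison. It uses `mask` on a first-period flag rather than the `valids` selection, so it is a variant of the approach and not the answerer's exact code.

```python
import pandas as pd


def inventory_diff(df):
    """Hypothetical helper: per-product change versus that product's previous row."""
    # Stable sort keeps the original order of rows that share a date.
    out = df.sort_values('date', kind='mergesort').reset_index(drop=True)

    # Rows on the earliest date have nothing to be compared with.
    first_period = out['date'] == out['date'].min()

    # True on the first row in which each product appears.
    first_seen = ~out.duplicated('product')

    # is_new is left as NaN in the first period, otherwise True/False.
    out['is_new'] = first_seen.astype(object).mask(first_period)

    # Normal differences; a product's first row counts its whole amount as the change.
    out['diff'] = (out.groupby('product')['amount'].diff()
                      .fillna(out['amount'])
                      .mask(first_period))
    return out


# Same data as in the question, with dates parsed as datetimes.
sample = pd.DataFrame({
    'product': ['a', 'b', 'a', 'b', 'c'],
    'amount': [1, 2, 3, 5, 2],
    'date': pd.to_datetime(['2020-06-06', '2020-06-06',
                            '2020-06-07', '2020-06-07', '2020-06-07']),
})
result = inventory_diff(sample)
print(result)
```

Parsing the dates is a deliberate choice here: taking the minimum of date strings compares them character by character, which only matches chronological order when the strings are zero-padded consistently.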