Pandas column difference over time

Time: 2020-06-08 03:52:44

Tags: python pandas dataframe datetime

**Edit at the bottom**

I have a dataframe with inventory data, like this:

import pandas as pd

d = {'product': ['a', 'b', 'a', 'b', 'c'], 'amount': [1, 2, 3, 5, 2],
     'date': ['2020-6-6', '2020-6-6', '2020-6-7', '2020-6-7', '2020-6-7']}
df = pd.DataFrame(data=d)
df
 product  amount  date
0     a     1      2020-6-6
1     b     2      2020-6-6
2     a     3      2020-6-7
3     b     5      2020-6-7
4     c     2      2020-6-7

I want to know the difference in inventory from one month to the next. The output would look like this:

df
 product   diff   isnew  date
0     a     nan   nan   2020-6-6
1     b     nan   nan   2020-6-6
2     a     2     False 2020-6-7
3     b     3     False 2020-6-7
4     c     2     True  2020-6-7

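Sorry if my first example was unclear. I actually have many months of data, so I am not just looking at the difference between one period and another. In the general case, it needs to compute the difference between month n and month n-1, then between n-1 and n-2, and so on.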

What is the best way to do this in pandas?

2 Answers:

Answer 0: (score: 2)

You can try groupby on the column 'product' and diff on the column 'amount' to build 'diff'. Then use duplicated for the 'isnew' column.

df['diff'] = df.groupby('product')['amount'].diff()
df['isnew'] = ~df['product'].duplicated()
print(df)
  product  amount      date  diff  isnew
0       a       1  2020-6-6   NaN   True
1       b       2  2020-6-6   NaN   True
2       a       3  2020-6-7   2.0  False
3       b       5  2020-6-7   3.0  False
4       c       2  2020-6-7   NaN   True
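
Note that this output differs from the requested one for the first-period rows (isnew is True rather than NaN) and for product c (diff is NaN rather than 2). Also, groupby(...).diff() compares each row with the previous row of the same product in the frame's current order, so with many months of data the rows should be in chronological order first. A minimal self-contained sketch of the same approach, assuming the dates are strings that pd.to_datetime can parse and that each product appears at most once per date:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['a', 'b', 'a', 'b', 'c'],
    'amount': [1, 2, 3, 5, 2],
    'date': ['2020-6-6', '2020-6-6', '2020-6-7', '2020-6-7', '2020-6-7'],
})

# Parse the date strings so that sorting is chronological, not lexical.
df['date'] = pd.to_datetime(df['date'])

# A stable sort keeps the original row order within each date.
df = df.sort_values('date', kind='stable').reset_index(drop=True)

# Change in amount relative to the product's previous appearance.
df['diff'] = df.groupby('product')['amount'].diff()

# True on the first row in which each product appears.
df['isnew'] = ~df['product'].duplicated()

print(df)
```

On the sample data the sort changes nothing, because the rows are already ordered by date.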

Answer 1: (score: 2)

I think the key here is to find isnew:

# rows dated after the first period
new_prods = df['date'] != df.date.min()
# rows whose product already appeared in an earlier row
duplicated = df.duplicated('product')

# rows that get an is_new value: everything after the first period,
# plus repeated rows of *old* products
valids = new_prods | duplicated
df.loc[valids,'is_new'] = ~ duplicated

# then the difference:
df['diff'] = (df.groupby('product')['amount'].diff()           # normal differences
                  .fillna(df['amount'])         # fill the first value for all product
                  .where(df['is_new'].notna())  # remove the first month
             )

Output:

  product  amount      date is_new  diff
0       a       1  2020-6-6    NaN   NaN
1       b       2  2020-6-6    NaN   NaN
2       a       3  2020-6-7  False   2.0
3       b       5  2020-6-7  False   3.0
4       c       2  2020-6-7   True   2.0
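
Since the question mentions many months of data, here is a hedged sketch of how the same idea might extend to more than two periods. The three-month data below is hypothetical and made up for illustration, and the sketch assumes each product appears at most once per date.

```python
import pandas as pd

# Hypothetical three-month inventory (not from the question).
df = pd.DataFrame({
    'product': ['a', 'b', 'a', 'b', 'c', 'a', 'c', 'd'],
    'amount':  [1, 2, 3, 5, 2, 4, 1, 7],
    'date':    ['2020-04-30', '2020-04-30',
                '2020-05-31', '2020-05-31', '2020-05-31',
                '2020-06-30', '2020-06-30', '2020-06-30'],
})
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date', kind='stable').reset_index(drop=True)

# Rows belonging to the earliest period have nothing to compare against.
first_period = df['date'] == df['date'].min()

# True on the first row in which each product appears.
first_seen = ~df.duplicated('product')

# After the first period, a first appearance means the product is new;
# first-period rows are left as NaN.
df['is_new'] = first_seen.where(~first_period)

df['diff'] = (
    df.groupby('product')['amount'].diff()  # change since previous appearance
      .fillna(df['amount'])                 # a new product's diff is its amount
      .where(~first_period)                 # blank out the first period
)

print(df)
```

One caveat with this approach: diff compares against the product's previous appearance, which is not necessarily the immediately preceding month if a product is missing from some months, and a product that drops out (like b in the last period here) produces no row at all.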