Question

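I have a table of shoplifting incidents by store and product. I'm trying to use Python to calculate the average number of days between shoplifting incidents, by product. My table looks like this: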

Product   Store  Shoplifting date  Times shoplifted
1         A      8/28/2016         6
2         A      8/28/2016         6
3         A      8/28/2016         6
2         B      8/22/2016         3
1         B      8/22/2016         3
3         B      8/22/2016         3
1         C      8/18/2016         2
3         C      8/18/2016         2
4         C      8/18/2016         2
1         A      8/18/2016         5
3         A      8/18/2016         5
1         B      8/16/2016         2
1         A      8/14/2016         4
4         C      8/13/2016         1
3         A      8/12/2016         4
2         A      8/12/2016         4

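Product 1 was shoplifted from store A on 8/28, 8/18 and 8/14 (10 days and 4 days between incidents) and from store B on 8/22 and 8/16 (6 days), for an average of (10 + 4 + 6) / 3 = 6.67 days. So for product 1 the expected result would be:

Product    Days between shoplifting
1          6.67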

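The "Times shoplifted" column is the cumulative number of times the store has been shoplifted, and it increases with each shoplifting incident. For example, on 8/28/2016 products 1, 2 and 3 were shoplifted from store A, and this was the 6th time that store had been shoplifted.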

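I'm trying to calculate the average number of days between shoplifting incidents, by product. I've written a lot of for loops and it's getting really messy, so I'd like a cleaner way to do it. I'm not very familiar with pandas, but I believe it has some handy time-handling capabilities...? How would you solve this in pandas? Or is there a better way?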

Answer 1

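I would first sort the data frame by Shoplifting date (after converting that column to datetimes with pd.to_datetime). Then, within each Product group, diff gives you the time deltas between consecutive incidents and mean averages them.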

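import pandas as pd
# Parse the date strings first; diff() needs datetime64 values to return timedeltas
df['Shoplifting date'] = pd.to_datetime(df['Shoplifting date'], format='%m/%d/%Y')
df.sort_values('Shoplifting date').groupby(
    'Product'
)['Shoplifting date'].apply(lambda x: x.diff().mean()).dropna()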

Product
1      0 days
3      0 days
582   10 days
650    4 days
Name: Shoplifting date, dtype: timedelta64[ns]
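Grouping only by Product measures the gap between consecutive incidents even when they happened at different stores, whereas the question counts gaps within each store and then averages all of those gaps per product. The following is a minimal sketch of that per-store variant, assuming the data is exactly the table from the question (transcribed below, without the "Times shoplifted" column) and that "average" means pooling every within-store gap for a product. The helper column name days_since_prev is hypothetical. For product 1 it pools 10 and 4 days from store A with 6 days from store B (8/16 to 8/22), which gives 6.67.

```python
import pandas as pd

# Data transcribed from the question's table ("Times shoplifted" omitted; not needed here)
df = pd.DataFrame({
    "Product": [1, 2, 3, 2, 1, 3, 1, 3, 4, 1, 3, 1, 1, 4, 3, 2],
    "Store": list("AAABBBCCCAABACAA"),
    "Shoplifting date": ["8/28/2016"] * 3 + ["8/22/2016"] * 3 + ["8/18/2016"] * 5
    + ["8/16/2016", "8/14/2016", "8/13/2016", "8/12/2016", "8/12/2016"],
})

# Parse the dates so that diff() returns timedeltas
df["Shoplifting date"] = pd.to_datetime(df["Shoplifting date"], format="%m/%d/%Y")

# Sort chronologically so each diff() compares an incident with the previous one
df = df.sort_values("Shoplifting date")

# Days since the previous incident of the same product at the same store
# (NaN for the first incident of each product/store pair)
df["days_since_prev"] = (
    df.groupby(["Product", "Store"])["Shoplifting date"].diff().dt.days
)

# Pool all within-store gaps per product; mean() skips the NaN values
result = df.groupby("Product")["days_since_prev"].mean().round(2)
print(result)  # Product 1 -> 6.67, 2 -> 16.0, 3 -> 8.0, 4 -> 5.0
```

If you would rather average the per-store averages instead of pooling all gaps, group by both columns, take the mean, and then take the mean again per product; which definition is right depends on what you want the number to mean.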
