Pandas: get the average interval

Time: 2021-06-14 07:58:47

Tags: python pandas data-science

There are more than 3 products (A, B, C, ...); this is just an example.

import pandas as pd

df = pd.DataFrame(data=[['A', '2021-01-06 10:25:21'],
                        ['A', '2020-11-04 08:48:23'],
                        ['B', '2020-10-26 17:04:34'],
                        ['C', '2020-08-05 17:06:09'],
                        ['A', '2021-04-15 13:20:50'],
                        ['B', '2021-04-05 18:20:29'],
                        ['A', '2021-04-15 13:20:50'],
                        ['C', '2021-01-05 14:05:09']],
                  columns=['product', 'purchased_at'])
df

| product | purchased_at        |
| ------- | ------------------- |
| A       | 2021-01-06 10:25:21 |
| A       | 2020-11-04 08:48:23 |
| B       | 2020-10-26 17:04:34 |
| C       | 2020-08-05 17:06:09 |
| A       | 2021-04-15 13:20:50 |
| C       | 2021-01-05 14:05:09 |
| ...     | ...                 |

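I want to create a new column containing the average interval between purchases for each product.

1 answer:

Answer 0 (score: 0)

See the inline comments.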

import pandas as pd

df = pd.DataFrame(data=[
                        ['A', '2021-01-06 10:25:21'],
                        ['A', '2020-11-04 08:48:23'],
                        ['B', '2020-10-26 17:04:34'],
                        ['C', '2020-08-05 17:06:09'],
                        ['A', '2021-04-15 13:20:50'],
                        ['B', '2021-04-05 18:20:29'],
                        ['A', '2021-04-15 13:20:50'],
                        ['C', '2021-01-05 14:05:09']],
              columns=['product', 'purchased_at'])

# Convert the column to datetime
df['purchased_at'] = pd.to_datetime(df['purchased_at'])
# Sort by purchase time so consecutive rows are in chronological order
df = df.sort_values(by=['purchased_at'])
# Group by product, take the difference between consecutive `purchased_at`
# values within each group, then average those differences (NaT is skipped)
df.groupby('product').apply(lambda x: x['purchased_at'].diff().mean())

Output:

product
A    54 days 01:30:49
B   161 days 01:15:55
C   152 days 20:59:00
dtype: timedelta64[ns]
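
The result above is one value per product, while the question asks for a new column. A minimal sketch of one way to attach it to every row, assuming the same sample data: compute the per-product average, then look it up for each row with `Series.map`. The column name `avg_interval` and the variable `avg_by_product` are illustrative choices, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(data=[
                        ['A', '2021-01-06 10:25:21'],
                        ['A', '2020-11-04 08:48:23'],
                        ['B', '2020-10-26 17:04:34'],
                        ['C', '2020-08-05 17:06:09'],
                        ['A', '2021-04-15 13:20:50'],
                        ['B', '2021-04-05 18:20:29'],
                        ['A', '2021-04-15 13:20:50'],
                        ['C', '2021-01-05 14:05:09']],
              columns=['product', 'purchased_at'])

# Convert to datetime and sort so that diffs are chronological
df['purchased_at'] = pd.to_datetime(df['purchased_at'])
df = df.sort_values(by='purchased_at')

# Average gap between consecutive purchases, one value per product
avg_by_product = (
    df.groupby('product')['purchased_at']
      .apply(lambda s: s.diff().mean())
)

# Look up each row's product in that result
# ('avg_interval' is a hypothetical column name)
df['avg_interval'] = df['product'].map(avg_by_product)

print(df)
```

Note that the duplicated purchase for product A contributes a zero-length gap to its average; whether such duplicates should be dropped first depends on the data.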