There are more than three products (A, B, C, ...); the data below is just an example.
import pandas as pd

df = pd.DataFrame(data=[['A', '2021-01-06 10:25:21'],
                        ['A', '2020-11-04 08:48:23'],
                        ['B', '2020-10-26 17:04:34'],
                        ['C', '2020-08-05 17:06:09'],
                        ['A', '2021-04-15 13:20:50'],
                        ['B', '2021-04-05 18:20:29'],
                        ['A', '2021-04-15 13:20:50'],
                        ['C', '2021-01-05 14:05:09']],
                  columns=['product', 'purchased_at'])
df
| product | purchased_at |
| -------- | --------------------|
| A |2021-01-06 10:25:21 |
| A |2020-11-04 08:48:23 |
| B |2020-10-26 17:04:34 |
| C |2020-08-05 17:06:09 |
| A |2021-04-15 13:20:50 |
| C |2021-01-05 14:05:09 |
| ... |... |
I want to create a new column that contains the average interval between purchases for each product.
Answer 0 (score: 0)
The explanation is in the inline comments.
import pandas as pd

df = pd.DataFrame(data=[
    ['A', '2021-01-06 10:25:21'],
    ['A', '2020-11-04 08:48:23'],
    ['B', '2020-10-26 17:04:34'],
    ['C', '2020-08-05 17:06:09'],
    ['A', '2021-04-15 13:20:50'],
    ['B', '2021-04-05 18:20:29'],
    ['A', '2021-04-15 13:20:50'],
    ['C', '2021-01-05 14:05:09']],
    columns=['product', 'purchased_at'])
# Convert the column to datetime
df['purchased_at'] = pd.to_datetime(df['purchased_at'])
# Sort by purchased_at so that consecutive rows are consecutive purchases
df = df.sort_values(by=['purchased_at'])
# Group by product; within each group take the difference between consecutive
# `purchased_at` values, then the mean of those differences.
# Series.mean() skips the NaT that diff() produces for the first row of a group.
df.groupby('product').apply(lambda x: x['purchased_at'].diff().mean())
Output:
product
A 54 days 01:30:49
B 161 days 01:15:55
C 152 days 20:59:00
dtype: timedelta64[ns]
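The question asks for a new column, whereas the result above is a single value per product. One possible way to attach that value to every row is to map the per-product averages back onto the `product` column. This is only a minimal sketch: the column name `avg_interval` and the variable `avg_by_product` are hypothetical names chosen for illustration, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ["A", "2021-01-06 10:25:21"],
        ["A", "2020-11-04 08:48:23"],
        ["B", "2020-10-26 17:04:34"],
        ["C", "2020-08-05 17:06:09"],
        ["A", "2021-04-15 13:20:50"],
        ["B", "2021-04-05 18:20:29"],
        ["A", "2021-04-15 13:20:50"],
        ["C", "2021-01-05 14:05:09"],
    ],
    columns=["product", "purchased_at"],
)

# Parse the timestamps and order the rows chronologically
df["purchased_at"] = pd.to_datetime(df["purchased_at"])
df = df.sort_values("purchased_at")

# One average gap per product: diff() gives the gap to the previous purchase
# of the same product, mean() ignores the NaT of each product's first purchase
avg_by_product = (
    df.groupby("product")["purchased_at"]
    .apply(lambda s: s.diff().mean())
)

# Hypothetical column name: broadcast each product's average to all of its rows
df["avg_interval"] = df["product"].map(avg_by_product)

print(df)
```

A product with only one purchase has no interval to average, so its `avg_interval` would be `NaT` under this approach.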