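I have a df sorted by AccountID and PurchaseDate. What I want to do is compute the difference between consecutive PurchaseDate values within each AccountID group and store it in a new column.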
AccountID PurchaseDate Price
| 113 2018-09-01 22:56:30 13|
| 113 2018-09-02 22:56:30 19|
| 114 2018-09-01 22:56:30 20|
| 114 2018-09-03 22:56:30 25|
to
AccountID PurchaseDate Price DateDiff
| 113 2018-09-01 22:56:30 13 null|
| 113 2018-09-02 22:56:30 19 1 |
| 114 2018-09-01 22:56:30 20 null|
| 114 2018-09-03 22:56:30 25 2 |
Answer 0: (score: 2)
You can do it like this:
# PurchaseDate must already be datetime64 (e.g. converted with pd.to_datetime)
df['DateDiff'] = df.groupby('AccountID')['PurchaseDate'].diff().dt.days
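For reference, here is a minimal self-contained sketch of the same approach, using the sample rows from the question. It assumes PurchaseDate starts out as strings and therefore converts it first. The `.dt.days` accessor is used here as the vectorized equivalent of calling `.days` on each element.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'AccountID': [113, 113, 114, 114],
    'PurchaseDate': ['2018-09-01 22:56:30', '2018-09-02 22:56:30',
                     '2018-09-01 22:56:30', '2018-09-03 22:56:30'],
    'Price': [13, 19, 20, 25],
})

# diff() on strings would fail, so convert to datetime64 first
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])

# Per-group difference to the previous row, expressed in whole days.
# The first row of each group has no predecessor and stays missing (NaN).
df['DateDiff'] = df.groupby('AccountID')['PurchaseDate'].diff().dt.days

print(df)
```

Because the first row of each group is missing, the resulting column is floating point rather than integer.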
Answer 1: (score: 1)
Here is a complete example of how to do it:
import pandas as pd
df = pd.DataFrame({'AccountID': [113, 113, 114, 114],
                   'PurchaseDate': ['2018-09-01 22:56:30',
                                    '2018-09-02 22:56:30',
                                    '2018-09-01 22:56:30',
                                    '2018-09-03 22:56:30'],
                   'Price': [13, 19, 20, 25]})
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])
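# The first row of each group has no previous purchase; fill it with a zero Timedelta
df['DateDiff'] = df.groupby('AccountID')['PurchaseDate'].diff().fillna(pd.Timedelta(0))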
#    AccountID        PurchaseDate  Price DateDiff
# 0        113 2018-09-01 22:56:30     13   0 days
# 1        113 2018-09-02 22:56:30     19   1 days
# 2        114 2018-09-01 22:56:30     20   0 days
# 3        114 2018-09-03 22:56:30     25   2 days
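Note that the example above produces Timedelta values and fills the first row of each group with zero, whereas the question asks for whole-day numbers with null in that position. One possible way to get closer to that output is sketched below. It assumes a pandas version that supports the nullable `Int64` dtype, and the choice of `Int64` is a suggestion rather than something the original answers use.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'AccountID': [113, 113, 114, 114],
    'PurchaseDate': pd.to_datetime(['2018-09-01 22:56:30',
                                    '2018-09-02 22:56:30',
                                    '2018-09-01 22:56:30',
                                    '2018-09-03 22:56:30']),
    'Price': [13, 19, 20, 25],
})

# Per-group Timedelta to the previous purchase (NaT for the first row)
delta = df.groupby('AccountID')['PurchaseDate'].diff()

# Whole days as a nullable integer column: missing rows stay <NA>
# instead of being filled with 0 or forcing a float dtype
df['DateDiff'] = delta.dt.days.astype('Int64')

print(df)
```

Keeping the first row missing also avoids confusing "no previous purchase" with "two purchases less than a day apart", both of which would otherwise show up as 0.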
Open to comments.