Question

I have a DataFrame

ID   date
111   11-11-2016
111   14-11-2016
111   17-11-2016
222   24-11-2016
222   27-11-2016

I need to calculate the difference between the dates for each ID. I use

df['duration'] = df.groupby(['ID','date']).date.apply(lambda x: x - x.iloc[0])
idx = df.groupby(['ID'])['duration'].transform(max) == df['count date']

But it returns the wrong result. How can I get the desired output? I need to get

ID   count date
111    6
222    3

Answer 1


You can also use numpy.ptp

>>> df.groupby('ID')['date'].apply(lambda x: x.max() - x.min())
ID
111   6 days
222   3 days

Answer 2

Use agg with the built-in min and max. See this post

d1 = df.groupby('ID').date.agg(['min', 'max']).diff(axis=1)['max']

ID
111   6 days
222   3 days
dtype: timedelta64[ns]
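
As a complement, here is one likely reason the attempt in the question misbehaves: grouping by both ID and date puts every row of the sample data in its own group, so subtracting the group's first date always yields zero. The sketch below groups by ID only and keeps a per-row duration, as the question tried to do. It assumes the dates are day-first strings that need parsing, and `count_date` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [111, 111, 111, 222, 222],
    "date": ["11-11-2016", "14-11-2016", "17-11-2016",
             "24-11-2016", "27-11-2016"],
})

# Assumption: dates are day-month-year strings.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Group by ID only, then subtract each group's earliest date from every row.
df["duration"] = df["date"] - df.groupby("ID")["date"].transform("min")

# The largest per-row duration within each ID is that ID's total span.
count_date = df.groupby("ID")["duration"].max().dt.days
print(count_date)
```

Here `transform("min")` returns a value aligned with the original rows, so the subtraction needs no index juggling.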

Pandas: calculate the difference between dates

2 answers: