Calculate the difference between consecutive dates in a column, with a groupby on another column, in pandas?

Time: 2020-01-21 05:12:59

Tags: python pandas

I have a pandas DataFrame:

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z'],
                     ['Train','2019-01-06T19:44:09Z'],
                     ['Train','2019-01-02T19:44:09Z'],
                     ['Car','2019-01-08T06:44:09Z'],
                     ['Car','2019-01-06T18:44:09Z'],
                     ['Train','2019-01-04T19:44:09Z'],
                     ['Car','2019-01-05T16:34:09Z'],
                     ['Train','2019-01-08T19:44:09Z'],
                     ['Car','2019-01-07T14:44:09Z'],
                     ['Car','2019-01-06T11:44:09Z'],
                     ['Train','2019-01-10T19:44:09Z'],
                     ], 
                    columns=['Type', 'Date'])

After sorting by date, I need to find the difference between consecutive dates for each Type.

The final data should look like this:

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z',1],
                     ['Train','2019-01-06T19:44:09Z',4],
                     ['Train','2019-01-02T19:44:09Z',0],
                     ['Car','2019-01-08T06:44:09Z',3],
                     ['Car','2019-01-06T18:44:09Z',1],
                     ['Train','2019-01-04T19:44:09Z',2],
                     ['Car','2019-01-05T16:34:09Z',0],
                     ['Train','2019-01-08T19:44:09Z',6],
                     ['Car','2019-01-07T14:44:09Z',2],
                     ['Car','2019-01-06T11:44:09Z',1],
                     ['Train','2019-01-10T19:44:09Z',8],
                     ], 
                    columns=['Type', 'Date','diff'])

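Here, min(Date) for Type Car is 2019-01-05T16:34:09Z, so the diff starts at 0. The next dates are 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, so the diff is 1 day (I am not sure whether the time can be included here), and so on. For Type Train, min(Date) is 2019-01-02T19:44:09Z, so the diff is 0; the next date is 2019-01-04T19:44:09Z, so the diff is 2 days.

I tried groupby, but I am not sure how to include the sorting by date: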

data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
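For reference, a minimal sketch of the sorted, per-group diff that the attempt above is reaching for. It assumes the Date strings are parsed with pd.to_datetime first, since strings cannot be subtracted; the shortened sample frame and the names tmp, parsed, and gap_days are made up for illustration. Note that this yields the gap to the previous row of the same Type, which is not the same as the days since the group's earliest date shown in the expected output.

```python
import pandas as pd

# Shortened sample in the same shape as the question's frame (illustrative).
data = pd.DataFrame(
    [["Car", "2019-01-06T21:44:09Z"],
     ["Train", "2019-01-04T19:44:09Z"],
     ["Car", "2019-01-05T16:34:09Z"],
     ["Train", "2019-01-02T19:44:09Z"],
     ["Car", "2019-01-08T06:44:09Z"]],
    columns=["Type", "Date"],
)

# Parse the ISO strings into a helper column and sort by it.
tmp = data.assign(parsed=pd.to_datetime(data["Date"], utc=True)).sort_values("parsed")

# Gap to the previous (earlier) row of the same Type; the first row of
# each Type has no predecessor and therefore gets NaT.
gap = tmp.groupby("Type")["parsed"].diff()

# Assignment aligns on the index, so every value lands on its original row.
# Dividing by one day turns the timedelta into fractional days (NaT -> NaN).
data["gap_days"] = gap / pd.Timedelta(days=1)
print(data)
```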

3 Answers:

Answer 0: (score: 2)

Use pandas.DataFrame.groupby with dt.date (this assumes Date has already been converted with pd.to_datetime, as the output below shows):

df['diff'] = df.groupby('Type')['Date'].apply(lambda x: x.dt.date - x.min().date())

Output:

     Type                       Date    diff
0     Car  2019-01-06 21:44:09+00:00  1 days
1   Train  2019-01-06 19:44:09+00:00  4 days
2   Train  2019-01-02 19:44:09+00:00  0 days
3     Car  2019-01-08 06:44:09+00:00  3 days
4     Car  2019-01-06 18:44:09+00:00  1 days
5   Train  2019-01-04 19:44:09+00:00  2 days
6     Car  2019-01-05 16:34:09+00:00  0 days
7   Train  2019-01-08 19:44:09+00:00  6 days
8     Car  2019-01-07 14:44:09+00:00  2 days
9     Car  2019-01-06 11:44:09+00:00  1 days
10  Train  2019-01-10 19:44:09+00:00  8 days

If you want them to be int, add dt.days:

df['diff'] = df.groupby('Type')['Date'].apply(lambda x: x.dt.date - x.min().date()).dt.days
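A self-contained variant of the same idea, offered as a sketch rather than the answerer's exact code. Assumptions: the strings are parsed with pd.to_datetime(..., utc=True); dt.normalize() (truncate to midnight) is swapped in for dt.date so the values keep a datetime dtype; and group_keys=False is passed because, as far as I know, pandas 2.0 and later otherwise prepend the group key to the index returned by apply, which gets in the way of assigning the result back as a column. The sample frame is a shortened copy of the question's data.

```python
import pandas as pd

# Shortened copy of the question's data (illustrative).
df = pd.DataFrame(
    [["Car", "2019-01-06T21:44:09Z"],
     ["Train", "2019-01-04T19:44:09Z"],
     ["Car", "2019-01-05T16:34:09Z"],
     ["Train", "2019-01-02T19:44:09Z"],
     ["Car", "2019-01-08T06:44:09Z"]],
    columns=["Type", "Date"],
)

# The .dt accessor needs a datetime column, so parse the strings first.
df["Date"] = pd.to_datetime(df["Date"], utc=True)

# Per Type: truncate each timestamp to midnight and subtract the
# (truncated) earliest timestamp of the group, then keep whole days.
# group_keys=False keeps the original row labels on the result.
df["diff"] = (
    df.groupby("Type", group_keys=False)["Date"]
      .apply(lambda x: x.dt.normalize() - x.min().normalize())
      .dt.days
)
print(df)
```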

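Answer 1: (score: 1)

  • First convert Date to a plain date in a separate column
  • Use a lambda function to subtract the group's minimum date, and use dt.days to get the number of days
  • Then drop the extra date column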
data['Date_date'] = pd.to_datetime(data['Date']).dt.date
data['diff'] = data.groupby(['Type'])['Date_date'].apply(lambda x:(x-x.min()).dt.days)
data.drop(['Date_date'],axis=1,inplace=True,errors='ignore')
print(data)
     Type                  Date  diff
0     Car  2019-01-06T21:44:09Z     1
1   Train  2019-01-06T19:44:09Z     4
2   Train  2019-01-02T19:44:09Z     0
3     Car  2019-01-08T06:44:09Z     3
4     Car  2019-01-06T18:44:09Z     1
5   Train  2019-01-04T19:44:09Z     2
6     Car  2019-01-05T16:34:09Z     0
7   Train  2019-01-08T19:44:09Z     6
8     Car  2019-01-07T14:44:09Z     2
9     Car  2019-01-06T11:44:09Z     1
10  Train  2019-01-10T19:44:09Z     8

Answer 2: (score: 1)

Subtract the result of transform directly:

s = pd.to_datetime(data['Date']).dt.date
data['diff'] = (s - s.groupby(data.Type).transform('min')).dt.days

Out[36]:
     Type                  Date  diff
0     Car  2019-01-06T21:44:09Z     1
1   Train  2019-01-06T19:44:09Z     4
2   Train  2019-01-02T19:44:09Z     0
3     Car  2019-01-08T06:44:09Z     3
4     Car  2019-01-06T18:44:09Z     1
5   Train  2019-01-04T19:44:09Z     2
6     Car  2019-01-05T16:34:09Z     0
7   Train  2019-01-08T19:44:09Z     6
8     Car  2019-01-07T14:44:09Z     2
9     Car  2019-01-06T11:44:09Z     1
10  Train  2019-01-10T19:44:09Z     8
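The question also asks whether the time of day can be included. Below is a sketch built on the same transform('min') idea that computes both the whole calendar days and the exact fractional days since each Type's earliest timestamp. The column names diff_days and diff_exact and the shortened sample frame are illustrative, and dropping the UTC marker is a simplifying assumption that is safe here only because every input is in UTC.

```python
import pandas as pd

# Shortened copy of the question's data (illustrative).
data = pd.DataFrame(
    [["Car", "2019-01-06T21:44:09Z"],
     ["Train", "2019-01-04T19:44:09Z"],
     ["Car", "2019-01-05T16:34:09Z"],
     ["Train", "2019-01-02T19:44:09Z"],
     ["Car", "2019-01-08T06:44:09Z"]],
    columns=["Type", "Date"],
)

# Parse as UTC, then drop the timezone so the values are plain datetime64.
ts = pd.to_datetime(data["Date"], utc=True).dt.tz_localize(None)

# Earliest timestamp of each Type, broadcast back to every row.
first = ts.groupby(data["Type"]).transform("min")

# Calendar-day difference: compare midnights, so the time of day is ignored.
data["diff_days"] = (ts.dt.normalize() - first.dt.normalize()).dt.days

# Exact elapsed time since the group's first timestamp, in fractional days.
data["diff_exact"] = (ts - first) / pd.Timedelta(days=1)
print(data)
```

Which column fits depends on whether "1 day" should mean a change of calendar date or a full 24 hours elapsed.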