Time interval partitioned by two fields in pandas

Asked: 2016-02-04 17:25:33

Tags: python python-2.7 pandas

I have the following DataFrame:

   id         datetime  interval
0   1  20160101 070000       NaN
1   1  20160101 080000        60
2   1  20160102 070000       NaN
3   1  20160102 073000        30
4   2  20160101 071500       NaN
5   2  20160101 071600         1

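I want to generate the interval column: the number of minutes between consecutive rows, but only for rows with the same id on the same day, as in the example. In SQL I would partition by id and the date part of datetime, then use LAG to get the time difference from the previous row. How can I do this in pandas?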

1 Answer:

Answer 0: (score: 0)

You can convert the datetime column with to_datetime, then use groupby together with diff, and convert the resulting timedelta to minutes:

import pandas as pd

print(df)
   id         datetime  interval
0   1  20160101 070000       NaN
1   1  20160101 080000        60
2   1  20160102 070000       NaN
3   1  20160102 073000        30
4   2  20160101 071500       NaN
5   2  20160101 071600         1

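# Parse the 'YYYYMMDD HHMMSS' strings into real timestamps
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y%m%d %H%M%S')
# Group by id and full calendar date (dt.day alone would merge e.g. Jan 1 and Feb 1),
# then take the difference from the previous row in each group, in minutes
df['new'] = df.groupby(['id', df['datetime'].dt.date])['datetime'].diff().dt.total_seconds() / 60
print(df)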
   id            datetime  interval  new
0   1 2016-01-01 07:00:00       NaN  NaN
1   1 2016-01-01 08:00:00        60   60
2   1 2016-01-02 07:00:00       NaN  NaN
3   1 2016-01-02 07:30:00        30   30
4   2 2016-01-01 07:15:00       NaN  NaN
5   2 2016-01-01 07:16:00         1    1
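
For reference, here is a minimal self-contained sketch of the same idea that can be run as-is. It is not the answerer's exact code: the sample frame is rebuilt by hand, one hypothetical extra row (id 1 on 2016-02-01) is added to show why the group key should be the full calendar date rather than the day of the month, and the column names `interval` and `interval_lag` are chosen only for illustration. It also assumes the rows should be ordered by timestamp within each id before differencing.

```python
import pandas as pd

# Hypothetical sample data in the question's 'YYYYMMDD HHMMSS' layout,
# plus one extra row (id 1 on 2016-02-01) to exercise the month boundary.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2],
    "datetime": [
        "20160101 070000", "20160101 080000",
        "20160102 070000", "20160102 073000",
        "20160201 090000",
        "20160101 071500", "20160101 071600",
    ],
})

df["datetime"] = pd.to_datetime(df["datetime"], format="%Y%m%d %H%M%S")

# Sort so that "previous row" means "previous timestamp" within each id.
df = df.sort_values(["id", "datetime"]).reset_index(drop=True)

# Partition key: id plus the calendar date (timestamp truncated to midnight).
day = df["datetime"].dt.normalize().rename("day")
grouped = df.groupby(["id", day])["datetime"]

# diff() subtracts the previous row in the same group (NaT for the first row).
df["interval"] = grouped.diff().dt.total_seconds() / 60

# The same result spelled out like SQL LAG: shift(1) fetches the previous value.
df["interval_lag"] = (df["datetime"] - grouped.shift(1)).dt.total_seconds() / 60

print(df)
```

Both columns hold 60, 30 and 1 minutes for the second row of each (id, day) group and NaN for the first row of every group. The 2016-02-01 row forms its own group and gets NaN, whereas grouping on the day of the month would have paired it with the 2016-01-01 rows of id 1.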