Question

I need to combine two separate columns into a single datetime column.

The pandas DataFrame looks like this:

calendarid    time_delta_actualdeparture    actualtriptime
20140101      0 days 06:35:49.000020000     27.11666667
20140101      0 days 06:51:37.000020000     24.83333333
20140101      0 days 07:11:40.000020000     28.1
20140101      0 days 07:31:40.000020000     23.03333333
20140101      0 days 07:53:34.999980000     23.3
20140101      0 days 08:14:13.000020000     51.81666667

I want to convert it to something like this:

calendarid               actualtriptime
2014-01-01 6:30:00       mean of trip times in time interval
2014-01-01 7:00:00       mean of trip times in time interval 
2014-01-01 7:30:00       mean of trip times in time interval
2014-01-01 8:00:00       mean of trip times in time interval
2014-01-01 8:30:00       mean of trip times in time interval

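Basically, I want to combine the two columns into one, then group the rows into 30-minute intervals and take the mean of the actual trip times within each interval. I have tried many techniques without success, but I am still learning Python / pandas. Can anyone help me with this?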

Answer 1

Convert your 'calendarid' column to datetime and add the timedelta to get the start time.

In [5]: df['calendarid'] = pd.to_datetime(df['calendarid'], format='%Y%m%d')


In [7]: df['calendarid'] = df['calendarid'] + df['time_delta_actualdeparture']

In [8]: df
Out[8]: 
                  calendarid  time_delta_actualdeparture  actualtriptime
0 2014-01-01 06:35:49.000020             06:35:49.000020       27.116667
1 2014-01-01 06:51:37.000020             06:51:37.000020       24.833333
2 2014-01-01 07:11:40.000020             07:11:40.000020       28.100000
3 2014-01-01 07:31:40.000020             07:31:40.000020       23.033333
4 2014-01-01 07:53:34.999980             07:53:34.999980       23.300000
5 2014-01-01 08:14:13.000020             08:14:13.000020       51.816667
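
For anyone who wants to reproduce this step outside an IPython session, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and, as an assumption for illustration only, stores the combined value in a new column named `departure` (a hypothetical name) rather than overwriting `calendarid`. The `astype(str)` call is just a defensive choice in case `calendarid` is stored as integers.

```python
import pandas as pd

# Sample rows copied from the question; the timedeltas are parsed from strings.
df = pd.DataFrame({
    "calendarid": [20140101] * 6,
    "time_delta_actualdeparture": pd.to_timedelta([
        "0 days 06:35:49.000020",
        "0 days 06:51:37.000020",
        "0 days 07:11:40.000020",
        "0 days 07:31:40.000020",
        "0 days 07:53:34.999980",
        "0 days 08:14:13.000020",
    ]),
    "actualtriptime": [27.11666667, 24.83333333, 28.1,
                       23.03333333, 23.3, 51.81666667],
})

# Parse the YYYYMMDD day, then add the time-of-day offset.
# "departure" is a hypothetical column name used only in this sketch.
day = pd.to_datetime(df["calendarid"].astype(str), format="%Y%m%d")
df["departure"] = day + df["time_delta_actualdeparture"]

print(df[["departure", "actualtriptime"]])
```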

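You can then set the date column as the index and resample at a 30-minute frequency to get the mean for each interval. With label='right', each interval is labelled by its end time. (Recent pandas versions no longer accept the how= argument, so the aggregation is called as a method instead.)

In [19]: df.set_index('calendarid')[['actualtriptime']].resample('30min', label='right').mean()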
Out[19]: 
                     actualtriptime
calendarid                         
2014-01-01 07:00:00       25.975000
2014-01-01 07:30:00       28.100000
2014-01-01 08:00:00       23.166667
2014-01-01 08:30:00       51.816667
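
The desired output in the question labels each interval by its start (6:30, 7:00, ...), whereas the result above is labelled by the interval end. A rough sketch of the whole pipeline with start-of-interval labels might look like the following. The helper name `mean_trip_time_per_interval` and its parameters are assumptions made for illustration, not part of pandas or of the original answer.

```python
import pandas as pd

def mean_trip_time_per_interval(df, freq="30min", label="left"):
    """Hypothetical helper: combine day + timedelta, then average per interval."""
    # Build the full departure timestamp from the YYYYMMDD day and the offset.
    start = (
        pd.to_datetime(df["calendarid"].astype(str), format="%Y%m%d")
        + pd.to_timedelta(df["time_delta_actualdeparture"])
    )
    # Index the trip times by timestamp and average within each interval.
    return (
        df.assign(calendarid=start)
          .set_index("calendarid")["actualtriptime"]
          .resample(freq, label=label)
          .mean()
    )

# Sample rows copied from the question.
df = pd.DataFrame({
    "calendarid": [20140101] * 6,
    "time_delta_actualdeparture": [
        "0 days 06:35:49.000020", "0 days 06:51:37.000020",
        "0 days 07:11:40.000020", "0 days 07:31:40.000020",
        "0 days 07:53:34.999980", "0 days 08:14:13.000020",
    ],
    "actualtriptime": [27.11666667, 24.83333333, 28.1,
                       23.03333333, 23.3, 51.81666667],
})

result = mean_trip_time_per_interval(df)
print(result)
```

Passing `label="right"` to the helper should reproduce the labelling used in the answer above. Intervals that contain no trips come back as NaN, so a `dropna()` may be needed depending on how the result is used.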
