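I want to convert times from a format like 12:45 to a datetime format while keeping that display format unchanged, and compute the duration of each activity (the result in activity_duration). Second, I want to sum activity_duration grouped by activity_station.
I converted the times to datetime, but I got arbitrary year, month, and day information added. I know how to group, but not how to eliminate duplicates when applying the group-by.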
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Shift_id': [123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,
                 345,345,345,345,345,345,345,345,345,345,345,345,345,345,345,345],
    'activity_id': [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,
                    6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9],
    'activity_begin_time': ['09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25',
                            '09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30'],
    'activity_end_time': ['09:05','12:00','12:30','17:25','17:30','09:05','12:00','12:30','17:25','17:30','09:05','12:00','12:30','17:25','17:30',
                          '09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25'],
    'activity_station': ['None','Za','None','Ba','None','None','Za','None','Ba','None','None','Za','None','Ba','None',
                         'None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra']
})
df['activity_begin_time'] = pd.to_datetime(df['activity_begin_time'])
df['activity_end_time'] = pd.to_datetime(df['activity_end_time'])
df['activity_duration'] = df['activity_end_time'] - df['activity_begin_time']
df['activity_duration'] = df['activity_duration']/np.timedelta64(1,'h')
I want to sum activity_duration grouped by activity_station while eliminating the duplicated values.
Answer 0 (score: 2)
Here is my solution:
import pandas as pd

df = pd.DataFrame({
    'Shift_id': [123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,
                 345,345,345,345,345,345,345,345,345,345,345,345,345,345,345,345],
    'activity_id': [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,
                    6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9],
    'activity_begin_time': ['09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25',
                            '09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30'],
    'activity_end_time': ['09:05','12:00','12:30','17:25','17:30','09:05','12:00','12:30','17:25','17:30','09:05','12:00','12:30','17:25','17:30',
                          '09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25'],
    'activity_station': ['None','Za','None','Ba','None','None','Za','None','Ba','None','None','Za','None','Ba','None',
                         'None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra']
})
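Drop the duplicate rows, then parse the times as timedeltas (rather than datetimes, so no date is attached) and subtract:
# Remove rows that are exact duplicates across all columns
df = df.drop_duplicates()
# Appending ':00' turns 'HH:MM' into 'HH:MM:SS', which pd.to_timedelta can parse
df['activity_begin_time'] = pd.to_timedelta(df['activity_begin_time']+':00')
df['activity_end_time'] = pd.to_timedelta(df['activity_end_time']+':00')
df['activity_duration'] = df['activity_end_time'] - df['activity_begin_time']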
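The arbitrary year, month, and day mentioned in the question come from pd.to_datetime, which has to attach some date to a bare 'HH:MM' string. A minimal sketch of the timedelta approach on a small hypothetical sample (the names sample, begin, end, duration, and duration_hours are illustrative, not from the original post); it leaves the original 'HH:MM' strings untouched and also converts the duration to decimal hours:

```python
import pandas as pd

# Hypothetical two-row sample, not the full data from the question
sample = pd.DataFrame({
    'activity_begin_time': ['09:05', '12:30'],
    'activity_end_time': ['12:00', '17:25'],
})

# Parse into separate Series so the original 'HH:MM' strings are kept as-is
begin = pd.to_timedelta(sample['activity_begin_time'] + ':00')
end = pd.to_timedelta(sample['activity_end_time'] + ':00')

# Timedelta arithmetic carries no calendar date
duration = end - begin

# Decimal hours, comparable to dividing by np.timedelta64(1, 'h') in the question
duration_hours = duration.dt.total_seconds() / 3600

print(duration)
print(duration_hours)
```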
You can then use groupby with a specific aggregation for each column:
df.groupby('activity_station').agg({'activity_duration': 'sum'})
Which yields:
                 activity_duration
activity_station
Ba                        05:25:00
None                      00:45:00
Ra                        04:55:00
Za                        05:50:00
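If the totals are wanted as decimal hours, as the question's division by np.timedelta64(1,'h') suggests, the grouped sums can be converted afterwards. A minimal sketch on a small hypothetical sample with one repeated pair of rows (sample, totals, and total_hours are illustrative names, not from the original post):

```python
import pandas as pd

# Hypothetical sample: the last two rows repeat the first two exactly
sample = pd.DataFrame({
    'Shift_id': [123, 123, 123, 123],
    'activity_id': [1, 2, 1, 2],
    'activity_begin_time': ['09:00', '09:05', '09:00', '09:05'],
    'activity_end_time': ['09:05', '12:00', '09:05', '12:00'],
    'activity_station': ['None', 'Za', 'None', 'Za'],
})

# Drop exact duplicate rows before summing so repeats are not counted twice
sample = sample.drop_duplicates()

begin = pd.to_timedelta(sample['activity_begin_time'] + ':00')
end = pd.to_timedelta(sample['activity_end_time'] + ':00')
sample['activity_duration'] = end - begin

# Sum the durations per station, then express the totals in hours
totals = sample.groupby('activity_station')['activity_duration'].sum()
total_hours = totals.dt.total_seconds() / 3600

print(total_hours)
```

Note that drop_duplicates() with no arguments only removes rows that match in every column; whether that is the right notion of "duplicate" depends on the data.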