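I read the details of my weekly calendar (subjects obviously changed to protect the innocent) into a pandas DataFrame. One of my goals is to get the total time spent in meetings. I would like a DataFrame indexed by a date_range, with hourly frequency for the week, showing the total minutes I spent in meetings during each hour. My first challenge is that meetings overlap, and as much as I would like to be in two places at once, I certainly am not. I do hop out of one and into another, though. So, for example, the rows at index 8 and 9 should count as 90 minutes of total meeting time, not the 120 minutes I would get by simply summing the column with df['Duration'].sum(). How do I flatten the time periods in the DataFrame so that overlaps are counted only once? There seems to be an answer that uses date_range and periods, but I can't wrap my head around it. Below is my DataFrame df.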
Start End Duration Subject
0 07/04/16 10:30:00 07/04/16 11:00:00 30 Inspirational Poster Design Session
1 07/04/16 15:00:00 07/04/16 15:30:00 30 Corporate Speak Do's and Don'ts
2 07/04/16 09:00:00 07/04/16 12:00:00 180 Metrics or Matrix -Panel Discussion
3 07/04/16 13:30:00 07/04/16 15:00:00 90 "Do More with Less" kickoff party
4 07/05/16 09:00:00 07/05/16 10:00:00 60 Fiscal or Physical -Panel Discussion
5 07/05/16 14:00:00 07/05/16 14:30:00 30 "Why we can't have nice thing" training video
6 07/06/16 15:00:00 07/06/16 16:00:00 60 One-on-One with manager -Panel Discussion
7 07/06/16 09:00:00 07/06/16 10:00:00 60 Fireing for Performance leadership session
8 07/06/16 13:00:00 07/06/16 14:00:00 60 Birthday Cake in the conference room *MANDATORY*
9 07/06/16 12:30:00 07/06/16 13:30:00 60 Obligatory lunchtime meeting because it was the only time everyone had avaiable
Any help is greatly appreciated.
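To make the 90-versus-120 point concrete, here is a minimal sketch that uses only rows 8 and 9 from the table. It compares the naive sum of durations with a total computed after merging overlapping intervals. The names `meetings`, `merged`, `naive_minutes`, and `flat_minutes` are made up for illustration, and the merge loop is just one plausible way to flatten the periods, not necessarily the approach being asked about.

```python
import pandas as pd

# Rows 8 and 9 from the table above (the two overlapping meetings).
meetings = pd.DataFrame({
    "Start": pd.to_datetime(["2016-07-06 13:00", "2016-07-06 12:30"]),
    "End": pd.to_datetime(["2016-07-06 14:00", "2016-07-06 13:30"]),
})

# Naive total: every meeting is counted in full, so the overlap is counted twice.
naive_seconds = (meetings["End"] - meetings["Start"]).dt.total_seconds().sum()
naive_minutes = int(naive_seconds // 60)

# Flattened total: sort by start time and merge intervals that overlap.
merged = []
for start, end in meetings.sort_values("Start").itertuples(index=False):
    if merged and start <= merged[-1][1]:
        # This meeting starts before the previous merged block ends: extend it.
        merged[-1][1] = max(merged[-1][1], end)
    else:
        merged.append([start, end])

flat_seconds = sum((end - start).total_seconds() for start, end in merged)
flat_minutes = int(flat_seconds // 60)

print(naive_minutes, flat_minutes)  # prints "120 90"
```

The merged block runs from 12:30 to 14:00, which is where the 90 minutes comes from.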
Edit: Here is the output I am hoping for from the dataset above.
2016-07-04 00:00:00 0
2016-07-04 01:00:00 0
2016-07-04 02:00:00 0
2016-07-04 03:00:00 0
2016-07-04 04:00:00 0
2016-07-04 05:00:00 0
2016-07-04 06:00:00 0
2016-07-04 07:00:00 0
2016-07-04 08:00:00 0
2016-07-04 09:00:00 60
2016-07-04 10:00:00 60
2016-07-04 11:00:00 60
2016-07-04 12:00:00 0
2016-07-04 13:00:00 30
2016-07-04 14:00:00 60
2016-07-04 15:00:00 30
2016-07-04 16:00:00 0
2016-07-04 17:00:00 0
2016-07-04 18:00:00 0
2016-07-04 19:00:00 0
2016-07-04 20:00:00 0
2016-07-04 21:00:00 0
2016-07-04 22:00:00 0
2016-07-04 23:00:00 0
2016-07-05 00:00:00 0
2016-07-05 01:00:00 0
2016-07-05 02:00:00 0
2016-07-05 03:00:00 0
2016-07-05 04:00:00 0
2016-07-05 05:00:00 0
2016-07-05 06:00:00 0
2016-07-05 07:00:00 0
2016-07-05 08:00:00 0
2016-07-05 09:00:00 60
2016-07-05 10:00:00 0
2016-07-05 11:00:00 0
2016-07-05 12:00:00 0
2016-07-05 13:00:00 0
2016-07-05 14:00:00 30
2016-07-05 15:00:00 0
2016-07-05 16:00:00 0
2016-07-05 17:00:00 0
2016-07-05 18:00:00 0
2016-07-05 19:00:00 0
2016-07-05 20:00:00 0
2016-07-05 21:00:00 0
2016-07-05 22:00:00 0
2016-07-05 23:00:00 0
2016-07-06 00:00:00 0
2016-07-06 01:00:00 0
2016-07-06 02:00:00 0
2016-07-06 03:00:00 0
2016-07-06 04:00:00 0
2016-07-06 05:00:00 0
2016-07-06 06:00:00 0
2016-07-06 07:00:00 0
2016-07-06 08:00:00 0
2016-07-06 09:00:00 60
2016-07-06 10:00:00 0
2016-07-06 11:00:00 0
2016-07-06 12:00:00 30
2016-07-06 13:00:00 60
2016-07-06 14:00:00 0
2016-07-06 15:00:00 60
2016-07-06 16:00:00 0
2016-07-06 17:00:00 0
2016-07-06 18:00:00 0
2016-07-06 19:00:00 0
2016-07-06 20:00:00 0
2016-07-06 21:00:00 0
2016-07-06 22:00:00 0
2016-07-06 23:00:00 0
2016-07-07 00:00:00 0
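Answer 0 (score: 1)
One possibility is to create a time series (s below), indexed by minute, that records whether you were in a meeting during that minute, and then resample it by hour. To match your desired output, you can adjust the start and end times of the index of s.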
import io
import pandas as pd
data = io.StringIO('''\
Start,End,Duration,Subject
0,07/04/16 10:30:00,07/04/16 11:00:00,30,Inspirational Poster Design Session
1,07/04/16 15:00:00,07/04/16 15:30:00,30,Corporate Speak Do's and Don'ts
2,07/04/16 09:00:00,07/04/16 12:00:00,180,Metrics or Matrix -Panel Discussion
3,07/04/16 13:30:00,07/04/16 15:00:00,90,"Do More with Less" kickoff party
4,07/05/16 09:00:00,07/05/16 10:00:00,60,Fiscal or Physical -Panel Discussion
5,07/05/16 14:00:00,07/05/16 14:30:00,30,"Why we can't have nice thing" training video
6,07/06/16 15:00:00,07/06/16 16:00:00,60,One-on-One with manager -Panel Discussion
7,07/06/16 09:00:00,07/06/16 10:00:00,60,Fireing for Performance leadership session
8,07/06/16 13:00:00,07/06/16 14:00:00,60,Birthday Cake in the conference room *MANDATORY*
9,07/06/16 12:30:00,07/06/16 13:30:00,60,Obligatory lunchtime meeting because it was the only time everyone
''')
df = pd.read_csv(data, usecols=['Start', 'End', 'Subject'])
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])
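# Label-based slices on a DatetimeIndex include the right endpoint,
# so subtract one minute to keep each meeting's end time exclusive.
tdel = pd.Timedelta('1min')
s = pd.Series(False, index=pd.date_range(start=df['Start'].min(),
                                         end=df['End'].max()-tdel,
                                         freq='min'))
# Mark every minute covered by at least one meeting; overlapping minutes are set only once.
for _, meeting in df.iterrows():
    s.loc[meeting['Start'] : meeting['End']-tdel] = True
# Count meeting minutes per hour (on pandas 2.2+ the alias is spelled '1h').
result = s.resample('1H').sum().astype(int)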
print(result)
Output:
2016-07-04 09:00:00 60
2016-07-04 10:00:00 60
2016-07-04 11:00:00 60
2016-07-04 12:00:00 0
2016-07-04 13:00:00 30
2016-07-04 14:00:00 60
2016-07-04 15:00:00 30
2016-07-04 16:00:00 0
2016-07-04 17:00:00 0
2016-07-04 18:00:00 0
2016-07-04 19:00:00 0
2016-07-04 20:00:00 0
2016-07-04 21:00:00 0
2016-07-04 22:00:00 0
2016-07-04 23:00:00 0
2016-07-05 00:00:00 0
2016-07-05 01:00:00 0
2016-07-05 02:00:00 0
2016-07-05 03:00:00 0
2016-07-05 04:00:00 0
2016-07-05 05:00:00 0
2016-07-05 06:00:00 0
2016-07-05 07:00:00 0
2016-07-05 08:00:00 0
2016-07-05 09:00:00 60
2016-07-05 10:00:00 0
2016-07-05 11:00:00 0
2016-07-05 12:00:00 0
2016-07-05 13:00:00 0
2016-07-05 14:00:00 30
2016-07-05 15:00:00 0
2016-07-05 16:00:00 0
2016-07-05 17:00:00 0
2016-07-05 18:00:00 0
2016-07-05 19:00:00 0
2016-07-05 20:00:00 0
2016-07-05 21:00:00 0
2016-07-05 22:00:00 0
2016-07-05 23:00:00 0
2016-07-06 00:00:00 0
2016-07-06 01:00:00 0
2016-07-06 02:00:00 0
2016-07-06 03:00:00 0
2016-07-06 04:00:00 0
2016-07-06 05:00:00 0
2016-07-06 06:00:00 0
2016-07-06 07:00:00 0
2016-07-06 08:00:00 0
2016-07-06 09:00:00 60
2016-07-06 10:00:00 0
2016-07-06 11:00:00 0
2016-07-06 12:00:00 30
2016-07-06 13:00:00 60
2016-07-06 14:00:00 0
2016-07-06 15:00:00 60
Freq: H, dtype: int64
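The output above starts at the first meeting and stops at the last one, whereas the desired output in the question runs from midnight on 2016-07-04 through midnight on 2016-07-07. A minimal sketch of one way to adjust the index of s is shown below, assuming it is acceptable to pad the minute index out to whole days. It uses only three of the meetings to stay short; the names `first`, `last`, and `padded` are illustrative, and the `'60min'` alias is used in place of `'1H'` only to sidestep the alias spelling difference between pandas versions.

```python
import pandas as pd

# Three of the meetings from the question, including the overlapping pair.
df = pd.DataFrame({
    "Start": pd.to_datetime(["2016-07-04 09:00", "2016-07-06 13:00", "2016-07-06 12:30"]),
    "End": pd.to_datetime(["2016-07-04 12:00", "2016-07-06 14:00", "2016-07-06 13:30"]),
})

tdel = pd.Timedelta("1min")

# Assumption: pad the minute index from midnight before the first meeting
# to midnight after the last one, so every hour of every day appears.
first = df["Start"].min().normalize()
last = df["End"].max().normalize() + pd.Timedelta("1D")
s = pd.Series(False, index=pd.date_range(start=first, end=last, freq="min"))

# Mark each minute covered by a meeting; the end minute is excluded.
for _, meeting in df.iterrows():
    s.loc[meeting["Start"]: meeting["End"] - tdel] = True

# Sum the marked minutes within each hour.
padded = s.resample("60min").sum().astype(int)
print(padded)
```

With this padding the result has one row per hour from 2016-07-04 00:00:00 to 2016-07-07 00:00:00 inclusive, and hours without meetings show 0.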