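I want to take a list of work schedules (with activity date, user ID, activity, activity start date/time, and activity end date/time) and break it into 30-minute intervals that start on the hour and half hour, listing the total time spent on each activity within each interval.
Given...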
>>> print(df)
activity_date user_id activity activity_start_time activity_end_time
11-Jun        bob     phones   06/11/2019 8:00     06/11/2019 9:00
11-Jun        bob     break    06/11/2019 9:00     06/11/2019 9:15
11-Jun        bob     phones   06/11/2019 9:15     06/11/2019 11:15
11-Jun        bob     lunch    06/11/2019 11:15    06/11/2019 12:15
I would like this as the result...
>>> print(df)
interval_start   time_in_interval activity_date user_id activity activity_start_time activity_end_time
06/11/2019 8:00  30               11-Jun        bob     phones   06/11/2019 8:00     06/11/2019 9:00
06/11/2019 8:30  30               11-Jun        bob     phones   06/11/2019 8:00     06/11/2019 9:00
06/11/2019 9:00  15               11-Jun        bob     break    06/11/2019 9:00     06/11/2019 9:15
06/11/2019 9:00  15               11-Jun        bob     phones   06/11/2019 9:15     06/11/2019 11:15
06/11/2019 9:30  30               11-Jun        bob     phones   06/11/2019 9:15     06/11/2019 11:15
06/11/2019 10:00 30               11-Jun        bob     phones   06/11/2019 9:15     06/11/2019 11:15
06/11/2019 10:30 30               11-Jun        bob     phones   06/11/2019 9:15     06/11/2019 11:15
06/11/2019 11:00 30               11-Jun        bob     phones   06/11/2019 9:15     06/11/2019 11:15
06/11/2019 11:30 30               11-Jun        bob     phones   06/11/2019 9:15     06/11/2019 11:15
06/11/2019 12:00 15               11-Jun        bob     phones   06/11/2019 9:15     06/11/2019 11:15
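That is, two new columns (interval start and time in interval) plus newly added rows, so that each activity appears in every interval it covers. I still want to see the original activity start and end times.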
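One possible way to get this shape, offered as a minimal sketch rather than a definitive answer: floor each start time to its half hour, generate every half-hour start the activity touches, explode to one row per interval, then take the overlap in minutes. The helper name `split_into_intervals` and the ISO-formatted sample timestamps are illustrative assumptions, and every activity is assumed to end after it starts. Because the minutes are computed from the actual start and end times, the 11:15 to 12:15 span is attributed to lunch.

```python
import pandas as pd

# Sample rows from the schedule above; the ISO timestamp strings are an
# assumption made so that parsing is unambiguous.
df = pd.DataFrame({
    "activity_date": ["11-Jun"] * 4,
    "user_id": ["bob"] * 4,
    "activity": ["phones", "break", "phones", "lunch"],
    "activity_start_time": pd.to_datetime([
        "2019-06-11 08:00", "2019-06-11 09:00",
        "2019-06-11 09:15", "2019-06-11 11:15"]),
    "activity_end_time": pd.to_datetime([
        "2019-06-11 09:00", "2019-06-11 09:15",
        "2019-06-11 11:15", "2019-06-11 12:15"]),
})


def split_into_intervals(frame, freq="30min"):
    """Hypothetical helper: one row per (activity, interval) pair."""
    step = pd.Timedelta(freq)
    out = frame.copy()
    # Every interval start the activity touches: from the floored start
    # time up to the last interval that begins before the end time.
    out["interval_start"] = [
        pd.date_range(start.floor(freq), end.ceil(freq) - step, freq=freq).tolist()
        for start, end in zip(out["activity_start_time"], out["activity_end_time"])
    ]
    out = out.explode("interval_start").reset_index(drop=True)
    out["interval_start"] = pd.to_datetime(out["interval_start"])
    interval_end = out["interval_start"] + step
    # Overlap = min(activity end, interval end) - max(activity start, interval start).
    overlap_start = out["activity_start_time"].where(
        out["activity_start_time"] > out["interval_start"], out["interval_start"])
    overlap_end = out["activity_end_time"].where(
        out["activity_end_time"] < interval_end, interval_end)
    minutes = (overlap_end - overlap_start).dt.total_seconds() // 60
    out["time_in_interval"] = minutes.astype(int)
    return out[["interval_start", "time_in_interval"] + list(frame.columns)]


result = split_into_intervals(df)
print(result[["interval_start", "time_in_interval", "activity"]])
```

Generating only the intervals each activity actually covers avoids building a full grid of 48 intervals per user and date and then filling in the empty rows.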
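Edit: Here is what I have so far. It will...
At that point I have a table containing every interval for every user ID, and the intervals are labeled correctly where an activity starts exactly on the interval. However, if the activity does not start on an interval boundary (for example, it starts at 9:15 rather than 9:00 or 9:30), the interval start time is null. All of the in-between intervals are there too, but they are empty (for example, an activity starts at 9:00 and runs until 10:00 ... the 9:30 interval is listed, but its activity information is null).
What's left...
I just know there has to be a better way. ...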
import math
from datetime import datetime, timedelta

import pandas as pd

#
# insert code to create a dataframe as df
# df has schedule data as documented elsewhere
# (assumes df has a 'date' column holding datetime.date values)
#

# Create interval list: 48 half-hour start times covering one day
start_of_day = datetime(2019, 1, 1, 0, 0, 0)
interval_times = pd.DataFrame({
    'interval_start': [(start_of_day + timedelta(minutes=30 * x)).time()
                       for x in range(48)],
})
interval_times['joiner'] = 'joiner'  # constant key used for the cross join

# Create id_date list: one row per (user_id, date) pair
id_date = df.groupby(['user_id', 'date'], as_index=False)['activity'].count()
id_date = id_date.drop('activity', axis=1)
id_date['joiner'] = 'joiner'

# Merge interval & id_date: every interval for every user and date
user_ids_interval_date_times = pd.merge(interval_times, id_date)
# Combine each date with its interval start time
user_ids_interval_date_times['activity_start_time'] = user_ids_interval_date_times.apply(
    lambda r: datetime.combine(r['date'], r['interval_start']), axis=1)

# Merge with df
df_temp = pd.merge(df, user_ids_interval_date_times, how='outer')

# If interval is null, then update with interval floor
def floor_dt(dt, delta):
    # Round dt down to the nearest multiple of delta
    return datetime.min + math.floor((dt - datetime.min) / delta) * delta

# Sort by ID & activity_start_time
# Copy everything down
df_temp
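For the "update with interval floor" step, pandas has a built-in `Series.dt.floor` that may make a hand-written helper unnecessary. A small sketch, using made-up timestamps rather than the real schedule data:

```python
import pandas as pd

# Illustrative timestamps (not taken from the real schedule data).
starts = pd.Series(pd.to_datetime([
    "2019-06-11 09:00", "2019-06-11 09:15", "2019-06-11 09:45",
]))

# Round each timestamp down to the start of its half-hour interval.
interval_starts = starts.dt.floor("30min")
print(interval_starts.dt.strftime("%H:%M").tolist())
# ['09:00', '09:00', '09:30']
```

A timestamp that already sits on a boundary, such as 09:00, is left unchanged.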