Splitting DataFrame rows by DateTime in pandas

Time: 2017-10-11 20:06:02

Tags: python pandas dataframe time-series

I have a DataFrame containing events like this:

location  start_time   end_time     some_value1   some_value2
LECP      00:00        01:30        25            nice info
LECP      02:00        04:00        10            other info
LECS      02:00        03:00         5            lorem
LIPM      02:55        03:15         9            ipsum

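I want to split the rows so that each one covers an interval of at most 1 hour. For example, if an event lasts 01:30, I want one row of 01:00 and another of 00:30. If an event lasts 02:30, I want three rows. If an event lasts one hour or less, it should stay a single row. Like this: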

location  start_time   end_time   some_value1   some_value2
LECP      00:00        01:00      25            nice info
LECP      01:00        01:30      25            nice info

LECP      02:00        03:00      10            other info
LECP      03:00        04:00      10            other info

LECS      02:00        03:00       5            lorem
LIPM      02:55        03:15       9            ipsum

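It doesn't matter whether the remainder ends up at the beginning or at the end. It also doesn't matter whether the duration is distributed evenly across the rows, as long as no row has a duration > 1 hour.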
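The rule above boils down to simple arithmetic: an event needs its duration divided by one hour, rounded up, and never fewer than one row. A minimal sketch of that count follows; `pieces_needed` is a hypothetical helper name used only for illustration.

```python
import math

import pandas as pd

max_len = pd.Timedelta("1 hour")


def pieces_needed(duration, max_len=max_len):
    """Hypothetical helper: number of rows an event of this length needs."""
    # Dividing two Timedelta objects gives a plain float ratio.
    return max(1, math.ceil(duration / max_len))


for text in ["00:20:00", "01:00:00", "01:30:00", "02:30:00"]:
    print(text, pieces_needed(pd.Timedelta(text)))
```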

What I tried:
- Reading Time Series / Date functionality without understanding anything
- Searching StackOverflow.

1 Answer:

Answer 0: (score: 0)

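I adapted this answer to split by hour instead of by day. The code runs in a while loop, so it keeps iterating as long as there are rows whose duration is still > 1 hour. It expects a DataFrame dfob with datetime columns start and end and a timedelta column duration.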

import pandas as pd

mytimedelta = pd.Timedelta('1 hour')

#create boolean mask
split_rows = (dfob['duration'] > mytimedelta)    

while split_rows.any():
    #get new rows to append and adjust start time to 1 hour later.
    new_rows = dfob[split_rows].copy()
    new_rows['start'] = new_rows['start'] + mytimedelta

    #update the end time of old rows (they end 1 second before the next piece starts)
    dfob.loc[split_rows, 'end'] = dfob.loc[split_rows, 'start'] + \
        pd.DateOffset(hours=1, seconds=-1)
    #DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    dfob = pd.concat([dfob, new_rows])

    #update the duration of all rows
    dfob['duration'] = dfob['end'] - dfob['start']

    #create an updated boolean mask
    split_rows = (dfob['duration'] > mytimedelta)

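#when job is done: sort so the pieces of each event sit together, then renumber
#(the result must be assigned back, otherwise the sort has no effect)
dfob = dfob.sort_index().reset_index(drop=True)
dfob['duration'] = dfob['end'] - dfob['start']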
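To try the loop end to end, here is a minimal self-contained sketch applied to the sample events from the question. Several things in it are assumptions rather than part of the original answer: the helper name `split_by_hour`, the temporary `event` column, the arbitrary date attached to the times, and the use of `start` and `end` as column names. Unlike the code above, this variant ends each truncated row exactly on the hour instead of one second earlier, so it reproduces the layout the question asks for.

```python
import pandas as pd


def split_by_hour(df, max_len=pd.Timedelta("1 hour")):
    """Hypothetical helper: split rows so that none spans more than max_len."""
    out = df.copy().reset_index(drop=True)
    out["event"] = out.index  # remember which original row each piece came from
    too_long = (out["end"] - out["start"]) > max_len
    while too_long.any():
        # The remainder of each long row starts one max_len later.
        rest = out[too_long].copy()
        rest["start"] = rest["start"] + max_len
        # Truncate the long rows to exactly max_len.
        out.loc[too_long, "end"] = out.loc[too_long, "start"] + max_len
        out = pd.concat([out, rest], ignore_index=True)
        too_long = (out["end"] - out["start"]) > max_len
    # Put the pieces of each event next to each other, in time order.
    out = out.sort_values(["event", "start"], ignore_index=True)
    out = out.drop(columns="event")
    out["duration"] = out["end"] - out["start"]
    return out


day = "2017-10-11 "  # arbitrary date, only needed to build timestamps
events = pd.DataFrame({
    "location": ["LECP", "LECP", "LECS", "LIPM"],
    "start": pd.to_datetime([day + t for t in ["00:00", "02:00", "02:00", "02:55"]]),
    "end": pd.to_datetime([day + t for t in ["01:30", "04:00", "03:00", "03:15"]]),
    "some_value1": [25, 10, 5, 9],
    "some_value2": ["nice info", "other info", "lorem", "ipsum"],
})

result = split_by_hour(events)
print(result)
```

Tracking the original row in an explicit column, rather than relying on duplicate index labels, keeps the final ordering deterministic. The loop runs once per extra hour of the longest event, which is fine for short events but may be slow if some events span many hours.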