How to split one row of a Pandas DataFrame into two rows under a specific condition

Posted: 2018-08-02 09:54:09

Tags: python pandas datetime dataframe

Suppose I have data like this:

        id   Date        Time_Start  Time_End  start                stop                 split
    0   011  2017-08-01  20:20       21:40     2017-08-01 20:20:00  2017-08-01 21:40:00  False
    1   012  2017-08-01  17:15       19:12     2017-08-01 17:15:00  2017-08-01 19:12:00  True
    2   013  2017-08-01  15:46       16:20     2017-08-01 15:46:00  2017-08-01 16:20:00  False

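The cut-off time is 18:00 every day. So, for example, "012" should be split into two rows: the stop of the first row should be updated to 2017-08-01 17:59:00, the start of the second row should be 2017-08-01 18:00:00, and everything else stays the same: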

        id   Date        Time_Start  Time_End  start                stop                 split  birth_date
    0   011  2017-08-01  20:20       21:40     2017-08-01 20:20:00  2017-08-01 21:40:00  False  2017-08-01
    1   012  2017-08-01  17:15       19:12     2017-08-01 17:15:00  2017-08-01 17:59:00  True   2017-08-01
    1   012  2017-08-01  17:15       19:12     2017-08-01 18:00:00  2017-08-01 19:12:00  True   2017-08-02
    2   013  2017-08-01  15:46       16:20     2017-08-01 15:46:00  2017-08-01 16:20:00  False  2017-08-01

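Note that I would also like to add a new column called "birth_date" at the end: if the stop time is before 18:00, birth_date is the same day as "Date", but if the row falls after the cut, birth_date is the next day.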
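The splitting rule can be sketched for a single record in plain Python. This is a hypothetical helper (`split_interval` and its `cut_hour` parameter are not from the question), and it follows the expected-output table above: a row that is not split keeps its own date as birth date even if it runs after 18:00 (row 011), and only the second half of a split row moves to the next day.

```python
from datetime import datetime, timedelta


def split_interval(start, stop, cut_hour=18):
    """Hypothetical helper: return [(start, stop, birth_date), ...] for one record.

    If cut_hour:00 on the start day falls strictly inside (start, stop), the
    record is split: the first piece ends one minute before the cut and keeps
    the start day, the second piece starts at the cut and gets the next day.
    """
    cut = start.replace(hour=cut_hour, minute=0, second=0, microsecond=0)
    day = start.date()
    if start < cut < stop:
        return [
            (start, cut - timedelta(minutes=1), day),
            (cut, stop, day + timedelta(days=1)),
        ]
    # no split: the record is returned unchanged with its own date
    return [(start, stop, day)]


# row "012" from the question
pieces = split_interval(datetime(2017, 8, 1, 17, 15), datetime(2017, 8, 1, 19, 12))
for piece in pieces:
    print(piece)
```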

Below is the code I have been using so far; I'm stuck at the part I'm asking about, so any help would be appreciated.

import pandas as pd


def make_date_time(df):
    # combine Date with the start/end times into full timestamps
    df["start"] = pd.to_datetime(df["Date"].apply(str) + " " + df["Time_Start"])
    df["stop"] = pd.to_datetime(df["Date"].apply(str) + " " + df["Time_End"])

    def check_date_time(row):
        # a stop earlier than the start means the interval crosses midnight
        if row["start"] > row["stop"]:
            return row["stop"] + pd.Timedelta(days=1)
        return row["stop"]

    df["stop"] = df.apply(check_date_time, axis=1)
    return df


def in_cut(row):
    # True if 18:00 on the start day lies in [start, stop)
    reference = row["start"].replace(hour=18, minute=0, second=0)
    if row["start"] <= row["stop"]:
        return row["start"] <= reference < row["stop"]
    else:
        # only reached if stop < start, which make_date_time already prevents
        return row["start"] <= reference or reference < row["stop"]


data = {"id": ["011", "012", "013"], "Date": ["2017-08-01", "2017-08-01", "2017-08-01"], "Time_Start": ["20:20", "17:15", "15:46"], "Time_End": ["21:40", "19:12", "16:20"]}
df = pd.DataFrame.from_dict(data)
# make_date_time already applies check_date_time (which is local to it,
# so it cannot be called again out here)
df = make_date_time(df)
df["split"] = df.apply(in_cut, axis=1)
df
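For comparison, the same two steps (rolling overnight stops to the next day and flagging rows that cross 18:00) can be written without `apply`, using vectorized datetime operations. This is a sketch rather than a drop-in replacement: it uses a strict `start < 18:00`, so a row that starts exactly at the cut is not flagged (`in_cut` would flag it), and it assumes `Date` is already a string column as in the sample data.

```python
import pandas as pd

data = {"id": ["011", "012", "013"],
        "Date": ["2017-08-01", "2017-08-01", "2017-08-01"],
        "Time_Start": ["20:20", "17:15", "15:46"],
        "Time_End": ["21:40", "19:12", "16:20"]}
df = pd.DataFrame(data)

# Same parsing as make_date_time, on whole columns at once.
df["start"] = pd.to_datetime(df["Date"] + " " + df["Time_Start"])
df["stop"] = pd.to_datetime(df["Date"] + " " + df["Time_End"])
# Intervals that cross midnight: move stop to the next day.
df.loc[df["stop"] < df["start"], "stop"] += pd.Timedelta(days=1)

# 18:00 on each row's start day: midnight of that day plus 18 hours.
cut = df["start"].dt.normalize() + pd.Timedelta(hours=18)
# A row needs splitting when the cut falls strictly inside (start, stop).
df["split"] = (df["start"] < cut) & (cut < df["stop"])
print(df)
```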

1 Answer

Answer 0 (score: 0)

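As far as I understand the question, you want to update the stop and start times wherever split is True. Below is my approach (I guess it needs some improvement :) ). Hope this helps.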

import datetime

import pandas as pd

# df is the DataFrame built in the question; its columns are
# id, Date, Time_Start, Time_End, start, stop, split (start is position 4)
next_df = pd.DataFrame(columns=list(df.columns))
next_df['BirthDate'] = ''
pos_new = 0
pos_old = 0
for i in range(len(df)):
    if df['split'][i]:
        temp = list(df.iloc[i])
        # first half ends at 17:59:00 (.loc avoids chained assignment)
        df.loc[i, 'stop'] = df.loc[i, 'stop'].replace(hour=17, minute=59, second=0)

        temp_list = list(df.loc[pos_old])
        # birth date of the first half is the row's own Date
        temp_list.append(df.loc[pos_old, 'Date'])

        next_df.loc[pos_new] = temp_list
        pos_old += 1
        pos_new += 1

        # second half starts at 18:00:00
        temp[4] = temp[4].replace(hour=18, minute=0, second=0)
        # conversion of date: the second half belongs to the next day
        temp_date = datetime.datetime.strptime(df.loc[i, 'Date'], "%Y-%m-%d")
        temp.append((temp_date + datetime.timedelta(days=1)).strftime("%Y-%m-%d"))
        next_df.loc[pos_new] = temp

        pos_new += 1
    else:
        temp_list = list(df.loc[pos_old])
        temp_list.append(df.loc[pos_old, 'Date'])
        next_df.loc[pos_new] = temp_list
        pos_old += 1
        pos_new += 1
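A loop-free alternative (a different technique from the answer's code, not the answerer's method) is to build all first halves and all second halves as two frames and concatenate them. The sketch below rebuilds only the columns it needs (`id`, `Date`, `start`, `stop`, `split`) and names the new column `birth_date` as in the question's expected output; it assumes `split` has already been computed.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["011", "012", "013"],
    "Date": ["2017-08-01"] * 3,
    "start": pd.to_datetime(["2017-08-01 20:20", "2017-08-01 17:15", "2017-08-01 15:46"]),
    "stop": pd.to_datetime(["2017-08-01 21:40", "2017-08-01 19:12", "2017-08-01 16:20"]),
    "split": [False, True, False],
})

# 18:00 on each row's start day.
cut = df["start"].dt.normalize() + pd.Timedelta(hours=18)
mask = df["split"]

# Every row keeps a first half; split rows stop one minute before the cut.
first = df.copy()
first.loc[mask, "stop"] = cut[mask] - pd.Timedelta(minutes=1)
first["birth_date"] = pd.to_datetime(first["Date"])

# Split rows also get a second half starting at the cut, dated the next day.
second = df[mask].copy()
second["start"] = cut[mask]
second["birth_date"] = pd.to_datetime(second["Date"]) + pd.Timedelta(days=1)

# A stable sort on the original index puts each second half right after its first half.
result = pd.concat([first, second]).sort_index(kind="mergesort")
print(result)
```

The duplicated index label (1, 1) mirrors the expected output in the question; call `reset_index(drop=True)` on `result` if unique labels are needed.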

Edit

I had to make a few changes to your code to get it to produce what I want. Although this solution is not Pythonic, it does what I need, so I'll accept it as the correct answer.

Below is my updated code.

import datetime

next_df = pd.DataFrame(columns=list(df.columns))
next_df['BirthDate'] = ''
pos_new = 0
pos_old = 0
for i in range(len(df)):
    if df['split'][i]:
        # get one row at a time and convert it to a list
        temp = list(df.iloc[i])

        # update stop time to 17:59:00 (.loc avoids chained assignment)
        df.loc[i, 'stop'] = df.loc[i, 'stop'].replace(hour=17, minute=59, second=0)

        temp_list = list(df.loc[pos_old])
        # append birth date (the start date) to the list
        temp_list.append(temp[4].date().strftime("%Y-%m-%d"))
        # add this row to new df
        next_df.loc[pos_new] = temp_list
        # update the pointers for old and new df
        pos_old += 1
        pos_new += 1
        # the second row starts at 18:00:00 (position 4 is the start column)
        temp[4] = temp[4].replace(hour=18, minute=0, second=0)

        # conversion of date
        temp_date = temp[4].date()
        # plus one to date as this case is considered to happen in the following day
        temp.append((temp_date + datetime.timedelta(days=1)).strftime("%Y-%m-%d"))
        # add the new row to df
        next_df.loc[pos_new] = temp
        # update the pointer of new df for the split row
        pos_new += 1
    else:
        temp_list = list(df.loc[pos_old])
        # use this row's own start date; temp is only set in the split branch
        temp_list.append(df.loc[pos_old, 'start'].date().strftime("%Y-%m-%d"))
        next_df.loc[pos_new] = temp_list
        # update the pointers for old and new df
        pos_old += 1
        pos_new += 1

next_df
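Since the loop above is described as not Pythonic, one common refinement is to collect the output rows in a plain list and build the DataFrame once at the end, instead of enlarging `next_df` with `.loc` on every iteration (which typically reallocates the frame each time). A minimal sketch, assuming `df` has the question's `start`, `stop` and `split` columns and taking the birth date from the start day as the updated code does; the sample frame and the `birth_date` name are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["011", "012", "013"],
    "start": pd.to_datetime(["2017-08-01 20:20", "2017-08-01 17:15", "2017-08-01 15:46"]),
    "stop": pd.to_datetime(["2017-08-01 21:40", "2017-08-01 19:12", "2017-08-01 16:20"]),
    "split": [False, True, False],
})

cut_offset = pd.Timedelta(hours=18)
one_minute = pd.Timedelta(minutes=1)

rows = []
for rec in df.to_dict("records"):
    day = rec["start"].normalize()  # midnight of the start day
    cut = day + cut_offset          # 18:00 on the start day
    if rec["split"]:
        # first half ends at 17:59, second half starts at 18:00 on the next birth date
        rows.append({**rec, "stop": cut - one_minute, "birth_date": day})
        rows.append({**rec, "start": cut, "birth_date": day + pd.Timedelta(days=1)})
    else:
        rows.append({**rec, "birth_date": day})

# Build the result once from the collected rows.
next_df = pd.DataFrame(rows)
print(next_df)
```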