Suppose I have data like this:
    id   Date        Time_Start  Time_End  start                stop                 split
0   011  2017-08-01  20:20       21:40     2017-08-01 20:20:00  2017-08-01 21:40:00  False
1   012  2017-08-01  17:15       19:12     2017-08-01 17:15:00  2017-08-01 19:12:00  True
2   013  2017-08-01  15:46       16:20     2017-08-01 15:46:00  2017-08-01 16:20:00  False
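The cut time is 18:00 every day. So, for example, "012" should be split into two rows: the stop column of the first row should be updated to 2017-08-01 17:59:00 and the start of the second row should be 2017-08-01 18:00:00. Everything else stays the same: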
    id   Date        Time_Start  Time_End  start                stop                 split  birth_date
0   011  2017-08-01  20:20       21:40     2017-08-01 20:20:00  2017-08-01 21:40:00  False  2017-08-01
1   012  2017-08-01  17:15       19:12     2017-08-01 17:15:00  2017-08-01 17:59:00  True   2017-08-01
1   012  2017-08-01  17:15       19:12     2017-08-01 18:00:00  2017-08-01 19:12:00  True   2017-08-02
2   013  2017-08-01  15:46       16:20     2017-08-01 15:46:00  2017-08-01 16:20:00  False  2017-08-01
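Note that I would also like to add a new column at the end called "birth_date". It is the same day as "Date" if the stop time is before 18:00, but for the part after the cut, "birth_date" is the following day.
Below is the code I have been working with so far. I am stuck at the step described above, so any help would be appreciated.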
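To make the rule concrete, here is a minimal sketch for a single interval using only the standard library. The helper name `split_at_cut` and its `cut_hour` parameter are made up for illustration, and it assumes an interval is split only when 18:00 falls strictly between start and stop; rows that are not split keep the date of their start, as in the expected output above.

```python
from datetime import datetime, timedelta

def split_at_cut(start, stop, cut_hour=18):
    """Split one interval at the daily cut.

    Returns a list of (start, stop, birth_date) tuples.
    Hypothetical helper, not part of the code in this question.
    """
    cut = start.replace(hour=cut_hour, minute=0, second=0, microsecond=0)
    day = start.date()
    if start < cut < stop:
        return [
            # part before the cut: ends one minute before 18:00, same day
            (start, cut - timedelta(minutes=1), day),
            # part after the cut: starts at 18:00, counted as the next day
            (cut, stop, day + timedelta(days=1)),
        ]
    # the cut does not fall inside the interval: keep it unchanged
    return [(start, stop, day)]

for part in split_at_cut(datetime(2017, 8, 1, 17, 15), datetime(2017, 8, 1, 19, 12)):
    print(part)
```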
import pandas as pd

def check_date_time(row):
    # If stop is earlier than start, the interval crosses midnight,
    # so move stop to the next day.
    if row["start"] > row["stop"]:
        return row["stop"] + pd.Timedelta(days=1)
    return row["stop"]

def make_date_time(df):
    # Combine the date with each time column into full timestamps.
    df["start"] = pd.to_datetime(df["Date"].apply(str) + " " + df["Time_Start"])
    df["stop"] = pd.to_datetime(df["Date"].apply(str) + " " + df["Time_End"])
    df["stop"] = df.apply(check_date_time, axis=1)
    return df

def in_cut(row):
    # True when 18:00 on the start date falls inside [start, stop).
    reference = row["start"].replace(hour=18, minute=0, second=0)
    if row["start"] <= row["stop"]:
        return row["start"] <= reference < row["stop"]
    else:
        return row["start"] <= reference or reference < row["stop"]

data = {"id": ["011", "012", "013"], "Date": ["2017-08-01", "2017-08-01", "2017-08-01"], "Time_Start": ["20:20", "17:15", "15:46"], "Time_End": ["21:40", "19:12", "16:20"]}
df = pd.DataFrame.from_dict(data)
df = make_date_time(df)
df["split"] = False
df["stop"] = df.apply(check_date_time, axis=1)
df["split"] = df.apply(in_cut, axis=1)
df
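For comparison, the same two steps (midnight rollover and the split flag) can probably be done on whole columns without `apply`. This is only a sketch on the sample data; the names `crosses_midnight` and `cut` are introduced here for illustration, and it keeps the same `start <= 18:00 < stop` test as `in_cut`.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["011", "012", "013"],
    "Date": ["2017-08-01"] * 3,
    "Time_Start": ["20:20", "17:15", "15:46"],
    "Time_End": ["21:40", "19:12", "16:20"],
})

df["start"] = pd.to_datetime(df["Date"] + " " + df["Time_Start"])
df["stop"] = pd.to_datetime(df["Date"] + " " + df["Time_End"])

# Where stop is earlier than start, the interval crosses midnight:
# replace stop with the same time on the next day.
crosses_midnight = df["stop"] < df["start"]
df["stop"] = df["stop"].mask(crosses_midnight, df["stop"] + pd.Timedelta(days=1))

# 18:00 on the start date, computed for the whole column at once.
cut = df["start"].dt.normalize() + pd.Timedelta(hours=18)
df["split"] = (df["start"] <= cut) & (cut < df["stop"])

print(df[["id", "start", "stop", "split"]])
```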
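Answer 0 (score: 0)
As far as I understand the question, you want to update the stop and start times wherever split is True. Below is my approach (I think it needs some improvement :)). Hope this helps.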
import datetime

# Look up column positions by name so the code does not depend on column order.
start_pos = df.columns.get_loc('start')
date_pos = df.columns.get_loc('Date')

next_df = pd.DataFrame(columns=list(df.columns))
next_df['BirthDate'] = ''
pos_new = 0
pos_old = 0
for i in range(len(df)):
    if df['split'][i]:
        temp = list(df.iloc[i])
        print(temp)
        # first half: stop at 17:59 (df.loc avoids chained assignment)
        df.loc[i, 'stop'] = df.loc[i, 'stop'].replace(hour=17, minute=59, second=0)
        temp_list = list(df.loc[pos_old])
        temp_list.append(temp[date_pos])
        next_df.loc[pos_new] = temp_list
        pos_old += 1
        pos_new += 1
        # second half: start at 18:00
        temp[start_pos] = temp[start_pos].replace(hour=18, minute=0, second=0)
        print(temp)
        # conversion of date: the second half belongs to the next day
        temp_date = datetime.datetime.strptime(temp[date_pos], "%Y-%m-%d")
        temp.append((temp_date + datetime.timedelta(days=1)).strftime("%Y-%m-%d"))
        next_df.loc[pos_new] = temp
        pos_new += 1
    else:
        temp_list = list(df.loc[pos_old])
        temp_list.append(temp_list[date_pos])
        next_df.loc[pos_new] = temp_list
        pos_old += 1
        pos_new += 1
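Edit
I had to make a few changes to your code to produce what I wanted. Although this solution is not very Pythonic, it does what I need, so I will accept it as the correct answer.
Below is my updated code.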
import datetime

next_df = pd.DataFrame(columns=list(df.columns))
next_df['BirthDate'] = ''
pos_new = 0
pos_old = 0
for i in range(len(df)):
    if df['split'][i]:
        # get one row at a time and convert it to a list
        temp = list(df.iloc[i])
        # update stop time to 17:59:00 (df.loc avoids chained assignment)
        df.loc[i, 'stop'] = df.loc[i, 'stop'].replace(hour=17, minute=59, second=0)
        temp_list = list(df.loc[pos_old])
        # append birth date to the list (index 4 is the 'start' column)
        temp_list.append(temp[4].date().strftime("%Y-%m-%d"))
        # add this row to new df
        next_df.loc[pos_new] = temp_list
        # update the pointers for old and new df
        pos_old += 1
        pos_new += 1
        temp[4] = temp[4].replace(hour=18, minute=0, second=0)
        # conversion of date
        temp_date = temp[4].date()
        # plus one to date as this case is considered to happen in the following day
        temp.append((temp_date + datetime.timedelta(days=1)).strftime("%Y-%m-%d"))
        # add the new row to df
        next_df.loc[pos_new] = temp
        # update the pointer of new df for the split row
        pos_new += 1
    else:
        temp_list = list(df.loc[pos_old])
        # use this row's own start date; temp is only set in the split branch
        temp_list.append(temp_list[4].date().strftime("%Y-%m-%d"))
        next_df.loc[pos_new] = temp_list
        # update the pointers for old and new df
        pos_old += 1
        pos_new += 1
next_df
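For what it is worth, a more idiomatic alternative to the row-by-row loop might build the two halves as separate frames and concatenate them. This is a sketch under assumptions, not a tested replacement: it starts from a frame that already has `start`, `stop` and `split`, uses the column name `birth_date` from the question rather than `BirthDate`, and the names `cut`, `first`, `second` and `result` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["011", "012", "013"],
    "start": pd.to_datetime(["2017-08-01 20:20", "2017-08-01 17:15", "2017-08-01 15:46"]),
    "stop": pd.to_datetime(["2017-08-01 21:40", "2017-08-01 19:12", "2017-08-01 16:20"]),
    "split": [False, True, False],
})

# 18:00 on the start date of every row.
cut = df["start"].dt.normalize() + pd.Timedelta(hours=18)

# All rows, with split rows truncated to end at 17:59; born on the start date.
first = df.copy()
first.loc[df["split"], "stop"] = cut[df["split"]] - pd.Timedelta(minutes=1)
first["birth_date"] = df["start"].dt.normalize()

# Second half of each split row: starts at 18:00, born on the following day.
second = df[df["split"]].copy()
second["start"] = cut[df["split"]]
second["birth_date"] = second["start"].dt.normalize() + pd.Timedelta(days=1)

# A stable sort on the original index keeps both halves adjacent and in order.
result = pd.concat([first, second]).sort_index(kind="mergesort")
print(result)
```

The duplicated index label on the two halves of a split row matches the expected output in the question; calling `result.reset_index(drop=True)` afterwards would renumber the rows if that is preferred.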