我想知道是否有一种方法可以根据一些条件将某些列值拆分为单独的行。 这是我的数据集的示例:
StartDateTime EndDateTime HoursInBlock TotalHours Type EmployeeID
2020-07-31 06:30:00 2020-07-31 07:00:00 0.5 0.5 A 1282
2020-07-31 07:00:00 2020-07-31 08:00:00 1 1.5 A 1282
2020-07-31 08:00:00 2020-07-31 09:00:00 1 2.5 B 1282
2020-07-31 09:00:00 2020-07-31 10:00:00 1 3.5 C 1282
2020-07-31 10:00:00 2020-07-31 11:00:00 1 4.5 A 1282
2020-07-31 11:00:00 2020-07-31 12:00:00 1 5.5 A 1282
2020-07-31 12:00:00 2020-07-31 13:00:00 1 6.5 B 1282
2020-07-31 13:00:00 2020-07-31 14:00:00 1 7.5 C 1282
2020-07-31 14:00:00 2020-07-31 15:00:00 1 8.5 B 1282
2020-07-31 15:00:00 2020-07-31 15:30:00 0.5 9 D 1282
我要在这里做的是,一旦TotalHours达到“ 8小时”,就将TotalHours列拆分为单独的行,以便每个员工都有一个TotalHours = 8的块。如果某人工作超过8小时,则输入它们将继续保持原样。例如,我想要这样的东西作为结果表
StartDateTime EndDateTime HoursInBlock TotalHours Type EmployeeID
2020-07-31 06:30:00 2020-07-31 07:00:00 0.5 0.5 A 1282
2020-07-31 07:00:00 2020-07-31 08:00:00 1 1.5 A 1282
2020-07-31 08:00:00 2020-07-31 09:00:00 1 2.5 B 1282
2020-07-31 09:00:00 2020-07-31 10:00:00 1 3.5 C 1282
2020-07-31 10:00:00 2020-07-31 11:00:00 1 4.5 A 1282
2020-07-31 11:00:00 2020-07-31 12:00:00 1 5.5 A 1282
2020-07-31 12:00:00 2020-07-31 13:00:00 1 6.5 B 1282
2020-07-31 13:00:00 2020-07-31 14:00:00 1 7.5 C 1282
2020-07-31 14:00:00 2020-07-31 14:30:00 0.5 8.0 B 1282 *
2020-07-31 14:30:00 2020-07-31 15:00:00 0.5 8.5 B 1282 **
2020-07-31 15:00:00 2020-07-31 15:30:00 0.5 9 D 1282
您可以看到*行已修改,并且**行已插入。其他变量(包括StartDateTime,EndDateTime和HoursInBlock)也进行了相应调整。这是我目前拥有的东西,但似乎并没有按照我的意图插入新行。我知道这不容易阅读,但是如果有人可以帮忙,我将不胜感激。随时编辑我的代码,或者如果您知道更好的方法,请告诉我。
for i in range(len(df)):
#This is a part where I update a value in * row.
if df.loc[i, "TotalHours"] > 8 and df.loc[i, "TotalHours"]<9 :
a = df.loc[i, "TotalHours"] - 8
df.loc[i, "TotalHours"] = 8
df.loc[i, "HoursInBlock"] = df.loc[i, "HoursInBlock"] - a
b = df.loc[i, "HoursInBlock"]
df.loc[i, 'EndDateTime'] = pd.to_datetime(df.loc[i,'EndDateTime']) - timedelta(minutes = b*60)
#Now I'm trying to insert **row
newline = pd.DataFrame({"StartDateTime":None, "EndDateTime":None,
"HoursInBlock":None, "TotalHours": None, "Type": None, "EmployeeID": None}
,index =[i+1])
df = pd.concat([df.iloc[:i+1], newline, df.iloc[i+2:]]).reset_index(drop=True, inplace = True)
df.loc[i+1, "StartDateTime"] = df.loc[i, "EndDateTime"]
df.loc[i+1, "EndDateTime"] = df.loc[i+1, "StartDateTime"] + timedelta(minutes = b*60)
df.loc[i+1, "HoursInBlock"] = df.loc[i+1, "StartDateTime"] - df.loc[i+1, "EndDateTime"]
df.loc[i+1, "acc_hour"] = df.loc[i, "acc_hour"] - 8
df.loc[i+1, "Type"] = df.loc[i, "Type"]
df.loc[i+1, "EmployeeID"] = df.loc[i, "EmployeeID"]
谢谢。