I have the following dataframe, df1:
id   date_col    No. of leaves
100  2018-10-05  4
100  2018-10-14  4
100  2018-10-19  4
100  2018-11-15  4
101  2018-10-05  3
101  2018-10-08  3
101  2018-12-05  3
df2
id   date_col    leaves_availed
100  2018-11-28  2
100  2018-11-29  2
101  2018-11-19  2
101  2018-11-24  2
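For each row of df2, I want the rows of df1 that have the same id and a date earlier than the date in df2. From that subset I then want to drop the row with the earliest date, and subtract leaves_availed from "No. of leaves".
In the example above, the resulting dataframe should be: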
id   date_col    No. of leaves
100  2018-10-19  2
100  2018-11-15  2
101  2018-12-05  1
For id = 100 and date 2018-11-28 in df2, the rows of df1 with a date earlier than 2018-11-28 are:
id   date_col    No. of leaves
100  2018-10-05  4
100  2018-10-14  4
100  2018-10-19  4
100  2018-11-15  4
The earliest date in that subset is 2018-10-05, so the row
100  2018-10-05  4
will be dropped, and so on.
So far, I have sorted both dataframes:
df1.sort_values(by=['id','date_col'],inplace=True)
df2.sort_values(by=['id','date_col'],inplace=True)
and I am trying to delete the first few rows of df1 based on the number of rows in df2, but that does not work.
Answer 0 (score: 0)
This follows the logic described, but it has not been tested against every edge case.
import pandas as pd

# Recreate the two dataframes from the question
df1 = pd.DataFrame({
    'id': [100, 100, 100, 100, 101, 101, 101],
    'date_col': pd.to_datetime(['2018-10-05', '2018-10-14', '2018-10-19', '2018-11-15',
                                '2018-10-05', '2018-10-08', '2018-12-05']),
    'No. of leaves': [4, 4, 4, 4, 3, 3, 3]})
df2 = pd.DataFrame({
    'id': [100, 100, 101, 101],
    'date_col': pd.to_datetime(['2018-11-28', '2018-11-29', '2018-11-19', '2018-11-24']),
    'leaves_availed': [2, 2, 2, 2]})
df1.sort_values(by=['id', 'date_col'], inplace=True)
df2.sort_values(by=['id', 'date_col'], inplace=True)
# End of dataframe creation

# Loop over each row of df2
for i in range(len(df2)):
    current = df2.iloc[i]
    # Keep the df1 rows with the same id and an earlier date
    df3 = df1[(df1['date_col'] < current['date_col']) & (df1['id'] == current['id'])]
    # Drop the oldest row; copy so the column assignment below does not touch df1
    df3 = df3.iloc[1:].copy()
    # Subtract the leaves_availed of the current df2 row
    df3['No. of leaves'] = df3['No. of leaves'] - current['leaves_availed']
    print(f"result for date {current['date_col']} and id = {current['id']}")
    print(df3); print('-----------------\n')
The final output displayed:
result for date 2018-11-28 00:00:00 and id = 100
    id   date_col  No. of leaves
1  100 2018-10-14              2
2  100 2018-10-19              2
3  100 2018-11-15              2
-----------------

result for date 2018-11-29 00:00:00 and id = 100
    id   date_col  No. of leaves
1  100 2018-10-14              2
2  100 2018-10-19              2
3  100 2018-11-15              2
-----------------

result for date 2018-11-19 00:00:00 and id = 101
    id   date_col  No. of leaves
5  101 2018-10-08              1
-----------------

result for date 2018-11-24 00:00:00 and id = 101
    id   date_col  No. of leaves
5  101 2018-10-08              1
-----------------
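
Note that the loop above prints one frame per df2 row, which is not the single frame shown in the question. Below is a minimal sketch of one way to build that single frame. It is one reading of the question, not a tested general solution. It assumes that each df2 row removes the earliest df1 row still present with the same id and an earlier date, and that leaves_availed is subtracted once per id from every remaining row (the expected output suggests this, since 4 - 2 = 2 and 3 - 2 = 1). The function name apply_leaves is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [100, 100, 100, 100, 101, 101, 101],
    "date_col": pd.to_datetime(["2018-10-05", "2018-10-14", "2018-10-19", "2018-11-15",
                                "2018-10-05", "2018-10-08", "2018-12-05"]),
    "No. of leaves": [4, 4, 4, 4, 3, 3, 3],
})
df2 = pd.DataFrame({
    "id": [100, 100, 101, 101],
    "date_col": pd.to_datetime(["2018-11-28", "2018-11-29", "2018-11-19", "2018-11-24"]),
    "leaves_availed": [2, 2, 2, 2],
})


def apply_leaves(df1, df2):
    """Hypothetical helper: drop one df1 row per df2 row, then subtract leaves."""
    remaining = df1.sort_values(["id", "date_col"]).reset_index(drop=True)
    for row in df2.sort_values(["id", "date_col"]).itertuples(index=False):
        # Rows still present with the same id and an earlier date
        mask = (remaining["id"] == row.id) & (remaining["date_col"] < row.date_col)
        if mask.any():
            # remaining is sorted, so the first True is the earliest date
            remaining = remaining.drop(index=mask.idxmax())
    # Assumption: leaves_availed is constant per id and is subtracted once
    availed = df2.groupby("id")["leaves_availed"].first()
    to_subtract = remaining["id"].map(availed).fillna(0).astype(int)
    remaining["No. of leaves"] = remaining["No. of leaves"] - to_subtract
    return remaining.reset_index(drop=True)


result = apply_leaves(df1, df2)
print(result)
```

On the sample data this leaves the rows 100 / 2018-10-19, 100 / 2018-11-15 and 101 / 2018-12-05 with 2, 2 and 1 leaves, matching the frame requested in the question. If leaves_availed can differ between rows of the same id, the subtraction step would need a different rule.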