I have two DataFrames keyed by date. For example, df1 is:
id date time sum
abc 15/03/2020 01:00:00 15
abc 15/03/2020 02:00:00 25
abc 15/03/2020 04:00:00 10
abc 15/03/2020 04:30:00 5
abc 15/03/2020 05:00:00 20
xyz 15/03/2020 12:00:00 3
xyz 15/03/2020 03:00:00 20
xyz 15/03/2020 04:00:00 20
xyz 15/03/2020 05:00:00 50
and df2 is:
id date sum_last high_last low_last
abc 14/03/2020 10 10 5
xyz 14/03/2020 5 9 7
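I want to create a Flag column in df1 by comparing values in the sum column: if a row's sum is greater than the previous row's sum, the flag is 1, otherwise it is 0. For the first row of each id (for example sum = 15 for abc) the result should not be NaN. Instead, that row should be compared with the sum_last value in df2, which holds the same id for the earlier date (14/03/2020).
The logic for the high and low columns is as follows. If the flag is 1, high takes the sum value of that row and low takes the previous row's sum. For example, 15 > 10 (10 is the sum_last value for abc), so the flag is 1, high is 15, and low is the previous value, i.e. the sum_last value for that id, which is 10.
If the flag is 0, high and low are carried over from the previous row. For xyz, sum_last on 14/03/2020 is 5 and the first sum on 15/03/2020 is 3. Since 3 < 5, the flag is 0, and high and low stay the same as the previous row, i.e. 9 and 7 (high_last and low_last). So the output would be: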
id date time sum Flag high low
abc 15/03/2020 01:00:00 15 1 15 10 # flag is 1 because 10 (sum_last) < 15 (sum); high is now 15 and low is 10 (the previous value, i.e. sum_last)
abc 15/03/2020 02:00:00 25 1 25 15 # flag is 1, so high changes and so does low
abc 15/03/2020 04:00:00 10 0 25 15 # flag is 0, so high and low remain unchanged
abc 15/03/2020 04:30:00 5 0 25 15 # flag is 0, so no change in high and low
abc 15/03/2020 05:00:00 20 1 20 10 # flag is 1, so high changes and so does low
xyz 15/03/2020 12:00:00 3 0 9 7 # id changes; flag is 0 because 5 > 3, so high stays 9 (high_last) and low stays 7 (low_last)
xyz 15/03/2020 03:00:00 20 1 20 3 # flag is 1: high = sum value, low = previous sum value
xyz 15/03/2020 04:00:00 20 0 20 3 # flag is 0: high and low are the same as the previous row
xyz 15/03/2020 05:00:00 50 1 50 20 # flag is 1: high = 50 and low = 20
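To make the rule concrete, here is a minimal plain-Python sketch of it as described above, applied to one id at a time. The function name `flag_high_low` and its parameter names are hypothetical and only serve the illustration. Note that applying the rule literally gives low = 5 for the abc row at 05:00:00 (the previous sum), whereas the table above lists 10, so one of the two may need adjusting.

```python
def flag_high_low(sums, sum_last, high_last, low_last):
    """Return a (flag, high, low) tuple for every value in `sums`.

    `sum_last`, `high_last` and `low_last` are the previous day's values
    for the same id (the df2 row). Names are illustrative only.
    """
    rows = []
    prev_sum, high, low = sum_last, high_last, low_last
    for s in sums:
        flag = 1 if s > prev_sum else 0
        if flag:
            # Flag is 1: high becomes the current sum, low the previous sum.
            high, low = s, prev_sum
        # Flag is 0: high and low are carried over unchanged.
        rows.append((flag, high, low))
        prev_sum = s
    return rows


abc = flag_high_low([15, 25, 10, 5, 20], sum_last=10, high_last=10, low_last=5)
xyz = flag_high_low([3, 20, 20, 50], sum_last=5, high_last=9, low_last=7)
for row in abc + xyz:
    print(row)
```

A row-by-row loop like this is easy to check against the expected table, though it would probably be slow on a large DataFrame.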
I am using the following code for the flag column:
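import pandas as pd

cols = ['sum']
new = [x + '_last' for x in cols]
# map df2 column names to df1 column names, e.g. {'sum_last': 'sum'}
d = dict(zip(new, cols))
print(d)
# set id as the index of both frames
df1 = df1.set_index('id')
df2 = df2.set_index('id')
# shift within each id; the first NaN of each id is replaced by the df2 value
df = df1.groupby('id')[cols].shift().fillna(df2.rename(columns=d)[cols])
print(df)
# flag is 1 where sum is greater than the previous value, else 0
df1 = pd.concat([df1, df1[cols].gt(df[cols]).astype(int).add_prefix('flag_')], axis=1)
print(df1)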
This gives me the Flag column, but I cannot work out how to build the high and low columns. Can someone help me here? Thanks in advance.
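One possible direction, sketched here under assumptions rather than as a confirmed solution: keep the current sum (for high) and the previous sum (for low) only on rows where the flag is 1, forward-fill within each id, and fall back to high_last and low_last from df2 for leading rows whose flag is 0. The sketch rebuilds the sample data so it runs on its own; the helper names `last`, `prev` and `is_up` are hypothetical. It follows the stated rule literally, so the abc row at 05:00:00 gets low = 5 rather than the 10 shown in the expected output.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": ["abc"] * 5 + ["xyz"] * 4,
    "date": ["15/03/2020"] * 9,
    "time": ["01:00:00", "02:00:00", "04:00:00", "04:30:00", "05:00:00",
             "12:00:00", "03:00:00", "04:00:00", "05:00:00"],
    "sum": [15, 25, 10, 5, 20, 3, 20, 20, 50],
})
df2 = pd.DataFrame({
    "id": ["abc", "xyz"],
    "date": ["14/03/2020"] * 2,
    "sum_last": [10, 5],
    "high_last": [10, 9],
    "low_last": [5, 7],
})

# Previous-day values, looked up by id.
last = df2.set_index("id")

# Previous sum within each id; the first row of each id uses sum_last.
prev = df1.groupby("id")["sum"].shift()
prev = prev.fillna(df1["id"].map(last["sum_last"]))

df1["Flag"] = (df1["sum"] > prev).astype(int)
is_up = df1["Flag"].eq(1)

# Keep candidate values only where the flag is 1, then carry them forward
# within each id. Leading rows with flag 0 fall back to the df2 values.
high = df1["sum"].where(is_up).groupby(df1["id"]).ffill()
low = prev.where(is_up).groupby(df1["id"]).ffill()
df1["high"] = high.fillna(df1["id"].map(last["high_last"])).astype(int)
df1["low"] = low.fillna(df1["id"].map(last["low_last"])).astype(int)

print(df1)
```

This assumes the rows of each id are already in the intended order, since both `shift` and `ffill` depend on row order.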