I have a pandas DataFrame like the following:
flag a b c
0 1 5 1 3
1 1 2 1 3
2 1 3 0 3
3 1 4 0 3
4 1 5 5 3
5 1 6 0 3
6 1 7 0 3
7 2 6 1 4
8 2 2 1 4
9 2 3 1 4
10 2 4 1 4
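I want to create a column "d" based on the following conditions:
1) For the first row of each flag: if a > c, then d = b; otherwise d = NaN.
2) For every other row of each flag: if (a > c) & ((the previous row's d is NaN) | (b > the previous row's d)), then d = b; otherwise d = the previous row's d.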
I expect the following output:
flag a b c d
0 1 5 1 3 1
1 1 2 1 3 1
2 1 3 0 3 1
3 1 4 0 3 1
4 1 5 5 3 5
5 1 6 0 3 5
6 1 7 0 3 5
7 2 6 1 4 1
8 2 2 1 4 1
9 2 3 1 4 1
10 2 4 1 4 1
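Because rule 2 refers to the previous row's d, each value depends on the one computed just before it. A literal transcription of the two rules is therefore a row-by-row loop. The sketch below assumes, like the answer's `s` mask, that a "first row of each flag" is any row whose flag differs from the row above (i.e. each flag's rows are contiguous, as in the sample); `compute_d` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "flag": [1] * 7 + [2] * 4,
    "a": [5, 2, 3, 4, 5, 6, 7, 6, 2, 3, 4],
    "b": [1, 1, 0, 0, 5, 0, 0, 1, 1, 1, 1],
    "c": [3] * 7 + [4] * 4,
})


def compute_d(df: pd.DataFrame) -> pd.Series:
    """Apply rules 1 and 2 row by row, carrying the previous d forward."""
    out = []
    prev_flag = None
    prev_d = np.nan
    for flag, a, b, c in df[["flag", "a", "b", "c"]].itertuples(index=False, name=None):
        if flag != prev_flag:
            # Rule 1: first row of a flag group
            d = b if a > c else np.nan
        elif a > c and (pd.isna(prev_d) or b > prev_d):
            # Rule 2: a > c and (previous d is NaN or b > previous d)
            d = b
        else:
            # Rule 2, otherwise: keep the previous row's d
            d = prev_d
        out.append(d)
        prev_flag, prev_d = flag, d
    return pd.Series(out, index=df.index, dtype=float)


df["d"] = compute_d(df)
print(df)
```

This reproduces the expected d column (as floats, since NaN is possible). A Python-level loop is easy to check against the rules but can be slow on very large frames.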
Answer 0 (score: 0)
Here is how I would translate your logic:
import numpy as np
import pandas as pd

# df is the DataFrame from the question
df['d'] = np.nan
# first row of flag
s = df.flag.ne(df.flag.shift())
# where a > c
a_gt_c = df['a'].gt(df['c'])
# fill the first rows with a > c
df.loc[s & a_gt_c, 'd'] = df['b']
# mask for second fill
mask = ((~s)                                # not first rows
        & a_gt_c                            # a > c
        & (df['d'].shift().isna()           # previous d is NaN
           | df['b'].gt(df['d'].shift()))   # or b > previous d
       )
# fill those values:
df.loc[mask, 'd'] = df['b']
# ffill for the rest
df['d'] = df['d'].ffill()
Output:
flag a b c d
0 1 5 1 3 1.0
1 1 2 1 3 1.0
2 1 3 0 3 1.0
3 1 4 0 3 0.0
4 1 5 5 3 5.0
5 1 6 0 3 0.0
6 1 7 0 3 0.0
7 2 6 1 4 1.0
8 2 2 1 4 1.0
9 2 3 1 4 1.0
10 2 4 1 4 1.0
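This output differs from the expected one at rows 3, 5 and 6 (0.0 instead of 1, 5 and 5). The mask is evaluated once, on a d column that at that point only holds the first-row values, so `df['d'].shift().isna()` is True for rows 3 to 6 and they all receive b; the row-to-row dependency in rule 2 is never followed. Reading the rules as stated, on a row with a > c the new d is max(previous d, b), with a NaN previous d meaning "nothing yet", and on any other row d is carried forward. That makes d a running maximum of b over the rows where a > c, restarted at each flag group, which can be computed without a loop. A minimal sketch under that reading, assuming each flag's rows are contiguous and that b never contains -inf (used here as a sentinel for "no a > c row yet"):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "flag": [1] * 7 + [2] * 4,
    "a": [5, 2, 3, 4, 5, 6, 7, 6, 2, 3, 4],
    "b": [1, 1, 0, 0, 5, 0, 0, 1, 1, 1, 1],
    "c": [3] * 7 + [4] * 4,
})

# A new group starts wherever flag differs from the previous row
# (the same test as `s` in the answer).
group_id = df["flag"].ne(df["flag"].shift()).cumsum()

# Candidate value: b where a > c, otherwise -inf so it never wins the max.
candidates = df["b"].where(df["a"].gt(df["c"]), -np.inf)

# Running maximum of the candidates, restarted for each group.
running_max = candidates.groupby(group_id).cummax()

# -inf means no row with a > c has been seen yet in this group, so d is NaN.
df["d"] = running_max.replace(-np.inf, np.nan)
print(df)
```

On the sample data this gives d = 1, 1, 1, 1, 5, 5, 5 for flag 1 and 1, 1, 1, 1 for flag 2, matching the expected output.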