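I am trying to achieve the following: Start should be zero until column R drops below 20, then positive until R rises above 80, and this cycle should repeat (reset). Up to row 19 the behavior is as expected, but in row 20 Start is inexplicably set to 1 even though the criterion is not met. Adding extra columns is fine.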
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=100), columns=['R'])
df['Start'] = np.where((df.R < 20), 1, 0)
df['End'] = np.where((df.R > 80), 1, 0)
df.loc[df['End'].shift().eq(0), 'Start'] = df['Start'].replace(0, np.nan).ffill().fillna(0).astype(int)
     R  Start  End
11  82      0    1
12  63      0    0
13  37      0    0
14  21      0    0
15  88      0    1
16   9      1    0
17  13      1    0
18  83      1    1
19  47      0    0
20  68      1    0
21  42      1    0
22  67      1    0
23  26      1    0
24  79      1    0
25  87      1    1
26  96      0    1
27  39      0    0
28  50      1    0
29  94      1    1
30  95      0    1
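The jump at row 20 can be reproduced deterministically. Below is a minimal sketch, assuming the R values of rows 11 to 30 from the table above; the names `filled` and `mask` are introduced only for illustration. It suggests that the right-hand side is a forward-filled copy of Start that never returns to 0, and that the `.loc` assignment copies it into every row whose previous End is 0.

```python
import numpy as np
import pandas as pd

# R values copied from rows 11..30 of the printed table
r = [82, 63, 37, 21, 88, 9, 13, 83, 47, 68,
     42, 67, 26, 79, 87, 96, 39, 50, 94, 95]
df = pd.DataFrame({'R': r}, index=range(11, 31))
df['Start'] = np.where(df.R < 20, 1, 0)
df['End'] = np.where(df.R > 80, 1, 0)

# Right-hand side of the original assignment: once a 1 appears it is
# carried forward to the end, because nothing turns it back into 0.
filled = df['Start'].replace(0, np.nan).ffill().fillna(0).astype(int)

# Rows that receive the forward-filled value: previous End equals 0.
mask = df['End'].shift().eq(0)

df.loc[mask, 'Start'] = filled
print(df.assign(filled=filled, mask=mask))
```

Row 19 keeps its 0 only because the previous row has End equal to 1, so it is excluded from the assignment; row 20 is included and picks up the carried-forward 1.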
Solution, based on Quang Hoang's answer:
df = pd.DataFrame(np.random.randint(0, 100, size=100), columns=['R'])
df['Start'] = np.select([df['R'] < 20, df['R'] > 80], (1,0), np.nan)
df['Start'] = df['Start'].ffill().fillna(0)  # rows before the first trigger would otherwise stay NaN
df['Start'] = df.Start.combine(pd.Series(np.insert(abs(np.diff(df.Start)), 0, 0)), max, fill_value=0)
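As a cross-check for the vectorized version, the rule can also be written as an explicit loop. This is only a sketch under one reading of the requirement: Start turns 1 on the row where R drops below 20 and stays 1 up to and including the row where R rises above 80, which is the convention the `combine` step above appears to aim for. The helper name `latch`, its parameters, and the sample values are hypothetical.

```python
import pandas as pd

def latch(values, low=20, high=80):
    """Return 0/1 flags: 1 from a value below `low` through the next value above `high`."""
    active = False
    out = []
    for v in values:
        if v < low:
            active = True
        out.append(int(active))  # the closing row itself is still flagged
        if v > high:
            active = False
    return out

df = pd.DataFrame({'R': [82, 63, 9, 13, 83, 47, 68, 15, 50, 90, 30]})
df['Start'] = latch(df['R'])
print(df)
```

The loop is slower than the NumPy version on large frames, but it states the intended state machine directly, so it is convenient for comparing results on small samples.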
Answer 0 (score: 2)
IIUC, you can use np.select:
df['Start'] = np.select([df['R']>80, df['R']<20], (1,0), np.nan)
df['Start'] = df['Start'].ffill()
Output:
     R  Start  End
11  82    1.0    1
12  63    1.0    0
13  37    1.0    0
14  21    1.0    0
15  88    1.0    1
16   9    0.0    0
17  13    0.0    0
18  83    1.0    1
19  47    1.0    0
20  68    1.0    0
21  42    1.0    0
22  67    1.0    0
23  26    1.0    0
24  79    1.0    0
25  87    1.0    1
26  96    1.0    1
27  39    1.0    0
28  50    1.0    0
29  94    1.0    1
30  95    1.0    1