Pandas conditional fill

Time: 2019-11-08 14:17:45

Tags: python pandas

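I am trying to achieve the following: Start should be zero until column R drops below 20, then positive until column R rises above 80, and this cycle should repeat (reset). Up to row 19 the behavior is as expected, but in row 20 Start is inexplicably set to 1 even though the criteria are not met. Adding extra columns is fine.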

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=100), columns=['R'])
df['Start'] = np.where((df.R < 20), 1, 0)
df['End'] = np.where((df.R > 80), 1, 0)
# Attempt: forward-fill Start on every row that does not directly follow an End row
df.loc[df['End'].shift().eq(0), 'Start'] = df['Start'].replace(0, np.nan).ffill().fillna(0).astype(int)
     R  Start  End
11  82      0    1
12  63      0    0
13  37      0    0
14  21      0    0
15  88      0    1
16   9      1    0
17  13      1    0
18  83      1    1
19  47      0    0
20  68      1    0
21  42      1    0
22  67      1    0
23  26      1    0
24  79      1    0
25  87      1    1
26  96      0    1
27  39      0    0
28  50      1    0
29  94      1    1
30  95      0    1
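
For comparison, here is a minimal sketch of the intended behavior written as a plain loop, run next to the attempt above on the R values copied from rows 11 to 30. It rests on my reading of the question: the helper `start_flags` and its `low`/`high` parameters are hypothetical names, and it assumes the row where R rises above 80 still counts as part of the run (as row 18 does in the table).

```python
import numpy as np
import pandas as pd

# R values copied from rows 11..30 of the table above
r = pd.Series(
    [82, 63, 37, 21, 88, 9, 13, 83, 47, 68,
     42, 67, 26, 79, 87, 96, 39, 50, 94, 95],
    index=range(11, 31), name="R",
)
df = pd.DataFrame({"R": r})
df["Start"] = np.where(df.R < 20, 1, 0)
df["End"] = np.where(df.R > 80, 1, 0)

# The attempt from the question, split into its two parts
held = df["Start"].replace(0, np.nan).ffill().fillna(0).astype(int)  # stays 1 once set
mask = df["End"].shift().eq(0)  # False only on the first row and right after an End row
df.loc[mask, "Start"] = held


def start_flags(values, low=20, high=80):
    """Hypothetical reference: 1 from a value < low through the next value > high."""
    active = False
    flags = []
    for v in values:
        if v < low:
            active = True       # a run opens on this row
        flags.append(int(active))
        if v > high:
            active = False      # the run closes after this row
    return flags


df["Expected"] = start_flags(df["R"])
first_diff = df.index[df["Start"] != df["Expected"]][0]
print(df)
print("first mismatch at row", first_diff)
```

On these rows the first mismatch is row 20. As far as I can tell, that is because the forward fill on the right-hand side is never reset by End: the mask only leaves out the single row directly after an End row, so the 1 from row 16 is written again as soon as the previous row has End equal to 0.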

Solution, based on Quang Hoang's answer:

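df = pd.DataFrame(np.random.randint(0, 100, size=100), columns=['R'])
# 1 where R < 20, 0 where R > 80, NaN everywhere else
df['Start'] = np.select([df['R'] < 20, df['R'] > 80], (1, 0), np.nan)
# Hold the last state; rows before the first trigger become 0 instead of NaN
df['Start'] = df['Start'].ffill().fillna(0)
# Also mark the row where the state flips back to 0 (the R > 80 row)
df['Start'] = df.Start.combine(pd.Series(np.insert(abs(np.diff(df.Start)), 0, 0)), max, fill_value=0)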
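
One caveat with the last line: the helper Series built from `np.insert` gets a fresh 0..n-1 index, so `combine` only lines up when `df` also has the default index. Below is a sketch of what I believe is an equivalent step that does not depend on the index, assuming the held state is already 0/1 with no NaN. A row is part of the run if the state is 1 on that row or was 1 on the previous row. The names `held` and `start` are mine, and the R values are the sample rows 11 to 30 from the question.

```python
import numpy as np
import pandas as pd

# Sample R values with a non-default index (11..30) on purpose
r = pd.Series(
    [82, 63, 37, 21, 88, 9, 13, 83, 47, 68,
     42, 67, 26, 79, 87, 96, 39, 50, 94, 95],
    index=range(11, 31), name="R",
)

# State after np.select + ffill: 1 once R < 20, back to 0 once R > 80
held = pd.Series(np.select([r < 20, r > 80], [1, 0], np.nan), index=r.index)
held = held.ffill().fillna(0).astype(int)

# Keep the closing row: the state is 1 here, or it was 1 on the previous row
start = (held | held.shift(fill_value=0)).rename("Start")
print(pd.concat([r, start], axis=1))
```

On the sample this marks rows 16 to 18 and nothing else, which matches the behavior described in the question.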

1 answer:

Answer 0 (score: 2)

IIUC, you can use np.select:

df['Start'] = np.select([df['R']>80, df['R']<20], (1,0), np.nan)
df['Start'] = df['Start'].ffill()

Output:

     R  Start  End
11  82    1.0    1
12  63    1.0    0
13  37    1.0    0
14  21    1.0    0
15  88    1.0    1
16   9    0.0    0
17  13    0.0    0
18  83    1.0    1
19  47    1.0    0
20  68    1.0    0
21  42    1.0    0
22  67    1.0    0
23  26    1.0    0
24  79    1.0    0
25  87    1.0    1
26  96    1.0    1
27  39    1.0    0
28  50    1.0    0
29  94    1.0    1
30  95    1.0    1
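
A small sketch of how the two lines work, run on the R values from the table (rows 11 to 30). `np.select` returns the choice of the first matching condition and the default (NaN here) where nothing matches; `ffill` then carries the last non-NaN value forward. With the conditions in the order posted, Start is 1 after R > 80 and 0 after R < 20, so the sketch also shows the swapped order, which is the orientation the question describes. The variable names and the `fillna(0)` for rows before the first trigger are my additions.

```python
import numpy as np
import pandas as pd

# R values copied from rows 11..30 of the output above
r = pd.Series(
    [82, 63, 37, 21, 88, 9, 13, 83, 47, 68,
     42, 67, 26, 79, 87, 96, 39, 50, 94, 95],
    index=range(11, 31), name="R",
)

# Step 1: only the trigger rows get a value, everything else is NaN
raw = pd.Series(np.select([r > 80, r < 20], [1, 0], np.nan), index=r.index)

# Step 2: carry the last trigger forward (the Start column as posted)
as_posted = raw.ffill()

# Same idea with the conditions swapped, plus fillna(0) for leading rows
swapped = pd.Series(np.select([r < 20, r > 80], [1, 0], np.nan), index=r.index)
swapped = swapped.ffill().fillna(0).astype(int)

print(pd.DataFrame({"R": r, "raw": raw, "as_posted": as_posted, "swapped": swapped}))
```

Here `as_posted` reproduces the Start column in the output above, while `swapped` is 1 only on rows 16 and 17. Neither includes the R > 80 row that closes a run (row 18), which is what the extra `combine` step in the question's final solution adds.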