Question

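I have a DataFrame with 5 columns, all of which contain numeric values. The columns represent time steps. I have a threshold: once a value reaches it at some time step, the value should stop changing from then on. So if the original values in a row are [0, 1.5, 2, 4, 1] and the threshold is 2, I want the resulting row to be [0, 1.5, 2, 2, 2]. Is there a way to do this without a loop?

A larger example: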

>>> threshold = 0.25

>>> input
Out[75]: 
      0    1    2    3    4   
130  0.10 0.20 0.12 0.25 0.20
143  0.11 0.27 0.12 0.28 0.35
146  0.30 0.20 0.12 0.25 0.20
324  0.06 0.20 0.12 0.15 0.20

>>> output
Out[75]: 
      0    1    2    3    4   
130  0.10 0.20 0.12 0.25 0.25
143  0.11 0.27 0.27 0.27 0.27
146  0.30 0.30 0.30 0.30 0.30
324  0.06 0.20 0.12 0.15 0.20
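
To reproduce the example, the input frame can be rebuilt by hand as below. The explicit loop is only a reference for the behaviour being asked for, handy for checking vectorised answers against; the helper name `freeze_after_threshold_loop` and the variable name `df` are assumptions made for this sketch, not part of the question.

```python
import pandas as pd

threshold = 0.25

# The input frame from the question, rebuilt by hand.
df = pd.DataFrame(
    [[0.10, 0.20, 0.12, 0.25, 0.20],
     [0.11, 0.27, 0.12, 0.28, 0.35],
     [0.30, 0.20, 0.12, 0.25, 0.20],
     [0.06, 0.20, 0.12, 0.15, 0.20]],
    index=[130, 143, 146, 324],
    columns=[0, 1, 2, 3, 4],
)


def freeze_after_threshold_loop(frame, threshold):
    """Reference implementation with explicit loops (hypothetical helper)."""
    rows = []
    for values in frame.to_numpy().tolist():
        frozen = None  # value at which this row got stuck, if any
        new_row = []
        for value in values:
            if frozen is None and value >= threshold:
                frozen = value  # first value that reaches the threshold
            new_row.append(value if frozen is None else frozen)
        rows.append(new_row)
    return pd.DataFrame(rows, index=frame.index, columns=frame.columns)


expected = freeze_after_threshold_loop(df, threshold)
print(expected)
```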

Answer 1

Use:

df = df.where(df.ge(threshold).cumsum(axis=1).cumsum(axis=1).eq(1)).ffill(axis=1).fillna(df)
print (df)
        0     1     2     3     4
130  0.10  0.20  0.12  0.25  0.25
143  0.11  0.27  0.27  0.27  0.27
146  0.30  0.30  0.30  0.30  0.30
324  0.06  0.20  0.12  0.15  0.20

Explanation:

Compare with the threshold using ge (>=):

print (df.ge(threshold))
         0      1      2      3      4
130  False  False  False   True  False
143  False   True  False   True   True
146   True  False  False   True  False
324  False  False  False  False  False

Take the cumulative sum along each row:

print (df.ge(threshold).cumsum(axis=1))
     0  1  2  3  4
130  0  0  0  1  1
143  0  1  1  2  3
146  1  1  1  2  2
324  0  0  0  0  0

Take the cumulative sum once more, so that only the first match in each row has the value 1:

print (df.ge(threshold).cumsum(axis=1).cumsum(axis=1))
     0  1  2  3  4
130  0  0  0  1  2
143  0  1  2  4  7
146  1  2  3  5  7
324  0  0  0  0  0

Compare with 1:

print (df.ge(threshold).cumsum(axis=1).cumsum(axis=1).eq(1))
         0      1      2      3      4
130  False  False  False   True  False
143  False   True  False  False  False
146   True  False  False  False  False
324  False  False  False  False  False

Replace all non-matching values with NaN:

print (df.where(df.ge(threshold).cumsum(axis=1).cumsum(axis=1).eq(1)))
       0     1   2     3   4
130  NaN   NaN NaN  0.25 NaN
143  NaN  0.27 NaN   NaN NaN
146  0.3   NaN NaN   NaN NaN
324  NaN   NaN NaN   NaN NaN

Forward-fill the missing values:

print (df.where(df.ge(threshold).cumsum(axis=1).cumsum(axis=1).eq(1)).ffill(axis=1))

       0     1     2     3     4
130  NaN   NaN   NaN  0.25  0.25
143  NaN  0.27  0.27  0.27  0.27
146  0.3  0.30  0.30  0.30  0.30
324  NaN   NaN   NaN   NaN   NaN

Finally, fillna(df) replaces the remaining NaN values (those before the first match, and whole rows without a match) with the original values, which gives the result shown at the top.
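
For reuse, the chain can be wrapped in a small function. This is a minimal sketch, not part of the original answer: the function name `freeze_after_threshold` and the two-row sample frame are assumptions chosen for illustration, and the input is assumed to contain no NaN values of its own.

```python
import pandas as pd


def freeze_after_threshold(frame, threshold):
    """Hold each row at the first value that is >= threshold."""
    hit = frame.ge(threshold)
    # The double cumulative sum equals 1 only at the first hit in each row:
    # it is 0 before the first hit and at least 2 after it.
    first_hit = hit.cumsum(axis=1).cumsum(axis=1).eq(1)
    # Keep only the first hit, carry it to the right, then restore the
    # untouched values that come before it.
    return frame.where(first_hit).ffill(axis=1).fillna(frame)


# Two rows taken from the question: one that reaches the threshold, one that never does.
df = pd.DataFrame(
    [[0.10, 0.20, 0.12, 0.25, 0.20],
     [0.06, 0.20, 0.12, 0.15, 0.20]],
    index=[130, 324],
)
result = freeze_after_threshold(df, 0.25)
print(result)
```

Unlike the one-liner, the function returns a new frame and leaves `df` unchanged.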

Answer 2

A bit more complicated, but I like it.

I also like this one:

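import numpy as np
import pandas as pd

v = df.values
a = v >= threshold  # True where the threshold is reached

# Blank out everything from the first hit onwards
b = np.where(np.logical_or.accumulate(a, axis=1), np.nan, v)

# Restore the value at the first hit of each row (argmax is 0 for rows
# without a hit, where this assignment changes nothing)
r = np.arange(len(a))
j = a.argmax(axis=1)
b[r, j] = v[r, j]

pd.DataFrame(b, df.index, df.columns).ffill(axis=1)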

        0     1     2     3     4
130  0.10  0.20  0.12  0.25  0.25
143  0.11  0.27  0.27  0.27  0.27
146  0.30  0.30  0.30  0.30  0.30
324  0.06  0.20  0.12  0.15  0.20
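
The same idea can also be written without the NaN and ffill round trip, by broadcasting the first-hit value over the columns to its right. This is a sketch of a swapped-in variant, not code from either answer; the function name `freeze_after_threshold_np` is hypothetical.

```python
import numpy as np
import pandas as pd


def freeze_after_threshold_np(frame, threshold):
    """NumPy-only variant: overwrite columns to the right of the first hit."""
    v = frame.to_numpy()
    hit = v >= threshold
    first = hit.argmax(axis=1)    # column position of the first hit (0 if none)
    has_hit = hit.any(axis=1)     # distinguishes "hit in column 0" from "no hit"
    cols = np.arange(v.shape[1])
    # Positions strictly after the first hit, only in rows that have a hit.
    after = has_hit[:, None] & (cols[None, :] > first[:, None])
    frozen = v[np.arange(len(v)), first]  # value at the first hit of each row
    out = np.where(after, frozen[:, None], v)
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)


df = pd.DataFrame(
    [[0.10, 0.20, 0.12, 0.25, 0.20],
     [0.11, 0.27, 0.12, 0.28, 0.35],
     [0.30, 0.20, 0.12, 0.25, 0.20],
     [0.06, 0.20, 0.12, 0.15, 0.20]],
    index=[130, 143, 146, 324],
)
result_np = freeze_after_threshold_np(df, 0.25)
print(result_np)
```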

Cutting off values at a threshold in a pandas DataFrame
