Question

I have a DataFrame like the one below:

import pandas as pd

test = pd.DataFrame({"location": ["a", "b", "c"], "store": [1,2,3], "barcode1" : [1, 0 ,25], "barcode2" : [4,0,11], "barcode3" : [5,5,0]})

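I want to replace the barcode values with "zero" if they are zero or less, "low" if they are at or below a threshold (e.g. 5), and "ok" if they are above that threshold. However, I don't want to write this as a loop, because my actual DataFrame has shape (1415, 402) and a loop would be very slow.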

I tried the following code:

test.replace(test.iloc[:,2:] <= 0 , "zero", inplace = True)

This looks fine for replacing the zeros. But when I move on to the next replacement, like this:

test.replace(test.iloc[:,2:] <= 5 , "low", inplace = True)

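I get the error "'<=' not supported between instances of 'str' and 'int'", which I think happens because the 0 values have already been replaced with the string "zero". So I would like to do all the replacements in one pass, without a for loop. Any help would be appreciated, and sorry for the long explanation.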

Answer 1

Use numpy.select together with iloc:

import numpy as np

# Boolean masks; np.select applies the first condition that matches
m1 = test.iloc[:,2:] <= 0
m2 = test.iloc[:,2:] <= 5

test.iloc[:,2:] = np.select([m1, m2], ['zero','low'], default='ok')
print(test)
  location  store barcode1 barcode2 barcode3
0        a      1      low      low      low
1        b      2     zero     zero      low
2        c      3       ok       ok     zero
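
For reference, here is a self-contained sketch of the same idea that builds a new frame instead of assigning strings into the integer columns in place (which, depending on the pandas version, can trigger a dtype warning). The names THRESHOLD, barcode_cols and labelled are only illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    "location": ["a", "b", "c"],
    "store": [1, 2, 3],
    "barcode1": [1, 0, 25],
    "barcode2": [4, 0, 11],
    "barcode3": [5, 5, 0],
})

THRESHOLD = 5  # assumed cutoff for "low", taken from the example in the question

barcode_cols = test.columns[2:]
values = test[barcode_cols].to_numpy()

# Conditions are evaluated in order and the first match wins,
# so the "<= 0" test has to come before the "<= THRESHOLD" test.
labels = np.select([values <= 0, values <= THRESHOLD], ["zero", "low"], default="ok")

# Keep the first two columns and attach the labelled barcode columns.
labelled = pd.concat(
    [test.iloc[:, :2], pd.DataFrame(labels, columns=barcode_cols, index=test.index)],
    axis=1,
)
print(labelled)
```

Because all conditions are computed from the original numeric values before anything is replaced, the str/int comparison error from the question cannot occur.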

Edit:

def a(test):
    test.iloc[:, 2:] = np.select([test.iloc[:,2:] <= 0, 
                                  test.iloc[:,2:] <= 5 ], ['zero','low'], default='ok')
    return test

def c(test):
    arr1 = test.values[:,2:]
    new = np.full(arr1.shape, 'ok', dtype=object)
    new[arr1 <= 5] = 'low'
    new[arr1 <= 0] = 'zero'
    return test.iloc[:,:2].join(pd.DataFrame(new,columns=test.columns[2:],index=test.index))

print (a(test.copy()))
print (c(test.copy()))

In [91]: %timeit (a(test.copy()))
36.6 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [92]: %timeit (c(test.copy()))
26.9 ms ± 180 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
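
If you want to check this on something closer to your real data, here is a rough benchmark sketch using the standard timeit module. The random data, the 0 to 29 value range and the function names with_select and with_masks are assumptions made for illustration, and it times only the labelling step, not the assignment back into a DataFrame. Numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame with the shape from the question: 1415 rows, 402 columns
# (2 id columns + 400 barcode columns). The values are made up.
rng = np.random.default_rng(0)
n_rows, n_barcodes = 1415, 400
data = rng.integers(0, 30, size=(n_rows, n_barcodes))
big = pd.DataFrame(data, columns=[f"barcode{i + 1}" for i in range(n_barcodes)])
big.insert(0, "store", np.arange(n_rows))
big.insert(0, "location", "a")


def with_select(df):
    # np.select: first matching condition wins
    vals = df.iloc[:, 2:].to_numpy()
    return np.select([vals <= 0, vals <= 5], ["zero", "low"], default="ok")


def with_masks(df):
    # Fill with the default, then overwrite from the loosest mask to the strictest
    vals = df.iloc[:, 2:].to_numpy()
    out = np.full(vals.shape, "ok", dtype=object)
    out[vals <= 5] = "low"
    out[vals <= 0] = "zero"
    return out


# Both approaches should produce the same labels.
same = np.array_equal(with_select(big).astype(object), with_masks(big))
print("identical results:", same)

for fn in (with_select, with_masks):
    seconds = timeit.timeit(lambda: fn(big), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```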

Answer 2

You can use the pd.cut function:

import numpy as np
test["barcode1"] = pd.cut(test["barcode1"], [-np.inf, 0, 5, np.inf], labels=["zero", "low", "ok"])

  location  store barcode1  barcode2  barcode3
0        a      1      low         4         5
1        b      2     zero         0         5
2        c      3       ok       11         0
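
The snippet above only converts barcode1. A minimal sketch of extending it to every barcode column, assuming the barcode columns are everything from the third column onward (as in the question), could look like this. The names bins, labels, binned and result are illustrative.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    "location": ["a", "b", "c"],
    "store": [1, 2, 3],
    "barcode1": [1, 0, 25],
    "barcode2": [4, 0, 11],
    "barcode3": [5, 5, 0],
})

# Intervals are closed on the right by default: (-inf, 0], (0, 5], (5, inf)
bins = [-np.inf, 0, 5, np.inf]
labels = ["zero", "low", "ok"]

barcode_cols = test.columns[2:]

# pd.cut works on one 1-D column at a time, so apply it column by column.
binned = test[barcode_cols].apply(lambda col: pd.cut(col, bins=bins, labels=labels))

# Re-attach the untouched id columns.
result = test.iloc[:, :2].join(binned)
print(result)
```

Note that pd.cut produces categorical values rather than plain strings, which may or may not be what you want downstream.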

Replace values based on different conditions
