Question

I have a DataFrame with numeric values in both the first and the second column:

import numpy as np
import pandas as pd

d = {'col1': [1,2,3,4,5,6,7,8,9,10], 'col2': [1,2,1,16,8,7,8.1,11.1,12.0,13.1]}

df = pd.DataFrame(data=d)

In reality I have an Excel file containing all the data, which I cannot share, but the idea is the same.

In a third column, I want to describe how the second column differs from the first:

conditions = [
    (df["col1"] > df["col2"]),
    (df["col1"] == df["col2"]),
    (df["col2"] - df["col1"] <= 1),
    (df["col2"] - df["col1"] <= 3),
    (df["col2"] - df["col1"] > 3.01),
    ]
choices = ['1less', '2equal', '3more up to 1', '4more up to 3', '5more above 3']

df['diff type'] = np.select(conditions, choices, default='6some_default_value')
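
For context, np.select picks, for each row, the choice that belongs to the first condition that is True; later conditions are ignored for that row. A minimal sketch with made-up values (the array x and the labels are illustrative only, not taken from my data):

```python
import numpy as np

# Illustrative values: 0.5 satisfies both conditions, 2.0 only the second,
# and 5.0 satisfies neither.
x = np.array([0.5, 2.0, 5.0])

# For each element, the first condition that is True decides the label.
labels = np.select([x <= 1, x <= 3], ["up to 1", "up to 3"], default="above 3")
print(labels)
```

So a row can only receive the fourth label if the first three conditions were all False for it.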

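This works when I have a small dataset. But my Excel file has millions of rows, and I sometimes run into cases like this:

col1.value = 2.99,

col2.value = 3.99,

so diff = 1, but diff type is set to '4more up to 3'.

Mathematically that is correct (1 is also at most 3), but shouldn't the evaluation stop at the third condition? Is there a different way to achieve the same result?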
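
One possible explanation, stated as an assumption because the real data cannot be shown: the values are stored as binary floating-point numbers, so a difference that displays as 1 may actually be stored as something like 1.0000000000000002. In that case the third condition is False and np.select moves on to the fourth. A sketch of a workaround, assuming the data has two decimal places of precision, is to round the difference before comparing. The sample rows below are illustrative, and the last condition uses `> 3` instead of `> 3.01` so that values between 3 and 3.01 do not fall through to the default.

```python
import numpy as np
import pandas as pd

# Illustrative rows; the last one mirrors the 2.99 / 3.99 case.
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 2.99],
    "col2": [1, 1, 3.5, 16, 3.99],
})

# Round the difference to 2 decimals (assumed precision of the data) so that
# any floating-point noise such as 1.0000000000000002 collapses to exactly 1.0.
diff = (df["col2"] - df["col1"]).round(2)

# Conditions are checked in order; the first True one wins for each row.
conditions = [
    diff < 0,
    diff == 0,
    diff <= 1,
    diff <= 3,
    diff > 3,
]
choices = ["1less", "2equal", "3more up to 1", "4more up to 3", "5more above 3"]

df["diff type"] = np.select(conditions, choices, default="6some_default_value")
print(df)
```

Computing the difference once and rounding it also keeps every condition working on the same value, instead of repeating the subtraction in each comparison.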

Python: np.select does not match the expected condition

0 answers: