Question

I have a DataFrame whose values may or may not already be normalized.

+---+------------+
| x | normalized |
+---+------------+
| 1 | True       |
+---+------------+
| 2 | True       |
+---+------------+
| 3 | False      |
+---+------------+
| 4 | True       |
+---+------------+
| 5 | False      |
+---+------------+

The way I currently normalize all of x is

df.x = df.x.where(df.normalized, normalize)

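normalize is a function that can take a long time to run, so I would like to know whether normalize is called for every value of x regardless of whether normalized is True. I suspect it is, based on pandas/core/generic.py.

If so, is there a better approach, such as using apply(lambda ...)?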

Answer 1

Using numpy.where is faster, because pandas is built on top of numpy arrays:

import numpy as np

# Here normalize is a constant used to fill the rows that are not normalized
normalize = 10
df.x = np.where(df.normalized.values, df.x.values, normalize)
print(df)
    x  normalized
0   1        True
1   2        True
2  10       False
3   4        True
4  10       False
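
Note that np.where is not lazy either. If normalize is a real function rather than a constant, Python evaluates its result for the whole array before np.where picks any elements. A minimal sketch, assuming a hypothetical normalize that adds 10 and records every value it receives (the seen list exists only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "normalized": [True, True, False, True, False]})

seen = []  # illustration only: every value the function is asked to process

def normalize(values):
    # Hypothetical stand-in for an expensive normalization
    seen.extend(values.tolist())
    return values + 10

# The third argument is fully computed before np.where selects anything
result = np.where(df.normalized.values, df.x.values, normalize(df.x.values))

print(result.tolist())  # [1, 2, 13, 4, 15]
print(len(seen))        # 5, so all rows were processed, not just the 2 needed
```

So np.where avoids pandas overhead, but it does not by itself skip the work on rows that are already normalized.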

Yes, it is, so it is better to use this approach, filtering both sides of the assignment by the boolean mask column:

def normalize(x):
    return x + 10

# Select only the rows that are not normalized yet
mask = ~df.normalized
df.loc[mask, 'x'] = df.loc[mask, 'x'].apply(normalize)
print(df)
    x  normalized
0   1        True
1   2        True
2  13       False
3   4        True
4  15       False
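
To check this yourself, the sketch below records what a function receives in each case. As documented, a callable passed as other to Series.where is computed on the whole Series, so it sees every value; with a boolean mask, only the selected rows reach the function. The names normalize_series, normalize_value, where_inputs and apply_inputs are hypothetical helpers, and the mask is inverted on the assumption that the rows flagged False are the ones that still need normalizing:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "normalized": [True, True, False, True, False]})

where_inputs = []  # what the callable passed to .where receives

def normalize_series(s):
    # Hypothetical normalization applied to a whole Series
    where_inputs.append(s.tolist())
    return s + 10

# .where hands the entire Series to the callable, then keeps x where cond is True
via_where = df.x.where(df.normalized, normalize_series)

apply_inputs = []  # what the per-value function receives with a mask

def normalize_value(v):
    # Hypothetical normalization applied to a single value
    apply_inputs.append(int(v))
    return v + 10

# Only rows where normalized is False are passed to normalize_value
mask = ~df.normalized
df.loc[mask, "x"] = df.loc[mask, "x"].apply(normalize_value)

print(where_inputs)   # the full column, including already normalized rows
print(apply_inputs)   # only the values from the masked rows
```

Both variants give the same x column; the difference is only in how many values the expensive function has to process.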

In pandas, does .where(cond, other) compute _other_ regardless of _cond_?
