Question

I'm trying to implement a function that returns the maximum value at each position of a DataFrame or Series, minimizing NaNs.

In [217]: a
Out[217]: 
   0  1
0  4  1
1  6  0

[2 rows x 2 columns]

In [218]: b
Out[218]: 
    0   1
0 NaN   3
1   3 NaN

[2 rows x 2 columns]


In [219]: do_not_replace = b.isnull() | (a > b)

In [220]: do_not_replace
Out[220]: 
      0      1
0  True  False
1  True   True

[2 rows x 2 columns]


In [221]: a.where(do_not_replace, b)
Out[221]: 
   0  1
0  4  3
1  1  0

[2 rows x 2 columns]


In [222]: expected
Out[222]: 
   0  1
0  4  3
1  6  0

[2 rows x 2 columns]

In [223]: pd.__version__
Out[223]: '0.13.1'

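I imagine there are other ways to implement this, but I can't figure out this behavior. I mean, where does the 1 come from? I think the logic is sound. Am I misunderstanding how the function works?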

Answer 1

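This is essentially what where does internally. I think this may be a casting bug. The bug fix is here. It turned out that reproducing it required both a symmetric DataFrame and a passed-in frame. Very subtle. Note that this other form of indexing (below) uses a different method, so it was fine.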

In [56]: a[~do_not_replace] = b

In [57]: a
Out[57]: 
   0  1
0  4  3
1  6  0

Note: this has been fixed in master / 0.14.1.
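For anyone on a pandas version that includes the fix, the question's example can be re-run as a quick check. This is only a minimal sketch: the variable names mirror the question, the frames are rebuilt by hand from the printed output, and the values to compare against are the ones in the question's expected frame.

```python
import pandas as pd

# Frames rebuilt from the output shown in the question.
a = pd.DataFrame([[4, 1], [6, 0]])
b = pd.DataFrame([[float("nan"), 3], [3, float("nan")]])

# Keep a's value where b is missing or where a is strictly larger.
do_not_replace = b.isnull() | (a > b)

# Take a where the mask is True and b everywhere else.
result = a.where(do_not_replace, b)
print(result)
```

If the printed frame still shows a 1 in the second row, the installed pandas most likely predates the fix.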

Answer 2

I can't reproduce this problem with "plain" numpy arrays:

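import numpy as np

a = np.array([(4, 1), (6, 0)])
b = np.array([(np.nan, 3), (3, np.nan)])

print(a)
print(b)

# True where a's value should be kept: b is NaN, or a is larger
do_not_replace = np.isnan(b) | (a > b)
print(do_not_replace)

# Take a where the mask is True, b elsewhere
print(np.where(do_not_replace, a, b))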

...which gives what you want, I think:

array([[ 4.,  3.],
       [ 6.,  0.]])
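If the end goal is just the element-wise maximum that skips NaN wherever it can, NumPy's np.fmax may be a shorter route than building the mask by hand. A minimal sketch, assuming the same two arrays as above:

```python
import numpy as np

a = np.array([(4, 1), (6, 0)])
b = np.array([(np.nan, 3), (3, np.nan)])

# np.fmax takes the element-wise maximum; when exactly one of the two
# elements is NaN, it returns the non-NaN one.
result = np.fmax(a, b)
print(result)
```

Unlike np.maximum, which propagates NaN, np.fmax only yields NaN when both inputs are NaN at that position.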

@jwilner: As @Jeff said, it's probably a pandas bug. What version are you running?

pandas DataFrame.where misbehaving
