How do I handle a Pandas Series that contains NaN?

Time: 2016-09-03 06:58:16

Tags: python pandas matplotlib dataframe

What happens when max() and min() are used on a pandas.core.series.Series that contains NaN? Is this a bug? See below:

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

mydata = pd.DataFrame(np.random.standard_normal((100,1)), columns=['No NaN'])
mydata['Has NaN'] = mydata['No NaN'] / mydata['No NaN'].shift(1)

# Both return NaN!
print(min(mydata['Has NaN']), max(mydata['Has NaN']))
# Still why False? Isn't float('nan') a singleton like None?
print(min(mydata['Has NaN']) == max(mydata['Has NaN']))
# But this time works well!
print(min([1, 2, 3, float('nan')]))

print('\n')

# When Series data type that has NaN bumps into min() and max(), what should 
#  I do? E.g.,
try: 
    n, bins, patches = plt.hist(mydata['Has NaN'], 10)
except ValueError as e:
    print(e, '\nSeems "range" argument in hist() has problem!')

2 answers:

Answer 0 (score: 3)

You should use the Pandas or NumPy functions rather than the vanilla Python functions:

In [7]: mydata['Has NaN'].min(), mydata['Has NaN'].max()
Out[7]: (-46.00309057827485, 62.430829637766671)

In [8]: min(mydata['Has NaN']), max(mydata['Has NaN'])
Out[8]: (nan, nan)

In [125]: mydata.plot.hist(alpha=0.5)
Out[125]: <matplotlib.axes._subplots.AxesSubplot at 0x1a784588>

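As a minimal sketch of the difference (the small Series `s` below is made up for illustration and is not the question's random data): the pandas methods skip NaN by default through their `skipna` parameter, and NumPy offers `np.nanmin` and `np.nanmax` for the same purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN comes first, as shift(1) produces in the question.
s = pd.Series([np.nan, 2.0, -1.0, 5.0])

# Series.min() and Series.max() skip NaN by default (skipna=True).
pandas_min, pandas_max = s.min(), s.max()

# NumPy equivalents that ignore NaN.
numpy_min = np.nanmin(s.to_numpy())
numpy_max = np.nanmax(s.to_numpy())

# With skipna=False the NaN propagates into the result.
strict_min = s.min(skipna=False)

print(pandas_min, pandas_max)
print(numpy_min, numpy_max)
print(strict_min)
```

Passing `skipna=False` can be useful when a NaN should be treated as a signal that the data needs cleaning rather than silently ignored.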

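Answer 1 (score: 3)

First of all, you shouldn't use the built-in max and min when working with pandas or numpy, especially when you are dealing with nan.

Because nan is the first item of mydata['Has NaN'], it is never replaced in either max or min, because (as mentioned in the docs):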

  

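The not-a-number values float('NaN') and Decimal('NaN') are special. They are identical to themselves (x is x is true) but are not equal to themselves (x == x is false). Additionally, comparing any number to a not-a-number value will return False. For example, both 3 < float('NaN') and float('NaN') < 3 will return False.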
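The consequence for the built-in min() can be sketched in plain Python. The helper `naive_min` below is a hypothetical, simplified model of how min() scans an iterable, written only for illustration. Because every comparison against NaN is False, a NaN in the first position is never displaced, while a NaN in a later position is never selected.

```python
import math

nan = float('nan')

# NaN is not equal to itself, and ordering comparisons with it are always False.
self_equal = (nan == nan)
less_than = (3 < nan, nan < 3)

def naive_min(values):
    """Hypothetical, simplified model of the built-in min() scan."""
    it = iter(values)
    best = next(it)
    for v in it:
        if v < best:  # always False when either side is NaN
            best = v
    return best

nan_first = min([nan, 1, 2, 3])  # NaN is the initial candidate and stays
nan_last = min([1, 2, 3, nan])   # NaN never compares smaller, so 1 wins

print(self_equal, less_than)
print(nan_first, nan_last)
print(naive_min([nan, 1, 2, 3]), naive_min([1, 2, 3, nan]))
```

This also explains the question's two observations: the list with NaN at the end gives a sensible minimum, and comparing two NaN results with == gives False regardless of whether they are the same object.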

Instead, use the pandas min and max methods:

In [4]: mydata['Has NaN'].min()
Out[4]: -176.9844930355774

In [5]: mydata['Has NaN'].max()
Out[5]: 12.684033138603787

Regarding the histogram, this appears to be a known issue with plt.hist; see here and here.

For now, it should be fairly simple to work around.
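One way to do that, sketched here as an assumption rather than as the answer's original code, is to drop the NaN values with `Series.dropna()` before calling `plt.hist`. The seeded generator and the `Agg` backend are choices made only so that the sketch is reproducible and runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild data shaped like the question's (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
mydata = pd.DataFrame(rng.standard_normal((100, 1)), columns=['No NaN'])
mydata['Has NaN'] = mydata['No NaN'] / mydata['No NaN'].shift(1)

# Remove the NaN that shift(1) introduces in the first row.
clean = mydata['Has NaN'].dropna()

# With only finite values, hist can compute its bin range.
n, bins, patches = plt.hist(clean, 10)
print(len(clean), int(n.sum()), len(bins))
```

Filtering before plotting keeps the original column intact, so the NaN is still available for any later step that needs to know where data was missing.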
