What happens when the built-in max() and min() are used on a pandas.core.series.Series that contains NaN? Is this a bug? See below:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
mydata = pd.DataFrame(np.random.standard_normal((100,1)), columns=['No NaN'])
mydata['Has NaN'] = mydata['No NaN'] / mydata['No NaN'].shift(1)
# Both return NaN!
print(min(mydata['Has NaN']), max(mydata['Has NaN']))
# Still why False? Isn't float('nan') a singleton like None?
print(min(mydata['Has NaN']) == max(mydata['Has NaN']))
# But this time works well!
print(min([1, 2, 3, float('nan')]))
print('\n')
# When Series data type that has NaN bumps into min() and max(), what should
# I do? E.g.,
try:
    n, bins, patches = plt.hist(mydata['Has NaN'], 10)
except ValueError as e:
    print(e, '\nSeems "range" argument in hist() has problem!')
Answer 0 (score: 3)
You should use the pandas or NumPy functions rather than the vanilla Python functions:
In [7]: mydata['Has NaN'].min(), mydata['Has NaN'].max()
Out[7]: (-46.00309057827485, 62.430829637766671)
In [8]: min(mydata['Has NaN']), max(mydata['Has NaN'])
Out[8]: (nan, nan)
In [125]: mydata.plot.hist(alpha=0.5)
Out[125]: <matplotlib.axes._subplots.AxesSubplot at 0x1a784588>
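To make the difference concrete, here is a minimal sketch on a small made-up Series (the values are hypothetical, not the random data from the question). It assumes the default skipna=True behaviour of Series.min() and Series.max(), and uses np.nanmin() and np.nanmax() as the NumPy counterparts:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a leading NaN, like the ratio column in the question.
s = pd.Series([np.nan, 2.0, -1.5, 3.0])

# Series.min()/max() skip NaN by default (skipna=True).
print(s.min(), s.max())

# NumPy's NaN-aware reductions ignore NaN as well.
values = s.to_numpy()
print(np.nanmin(values), np.nanmax(values))

# With skipna=False the NaN propagates into the result instead.
print(s.min(skipna=False))
```

If you actually want a NaN result whenever any value is missing, skipna=False is the explicit way to ask for it.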
Answer 1 (score: 3)
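First of all, you shouldn't use the built-in max or min when working with pandas or numpy, especially when your data contains nan.
Because nan is the first item of mydata['Has NaN'], it is never replaced inside max or min, since (as stated in the docs):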
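The not-a-number values float('NaN') and Decimal('NaN') are special. They are identical to themselves (x is x is true) but are not equal to themselves (x == x is false). Additionally, comparing any number to a not-a-number value will return False. For example, both 3 < float('NaN') and float('NaN') < 3 will return False.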
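A short sketch of how that rule plays out, using plain Python lists rather than the Series from the question. It relies on the CPython behaviour that the built-in min() keeps its current candidate unless a later item compares as smaller:

```python
nan = float('nan')

# Every ordering or equality comparison involving NaN is False.
print(3 < nan, nan < 3, nan == nan)

# NaN first: nothing ever compares as smaller than the candidate, so NaN survives.
nan_first = min([nan, 1.0, 2.0])

# NaN last: NaN never compares as smaller than the current candidate 1.0.
nan_last = min([1.0, 2.0, nan])

print(nan_first, nan_last)
```

This is also why min(...) == max(...) printed False in the question: both sides are NaN, and NaN is never equal to anything, including itself.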
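Instead, use the pandas max and min methods:
In [4]: mydata['Has NaN'].min()
Out[4]: -176.9844930355774
In [5]: mydata['Has NaN'].max()
Out[5]: 12.684033138603787
Regarding the histogram, this appears to be a known issue with plt.hist; see here and here.
It should be fairly simple to work around for now.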
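One possible workaround, sketched here as an assumption rather than as the exact code from the original answer, is to drop the NaN values before handing the column to plt.hist. The seeded generator and the Agg backend line are only there to make the sketch reproducible and runnable without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same construction as in the question, but seeded (an assumption for reproducibility).
rng = np.random.default_rng(0)
mydata = pd.DataFrame(rng.standard_normal((100, 1)), columns=['No NaN'])
mydata['Has NaN'] = mydata['No NaN'] / mydata['No NaN'].shift(1)

# Drop the NaN rows so hist() can compute a finite range for the bins.
clean = mydata['Has NaN'].dropna()
n, bins, patches = plt.hist(clean, bins=10)

# Every remaining value should land in one of the 10 bins.
print(len(clean), int(n.sum()))
```

Whether the unfiltered call fails may depend on your matplotlib version, so treat dropna() as a defensive step rather than a required one.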