Question

I am trying to do some comparisons in a pandas DataFrame.

import numpy as np
from pandas import DataFrame

# create a simple DataFrame
df = DataFrame(['one', 'two', 'three'], range(1, 4), columns=['col1'])
# assign one col1 value to be NaN
df.loc[1, 'col1'] = np.nan
# this comparison works
print(df['col1'] == 'three')
# assign all col1 values to NaN
df.loc[:, 'col1'] = np.nan
# this comparison fails
print(df['col1'] == 'three')

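The first comparison (with only one NaN value in the column) works as expected, but the second (where every value in the column is NaN) raises this error: TypeError: invalid type comparison

What is going on here?

I saw this question, which suggests there are some workable but somewhat hacky solutions to the problem.

But why does this behavior happen in the first place? Is this restriction useful in some way? I can fix it by calling df.fillna('') before the comparison, but that seems clunky and annoying.

So my questions are:
1. What is the cleanest way to work around this?
2. Why is this the default behavior anyway?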

Answer 1

After you assign np.nan to every value in col1, the column's dtype becomes float, so trying to compare it with a string raises a TypeError:

df = pd.DataFrame(['one', 'two', 'three'], range(1, 4), columns=['col1'])
df.loc[1, 'col1'] = np.nan

    col1
1    NaN
2    two
3  three

Assigning a single np.nan to a column that contains string values leaves the dtype as object:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 1 to 3
Data columns (total 1 columns):
col1    2 non-null object
dtypes: object(1)

But when all values are np.nan, the column is converted to float:

df.loc[:, 'col1'] = np.nan
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 1 to 3
Data columns (total 1 columns):
col1    0 non-null float64
dtypes: float64(1)
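
One way to act on this is to check the column's dtype and cast it to object before comparing with a string. The snippet below is a minimal sketch and not part of the original answer. It builds the all-NaN column directly as a float64 Series (an assumption, so the result does not depend on how a given pandas version handles the loc assignment), and the names all_nan and as_object are only illustrative.

```python
import numpy as np
import pandas as pd

# An all-NaN column built from floats has dtype float64.
all_nan = pd.Series([np.nan, np.nan, np.nan], index=range(1, 4), name='col1')
print(all_nan.dtype)  # float64

# Casting to object makes the comparison run element-wise on Python objects,
# so comparing against a string no longer depends on the float dtype.
as_object = all_nan.astype(object)
result = as_object == 'three'
print(result.tolist())  # [False, False, False]
```

This avoids replacing the NaN values themselves, unlike df.fillna('').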

Answer 2

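The problem can be solved by using the ix indexer instead of iloc, in which case the dtype of the Series does not change. (I am not sure why this is the case. The two indexers should probably behave consistently, and my preference would be to change iloc to match ix.)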

>>> df = pd.DataFrame(['one', 'two', 'three'], range(1,4), columns=['col1'])
>>> df['col1'].ix[:] = np.nan
>>> df.dtypes

col1    object
dtype: object
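
Note that the ix indexer was deprecated in pandas 0.20 and removed in pandas 1.0, so the snippet above only runs on older versions. A rough equivalent on newer versions, offered here as a sketch rather than as this answer's method, is to replace the column with an all-NaN Series whose dtype is explicitly object:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['one', 'two', 'three']}, index=range(1, 4))

# Replace the column with NaN values while requesting object dtype explicitly,
# so the column is not converted to float64.
df['col1'] = pd.Series(np.nan, index=df.index, dtype=object)

print(df.dtypes)
comparison = df['col1'] == 'three'
print(comparison.tolist())  # [False, False, False]
```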

Answer 3

If you do:

# assign all col1 values to None
df.loc[:, 'col1'] = None

then

# this comparison does not fail
print(df['col1'] == 'three')

1    False
2    False
3    False
Name: col1, dtype: bool
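
Another option that does not depend on the column's dtype, and is not mentioned in the answers above, is Series.isin, which tests membership instead of using ==. The following is a small sketch under that assumption. The all-NaN float column is constructed directly for illustration.

```python
import numpy as np
import pandas as pd

# A float64 column in which every value is NaN.
col = pd.Series([np.nan, np.nan, np.nan], index=range(1, 4), name='col1')

# isin checks each element for membership in the given list of values.
mask = col.isin(['three'])
print(mask.tolist())  # [False, False, False]

# The same call on a column that still holds strings.
mixed = pd.Series([np.nan, 'two', 'three'], index=range(1, 4), name='col1')
print(mixed.isin(['three']).tolist())  # [False, False, True]
```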
