Question

I have a DataFrame with a column of floats that I want to convert to int:

> df['VEHICLE_ID'].head()
0    8659366.0
1    8659368.0
2    8652175.0
3    8652174.0
4    8651488.0

In theory I should be able to just use:

> df['VEHICLE_ID'] = df['VEHICLE_ID'].astype(int)

But I get:

Output: ValueError: Cannot convert NA to integer

But I'm fairly sure there are no NaNs in this series:

> df['VEHICLE_ID'].fillna(999,inplace=True)
> df[df['VEHICLE_ID'] == 999]
> Output: Empty DataFrame
Columns: [VEHICLE_ID]
Index: []

What's going on?

Answer 1

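Basically, the error is telling you that you have NaN values. I'll show why your attempts didn't reveal this:

In [7]:
# setup some data
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})
df
Out[7]:
     a
0  1.0
1  NaN
2  3.0
3  4.0

Now try to cast:

df['a'].astype(int)

This raises:

ValueError: Cannot convert NA to integer

Then you tried something like this:

In [5]:
for index, row in df['a'].items():
    if row == np.nan:
        print('index:', index, 'isnull')

This printed nothing, because NaN cannot be tested with equality. In fact it has a special property: comparing it with itself returns False, so row != row is True only for NaN:

In [6]:
for index, row in df['a'].items():
    if row != row:
        print('index:', index, 'isnull')

index: 1 isnull

Now it prints the row. For readability you should use isnull:

In [9]:
for index, row in df['a'].items():
    if pd.isnull(row):
        print('index:', index, 'isnull')

index: 1 isnull

So what to do? We can drop the rows with df.dropna(subset=['a']), or we can replace the missing values using fillna.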
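
As a rough sketch of both options (the column name a and the fill value 0 are illustrative assumptions, not taken from the question), a vectorized isna() check locates the missing values without a loop, after which either route allows the cast to int:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# Vectorized check: count the missing values instead of looping over rows
n_missing = int(df['a'].isna().sum())
print(n_missing)  # 1

# Option 1: drop the rows that contain NaN, then cast
dropped = df.dropna(subset=['a'])['a'].astype(int)

# Option 2: replace NaN with a placeholder (0 is an arbitrary choice), then cast
filled = df['a'].fillna(0).astype(int)

print(dropped.tolist())  # [1, 3, 4]
print(filled.tolist())   # [1, 0, 3, 4]
```

Which option fits depends on whether a placeholder ID is acceptable in your data; dropping is usually safer when the column is used as a key.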

Answer 2

When your series contains floats and NaN and you want to convert it to integers, casting to a NumPy integer raises an error because of the NA values.

Don't do:

df['VEHICLE_ID'] = df['VEHICLE_ID'].astype(int)

Starting with pandas >= 0.24 there is a built-in pandas integer type that does allow integer NaN. Note the capital 'I' in 'Int64': this is the pandas integer, not the NumPy integer.

So, do this:

df['VEHICLE_ID'] = df['VEHICLE_ID'].astype('Int64')

More info on pandas integer NA values:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
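
A minimal sketch of the nullable dtype in action, assuming pandas 0.24 or later. The sample values are borrowed from the question, and the missing entry is added here purely for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([8659366.0, np.nan, 8652175.0], name='VEHICLE_ID')

# The nullable 'Int64' dtype keeps the missing entry as NA instead of raising
converted = s.astype('Int64')

print(converted.dtype)              # Int64
print(int(converted.isna().sum()))  # 1
print(converted.dropna().tolist())  # [8659366, 8652175]
```

Note that this relies on the floats being whole numbers; a value such as 1.5 cannot be cast safely to 'Int64' and pandas raises an error instead.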

Cannot convert NaN to int (but there are no NaNs)
