Question

I have the following DataFrame:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
Borough        20 non-null object
Indian         20 non-null object
Pakistani      20 non-null object
Bangladeshi    20 non-null object
Chinese        20 non-null object
Other_Asian    20 non-null object
Total_Asian    20 non-null object
dtypes: object(7)

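Only the "Borough" column is a string; the other columns should be int or float. I am trying to convert them with astype(int). I have tried every option I could find online, but I still get an error.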

df_LondonEthnicity['Indian'] = df_LondonEthnicity['Indian'].astype(int)

The error is:

invalid literal for int() with base 10:

I also tried

df_LondonEthnicity['Indian'] = df_LondonEthnicity.astype({'Indian': int}).dtypes

I also tried

cols = ['Indian', 'Pakistani', 'Bangladeshi', 'Chinese', 'Other_Asian', 'Total_Asian']  

for col in cols:  # Iterate over chosen columns
  df_LondonEthnicity[col] = pd.to_numeric(df_LondonEthnicity[col])

I also tried converting the strings to float.

Any help would be appreciated. Thanks.

Answer 1

As pointed out in the comments, you need to use the to_numeric function.

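The error means that at least one of the values you are trying to convert contains characters other than the digits 0-9 (base 10), so int() cannot parse it.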
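To find out which values are causing the error, one approach is to coerce the column and look at the rows that became NaN even though they were not missing to begin with. This is a minimal sketch; the DataFrame `df` and its values are hypothetical, since the question does not show the actual data:

```python
import pandas as pd

# Hypothetical data: the real values from the question are not shown.
df = pd.DataFrame({"Borough": ["A", "B", "C", "D"],
                   "Indian": ["123", "1,234", "-", "200"]})

# With errors="coerce", anything pd.to_numeric cannot parse becomes NaN.
parsed = pd.to_numeric(df["Indian"], errors="coerce")

# Keep only rows that had a value originally but failed to parse.
bad = df.loc[parsed.isna() & df["Indian"].notna(), "Indian"]
print(bad.tolist())  # ['1,234', '-']
```

Once you can see the offending strings, you can decide whether to clean them up or let them become NaN.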

So your options are to use pd.to_numeric and turn every value that does not parse into NaN, or to convert the values in some other way.
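If you go the pd.to_numeric route, you don't need a loop; DataFrame.apply can call it on every selected column and forward the errors="coerce" keyword. A sketch, assuming a small hypothetical stand-in for df_LondonEthnicity that has only two of the question's columns and made-up values:

```python
import pandas as pd

# Hypothetical stand-in for df_LondonEthnicity (values are made up).
df = pd.DataFrame({"Borough": ["Camden", "Harrow"],
                   "Indian": ["5000", "n/a"],
                   "Pakistani": ["120", "300"]})

cols = ["Indian", "Pakistani"]  # in the question: all six numeric columns
# apply() calls pd.to_numeric once per column; errors="coerce" is passed through.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```

In this example "Indian" ends up as float64 because it now holds a NaN, while "Pakistani" parses cleanly and becomes an integer column.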

假设您有一个这样的数据框。

Using pd.to_numeric produces output like this, but the values are floats (a NumPy integer column cannot hold NaN):

>>> pd.to_numeric(df.X, errors='coerce')
0    123.0
1      NaN
2    200.0
3    200.1
Name: X, dtype: float64
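If you want integers and still need to keep the unparseable rows as missing, one option is pandas' nullable integer dtype "Int64", which stores missing values as <NA> rather than forcing the column to float. A minimal sketch with a hypothetical Series:

```python
import pandas as pd

# Hypothetical column: "12x3" cannot be parsed as a number.
s = pd.Series(["123", "12x3", "200"])

nums = pd.to_numeric(s, errors="coerce")  # float64, because of the NaN
ints = nums.astype("Int64")               # nullable integer; NaN becomes <NA>
print(ints)
```

This cast only works when every parsed value is a whole number. A value such as 200.1 would make astype("Int64") raise an error, so round or filter such values first if they can occur.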

The other option is to convert the values in some way, for example by extracting the first run of digits:

>>> df.X.str.extract(r'([\d]+)').astype(int)
     0
0  123
1  123
2  200
3  200
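Another kind of conversion is to remove known formatting characters before parsing. The question does not show the actual values, so this is an assumption: if the failures come from thousands separators such as "1,234" or from stray spaces, a sketch like this (with hypothetical values) cleans them up without discarding any rows:

```python
import pandas as pd

# Hypothetical values with thousands separators and surrounding spaces.
s = pd.Series([" 1,234", "56", "7,890 "])

# Strip whitespace, drop the commas, then parse; regex=False treats "," literally.
cleaned = s.str.strip().str.replace(",", "", regex=False)
ints = pd.to_numeric(cleaned)  # still raises if anything unparseable remains
print(ints.tolist())  # [1234, 56, 7890]
```

Because errors is left at its default here, any value the cleanup does not fix still raises instead of silently becoming NaN.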
