Unsupported operand type(s) for +: 'int' and 'str' using Pandas

Time: 2019-03-14 18:41:31

Tags: python pandas dataframe

When I try to get the mean of a column of my DataFrame, it raises this error:

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Here is my code:

import pandas as pd

import numpy as np

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"

df = pd.read_csv(url, header=None)

headers = ["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style","drive-wheels","engine-location","wheel-base","lenght","width","height","curb-weight","engine-type","num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower","peak-rpm","city-mpg","highway-mpg","price"]

df.columns = headers

df.replace('?',np.nan, inplace=True)

mean_val = df['normalized-losses'].mean()

print(mean_val)

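2 answers:

Answer 0 (score: 4)

You need to convert the column's data type to numeric with pd.to_numeric(). If you pass the errors='coerce' option, any value that cannot be parsed as a number is replaced with NaN automatically, and mean() skips NaN by default.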

mean_val = pd.to_numeric(df['normalized-losses'], errors='coerce').mean()

print(mean_val)

> 122.0
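To see what errors='coerce' does without downloading the dataset, here is a minimal sketch on a small hypothetical Series (the values, including the 'abc' entry, are made up for illustration and are not taken from imports-85.data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: an object column holding NaN, numeric strings,
# and one string that cannot be parsed as a number.
losses = pd.Series([np.nan, "164", "164", "abc", "158"], dtype=object)

# errors="coerce" turns unparseable values into NaN instead of raising.
numeric = pd.to_numeric(losses, errors="coerce")

# mean() skips NaN by default, so only 164, 164 and 158 are averaged.
mean_val = numeric.mean()

print(numeric.dtype)
print(mean_val)
```

The resulting Series has a float dtype, so arithmetic reductions such as mean() no longer try to add an int to a str.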

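Answer 1 (score: 1)

Building on Nathaniel's answer: your column holds a mix of float and str values. You can see this with

print(df['normalized-losses'].apply(type))

which returns

0    <class 'float'>
1    <class 'float'>
2    <class 'float'>
3      <class 'str'>
4      <class 'str'>

As the error message says, you need to make all of the data the float type. You can use pd.to_numeric as Nathaniel suggested, or you can convert the column to float directly.

Output

122.0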

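If you are only interested in the normalized-losses column and know that every string can be converted properly (which I believe is the case here, since they are all numeric strings such as '130'), this simpler conversion is enough. If you want to work with the rest of the data and convert all numeric strings, use Nathaniel's implementation.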
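A minimal sketch of the direct conversion, assuming Series.astype(float) is the cast intended here. The sample values are hypothetical and only mimic the mix of NaN and numeric strings in the column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: NaN (a float) mixed with numeric strings.
losses = pd.Series([np.nan, np.nan, "164", "164", "158"], dtype=object)

# Inspect the Python type of each element to confirm the mix.
kinds = losses.apply(lambda v: type(v).__name__).tolist()
print(kinds)

# astype(float) works only when every string is a valid number.
as_float = losses.astype(float)
mean_val = as_float.mean()
print(mean_val)

# A non-numeric string such as '?' makes the strict cast fail,
# which is why the '?' placeholders must be replaced first.
try:
    pd.Series(["164", "?"], dtype=object).astype(float)
    strict_cast_failed = False
except ValueError:
    strict_cast_failed = True
print(strict_cast_failed)
```

The strict cast is a reasonable choice when a stray non-numeric value should surface as an error rather than silently become NaN; pd.to_numeric with errors='coerce' is the more forgiving option.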