ValueError: could not convert string to float: sklearn, numpy, pandas

Time: 2020-01-19 10:05:45

Tags: python arrays pandas scikit-learn numpy-ndarray

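I am trying to convert car names in a NumPy array into numeric values for use in a linear regression. The encoding step raises an error: ValueError: could not convert string to float: 'porsche'. Can anyone help?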

Here is the code:

 from sklearn.preprocessing import StandardScaler
 from sklearn.preprocessing import LabelEncoder, OneHotEncoder
 enc = LabelEncoder()
 enc.fit_transform(Z[:,0:1])
 onehotencoder = OneHotEncoder(categorical_features = [0])
 Z = onehotencoder.fit_transform(Z).toarray()

And the output: ValueError: could not convert string to float: 'porsche'

This is the array: name = Z, type str416.

1 answer:

Answer 0 (score: 1)

For one-hot encoding, I suggest using pd.get_dummies instead, which is easier to use:

# make sure Z is a dataframe
X = pd.get_dummies(Z).values
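
As a minimal sketch of the get_dummies route, assuming Z holds the car name in its first column and one numeric feature stored as text in its second (the sample values and the column names "car" and "hp" are made up for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's Z: a NumPy array of strings
Z = np.array([["porsche", "150"],
              ["audi", "110"],
              ["porsche", "200"]])

# Wrap the array in a DataFrame and give the numeric column a numeric dtype
df = pd.DataFrame(Z, columns=["car", "hp"])
df["hp"] = df["hp"].astype(float)

# One-hot encode only the car column; untouched columns come first in the result
X = pd.get_dummies(df, columns=["car"], dtype=float).to_numpy()
print(X)
```

Restricting the encoding with columns=["car"] keeps the numeric column as a single feature. Calling pd.get_dummies on an all-string frame would instead create one dummy column per distinct value of every column.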

If you want to use sklearn's OneHotEncoder (OHE), you can refer to the following example:

import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder

df = pd.DataFrame({'a':['audi','porsche','audi'], 'b':[1,2,3]})
ohe = OneHotEncoder()

mat = ohe.fit_transform(df[['a']])

# view the contents of array
mat.todense()

matrix([[1., 0.],
        [0., 1.],
        [1., 0.]])

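# get feature names (on scikit-learn 1.0+, call ohe.get_feature_names_out() instead;
# get_feature_names() was removed in 1.2)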
ohe.get_feature_names()
array(['x0_audi', 'x0_porsche'], dtype=object)
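
Two details in the question's snippet seem to explain the error. The result of enc.fit_transform is never assigned back, so the encoder on the next line still receives the raw strings, and the categorical_features argument is, to my knowledge, no longer available in scikit-learn 0.22 and later. One possible replacement is ColumnTransformer, sketched below. The data, the column names "car" and "hp", and the target y are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one categorical column, one numeric column, and a target
df = pd.DataFrame({"car": ["porsche", "audi", "porsche", "audi"],
                   "hp": [150.0, 110.0, 200.0, 90.0]})
y = np.array([90.0, 40.0, 120.0, 30.0])

# One-hot encode "car" and pass the remaining columns through unchanged;
# sparse_threshold=0 asks for a dense array as the result
ct = ColumnTransformer([("car", OneHotEncoder(), ["car"])],
                       remainder="passthrough",
                       sparse_threshold=0)
X = ct.fit_transform(df)

# The encoded matrix is fully numeric, so the regression can be fitted
model = LinearRegression().fit(X, y)
print(X)
```

If Z is a NumPy array of strings, wrapping it in a DataFrame and converting the numeric columns with astype(float) first should let the same transformer be used.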