Question

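I am trying to convert car names in a NumPy array into numeric values for linear regression. The label encoder raises an error: ValueError: could not convert string to float: 'porsche'. Can anyone help?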

Here is the code:

 from sklearn.preprocessing import StandardScaler
 from sklearn.preprocessing import LabelEncoder, OneHotEncoder
 enc = LabelEncoder()
 enc.fit_transform(Z[:,0:1])
 onehotencoder = OneHotEncoder(categorical_features = [0])
 Z = onehotencoder.fit_transform(Z).toarray()

And the output: ValueError: could not convert string to float: 'porsche'

This is the array: Array name = Z, type str416,

Answer 1

For one-hot encoding, I suggest using pd.get_dummies instead, which is easier to use:

import pandas as pd
# make sure Z is a DataFrame, e.g. Z = pd.DataFrame(Z)
X = pd.get_dummies(Z).values
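
To make the get_dummies route concrete, here is a minimal sketch. It assumes Z is a 2-D NumPy string array whose first column holds the car names. The sample values and the column names "car" and "value" are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's array: column 0 is the car name,
# column 1 is a numeric feature that ended up stored as a string.
Z = np.array([["audi", "1"], ["porsche", "2"], ["audi", "3"]])

# Wrap the array in a DataFrame and restore the numeric dtype.
df = pd.DataFrame(Z, columns=["car", "value"])
df["value"] = df["value"].astype(float)

# Expand only the string column into indicator columns.
dummies = pd.get_dummies(df, columns=["car"], dtype=float)
X = dummies.to_numpy()
```

Passing columns=["car"] keeps the numeric column untouched, so X can be fed to a linear regression directly.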

If you want to use sklearn's OneHotEncoder, you can refer to the following example:

import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder

df = pd.DataFrame({'a':['audi','porsche','audi'], 'b':[1,2,3]})
ohe = OneHotEncoder()

mat = ohe.fit_transform(df[['a']])

# view the contents of array
mat.todense()

matrix([[1., 0.],
        [0., 1.],
        [1., 0.]])

# get feature names (scikit-learn 1.0+ provides ohe.get_feature_names_out();
# get_feature_names() was removed in 1.2)
ohe.get_feature_names()
array(['x0_audi', 'x0_porsche'], dtype=object)
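
Applied to an array like the one in the question, the same idea might look like the sketch below. It assumes a recent scikit-learn (0.20 or later, where OneHotEncoder accepts strings directly; the categorical_features argument used in the question was later removed) and a made-up Z with car names in column 0 and numeric values stored as strings in the remaining columns. Note also that the question's code never assigns the LabelEncoder result back to Z, so the strings were probably still in the array when OneHotEncoder ran.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the asker's array (all entries are strings).
Z = np.array([["audi", "1"], ["porsche", "2"], ["audi", "3"]])

# One-hot encode only the car-name column; keep it 2-D with Z[:, [0]].
enc = OneHotEncoder(handle_unknown="ignore")
cars = enc.fit_transform(Z[:, [0]]).toarray()

# Convert the remaining columns to float separately.
numeric = Z[:, 1:].astype(float)

# Stack the indicator columns and the numeric columns side by side.
X = np.hstack([cars, numeric])
```

Encoding the string column on its own avoids asking scikit-learn to cast the whole array to float, which is the kind of conversion the error message complains about.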
