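I am trying to convert car names from a NumPy array into numeric values for linear regression. The label encoder raises an error: ValueError: could not convert string to float: 'porsche'. Can anyone help?
Here is the code: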
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
enc = LabelEncoder()
enc.fit_transform(Z[:,0:1])
onehotencoder = OneHotEncoder(categorical_features = [0])
Z = onehotencoder.fit_transform(Z).toarray()
And the output: ValueError: could not convert string to float: 'porsche'
Answer 0 (score: 1)
For one-hot encoding, I suggest using pd.get_dummies instead, which is easier to use:
import pandas as pd
# make sure Z is a DataFrame
X = pd.get_dummies(Z).values
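As a fuller sketch of the get_dummies route, starting from a NumPy array like the one in the question: the sample data and the column names "car" and "power" below are made up for illustration, since the original Z is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 0 holds car names, column 1 a numeric feature.
Z = np.array([["audi", 1], ["porsche", 2], ["audi", 3]], dtype=object)

# Wrap the array in a DataFrame; the column names are assumptions for this sketch.
df = pd.DataFrame(Z, columns=["car", "power"])
df["power"] = df["power"].astype(float)

# Expand only the string column; the numeric column is left unchanged.
dummies = pd.get_dummies(df, columns=["car"], dtype=float)

# Plain float array, ready to pass to a regression model.
X = dummies.to_numpy()
print(dummies.columns.tolist())
print(X)
```

Passing columns=["car"] keeps get_dummies from also expanding the numeric column, which can happen when every column of the wrapped array has object dtype.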
If you want to use sklearn's OneHotEncoder instead, here is an example:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
df = pd.DataFrame({'a':['audi','porsche','audi'], 'b':[1,2,3]})
ohe = OneHotEncoder()
mat = ohe.fit_transform(df[['a']])
# view the contents of the (sparse) result as a dense matrix
mat.todense()
matrix([[1., 0.],
        [0., 1.],
        [1., 0.]])
# get feature names (get_feature_names() was removed in scikit-learn 1.2)
ohe.get_feature_names_out()
array(['a_audi', 'a_porsche'], dtype=object)
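To connect this back to the question, here is a minimal sketch of one-hot encoding only column 0 of a NumPy array and feeding the result to a linear regression. It uses ColumnTransformer, which replaces the old categorical_features argument of OneHotEncoder in current scikit-learn versions. The data and target values are invented for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is the car name, column 1 a numeric feature.
Z = np.array(
    [["audi", 1.0], ["porsche", 2.0], ["audi", 3.0], ["porsche", 4.0]],
    dtype=object,
)
# Hypothetical target, built as 5 + 20 * is_porsche + 2 * numeric feature.
y = np.array([7.0, 29.0, 11.0, 33.0])

# One-hot encode column 0 only; pass the remaining columns through untouched.
ct = ColumnTransformer(
    [("car", OneHotEncoder(handle_unknown="ignore"), [0])],
    remainder="passthrough",
)

# Chain the encoding and the regression so both are fitted together.
model = make_pipeline(ct, LinearRegression())
model.fit(Z, y)
pred = model.predict(Z)
print(pred)
```

No LabelEncoder step is needed here, because OneHotEncoder accepts string categories directly.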