ValueError when using OneHotEncoding

Asked: 2019-12-18 11:53:48

Tags: python-3.x pandas scikit-learn one-hot-encoding

I am trying to remove the weights (the implied ordering) from a label-encoded column. My DataFrame has the following columns:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000209 entries, 0 to 1000208
Data columns (total 28 columns):
MovieID        1000209 non-null int64
Title          1000209 non-null object
Genres         1000209 non-null object
Action         1000209 non-null int64
Adventure      1000209 non-null int64
Animation      1000209 non-null int64
Children's     1000209 non-null int64
Comedy         1000209 non-null int64
Crime          1000209 non-null int64
Documentary    1000209 non-null int64
Drama          1000209 non-null int64
Fantasy        1000209 non-null int64
Film-Noir      1000209 non-null int64
Horror         1000209 non-null int64
Musical        1000209 non-null int64
Mystery        1000209 non-null int64
Romance        1000209 non-null int64
Sci-Fi         1000209 non-null int64
Thriller       1000209 non-null int64
War            1000209 non-null int64
Western        1000209 non-null int64
UserID         1000209 non-null int64
Rating         1000209 non-null int64
Timestamp      1000209 non-null int64
Gender         1000209 non-null object
Age            1000209 non-null int64
Occupation     1000209 non-null int64
Zip-code       1000209 non-null object

I tried the following code to remove the weights from the "Occupation" column:

from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(categorical_features=[-2])
final_movies_df = ohe.fit_transform(final_movies_df)

After that, I got a ValueError stating:

ValueError: could not convert string to float: 'Toy Story (1995)'

1 Answer:

Answer 0 (score: 0):

Option 1. For older versions of sklearn, you need two steps:

  • Use LabelEncoder to encode the string variables as integers
  • Then use OneHotEncoder on the integer variables
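The two steps above can be sketched roughly as follows. This is a minimal illustration and not the asker's actual pipeline: the `gender` array is made-up toy data standing in for a string column such as Gender.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data standing in for a string column such as Gender.
gender = np.array(["F", "M", "M", "F", "M"])

# Step 1: map each distinct string to an integer code.
# LabelEncoder sorts the classes, so "F" becomes 0 and "M" becomes 1.
label_encoder = LabelEncoder()
gender_codes = label_encoder.fit_transform(gender)

# Step 2: one-hot encode the integer codes.
# OneHotEncoder expects 2-D input, hence the reshape to a single column.
onehot_encoder = OneHotEncoder()
gender_onehot = onehot_encoder.fit_transform(gender_codes.reshape(-1, 1)).toarray()

print(gender_codes)
print(gender_onehot)
```

Each row of `gender_onehot` has a single 1 in the column of its class, so no category is weighted above another.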

Option 2. For newer versions of sklearn:

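# Don't pass categorical_features=[-2] to the encoder; select the column yourself instead.
from sklearn.preprocessing import OneHotEncoder
import numpy as np

onehotencoder = OneHotEncoder()

# Convert the DataFrame to a NumPy array, then encode only the second-to-last column (Occupation).
X = np.array(final_movies_df)
y = onehotencoder.fit_transform(X[:, [-2]]).toarray()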
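Option 2 yields only the encoded array. The original error most likely arose because the whole DataFrame, including string columns such as Title, was handed to the encoder. Below is a minimal sketch of one way to encode just Occupation and put the result back beside the other columns. It assumes a scikit-learn version that exposes `categories_` (0.20 or later, as far as I know); the toy frame and the `Occupation_` column names are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical miniature stand-in for final_movies_df.
df = pd.DataFrame({
    "Title": ["Toy Story (1995)", "Heat (1995)", "Toy Story (1995)"],
    "Rating": [5, 4, 3],
    "Occupation": [10, 16, 10],
})

# Fit the encoder on the Occupation column only (double brackets keep it 2-D).
encoder = OneHotEncoder()
occupation_onehot = encoder.fit_transform(df[["Occupation"]]).toarray()

# Name the new columns after the categories the encoder found.
column_names = [f"Occupation_{category}" for category in encoder.categories_[0]]
onehot_df = pd.DataFrame(occupation_onehot, columns=column_names, index=df.index)

# Replace the original column with its one-hot columns.
result = pd.concat([df.drop(columns=["Occupation"]), onehot_df], axis=1)
print(result)
```

Because the string columns never reach the encoder, nothing tries to convert a title to a float.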

Option 3. Alternatively, you can use the pandas get_dummies method, documented here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html
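A short sketch of what the get_dummies route might look like. The small frame is invented for illustration, and the `prefix` choice is only one option:

```python
import pandas as pd

# Hypothetical miniature stand-in for final_movies_df.
df = pd.DataFrame({
    "Title": ["Toy Story (1995)", "Heat (1995)", "Toy Story (1995)"],
    "Occupation": [10, 16, 10],
    "Rating": [5, 4, 3],
})

# Encode only the Occupation column; the other columns are left as they are.
# The original Occupation column is dropped and the indicator columns are appended.
encoded_df = pd.get_dummies(df, columns=["Occupation"], prefix="Occupation")
print(encoded_df)
```

This stays entirely in pandas, so the string columns are never passed to a scikit-learn encoder.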