I am trying to remove the weights from a label-encoded column. My DataFrame has the following columns:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000209 entries, 0 to 1000208
Data columns (total 28 columns):
MovieID 1000209 non-null int64
Title 1000209 non-null object
Genres 1000209 non-null object
Action 1000209 non-null int64
Adventure 1000209 non-null int64
Animation 1000209 non-null int64
Children's 1000209 non-null int64
Comedy 1000209 non-null int64
Crime 1000209 non-null int64
Documentary 1000209 non-null int64
Drama 1000209 non-null int64
Fantasy 1000209 non-null int64
Film-Noir 1000209 non-null int64
Horror 1000209 non-null int64
Musical 1000209 non-null int64
Mystery 1000209 non-null int64
Romance 1000209 non-null int64
Sci-Fi 1000209 non-null int64
Thriller 1000209 non-null int64
War 1000209 non-null int64
Western 1000209 non-null int64
UserID 1000209 non-null int64
Rating 1000209 non-null int64
Timestamp 1000209 non-null int64
Gender 1000209 non-null object
Age 1000209 non-null int64
Occupation 1000209 non-null int64
Zip-code 1000209 non-null object
I tried the following code to remove the weights from the Occupation column:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(categorical_features=[-2])
final_movies_df = ohe.fit_transform(final_movies_df)
After that, I got a ValueError stating:
ValueError: could not convert string to float: 'Toy Story (1995)'
Answer 0 (score: 0)
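# Don't pass categorical_features=[-2] to the encoder constructor;
# select the column yourself and encode only that column.
from sklearn.preprocessing import OneHotEncoder
import numpy as np
onehotencoder = OneHotEncoder()
X = np.array(final_movies_df)
# Column -2 is Occupation; the double brackets keep the selection 2-D
y = onehotencoder.fit_transform(X[:, [-2]]).toarray()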
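If the encoder should be applied to Occupation by name rather than by position, one possible sketch uses scikit-learn's ColumnTransformer. The small `toy_df` below is a hypothetical stand-in for `final_movies_df`, and its values are made up for illustration only; string columns such as Title are never passed to the encoder, so the "could not convert string to float" error does not arise.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical miniature stand-in for final_movies_df (illustrative values)
toy_df = pd.DataFrame({
    "Title": ["Toy Story (1995)", "Heat (1995)", "Toy Story (1995)"],
    "Rating": [5, 4, 3],
    "Occupation": [10, 16, 10],
})

# Encode only the Occupation column; every other column is dropped here
ct = ColumnTransformer(
    [("occupation", OneHotEncoder(), ["Occupation"])],
    remainder="drop",
)
encoded = ct.fit_transform(toy_df)

# The result may be sparse or dense depending on its density
dense = encoded.toarray() if hasattr(encoded, "toarray") else encoded
print(dense.shape)
```

Each row of `dense` holds a single 1 in the column that corresponds to that row's occupation code.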
Alternatively, use the pandas get_dummies method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html
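A minimal sketch of the get_dummies approach, assuming the goal is to replace Occupation with one indicator column per occupation code. The `toy_df` frame is a hypothetical stand-in for `final_movies_df` with made-up values.

```python
import pandas as pd

# Hypothetical miniature stand-in for final_movies_df (illustrative values)
toy_df = pd.DataFrame({
    "Title": ["Toy Story (1995)", "Heat (1995)", "Toy Story (1995)"],
    "Rating": [5, 4, 3],
    "Occupation": [10, 16, 10],
})

# Replace Occupation with indicator columns named Occupation_<code>;
# all other columns, including string columns, are left untouched
result = pd.get_dummies(toy_df, columns=["Occupation"])
print(list(result.columns))
```

Because only the listed column is encoded, the Title strings pass through unchanged and no float conversion is attempted on them.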