I am trying to encode the company names in my dataset. I tried using
pd.get_dummies(Data['Company_share_code'])
as well as
# X=data.iloc[:,0].values
from sklearn.preprocessing import LabelEncoder,OneHotEncoder
labelencoder=LabelEncoder()
Data['Company_share_code']=labelencoder.fit_transform(Data['Company_share_code'])
#One hot encoding
Onehotencoder=OneHotEncoder(categorical_features=[0])
Onehotencoder.fit_transform(Data['Company_share_code'])
but I get this error:
/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py in _handle_deprecations(self, X)
392 "use the ColumnTransformer instead.", DeprecationWarning)
393 # Set categories_ to empty list if no categorical columns exist
--> 394 n_features = X.shape[1]
395 sel = np.zeros(n_features, dtype=bool)
396 sel[np.asarray(self.categorical_features)] = True
IndexError: tuple index out of range
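The traceback fails at n_features = X.shape[1]: selecting one column with Data['Company_share_code'] gives a 1-D pandas Series whose shape has no second axis. The sketch below reproduces that on a hypothetical toy frame (the column values are made up for illustration) and shows two ways of getting the 2-D (n_samples, 1) shape that scikit-learn encoders expect.

```python
import pandas as pd

# Hypothetical toy data reusing the column name from the question
Data = pd.DataFrame({'Company_share_code': ['A', 'B', 'C', 'B', 'B', 'A']})

col = Data['Company_share_code']        # a pandas Series, which is 1-D
print(col.shape)                        # a 1-tuple: there is no shape[1]

try:
    col.shape[1]                        # same indexing that fails inside the encoder
except IndexError as exc:
    print("IndexError:", exc)           # "tuple index out of range"

as_column = col.values.reshape(-1, 1)   # NumPy array with one column
as_frame = Data[['Company_share_code']] # double brackets keep a one-column DataFrame
print(as_column.shape, as_frame.shape)
```

Either 2-D form can be passed to fit_transform.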
Answer 0 (score: 0)
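You have to reshape the column into a 2-D array, because OneHotEncoder expects input of shape (n_samples, n_features) and Data['Company_share_code'] is 1-D:
# categorical_features is deprecated (removed in scikit-learn 0.22); with a single column it is not needed
Onehotencoder=OneHotEncoder()
Onehotencoder.fit_transform(Data['Company_share_code'].values.reshape(-1, 1))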
This returns a SciPy sparse matrix. Call .todense() on it to get a dense matrix (or .toarray() for a plain NumPy array).
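On scikit-learn 0.20 and later, OneHotEncoder also accepts string categories directly, so the LabelEncoder step can be skipped. The following is a minimal sketch under that assumption, using a hypothetical toy frame; handle_unknown='ignore' is an optional choice here, not something the original answer uses.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data reusing the column name from the question
Data = pd.DataFrame({'Company_share_code': ['A', 'B', 'C', 'B', 'B', 'A']})

# Strings are encoded directly; the input must still be 2-D,
# so select the column with double brackets.
encoder = OneHotEncoder(handle_unknown='ignore')
encoded = encoder.fit_transform(Data[['Company_share_code']])  # SciPy sparse matrix

dense = encoded.toarray()      # NumPy ndarray (todense() would return np.matrix)
print(encoder.categories_)     # learned company codes, in sorted order
print(dense)
```

With handle_unknown='ignore', a company code that was not seen during fit is encoded as an all-zero row at transform time instead of raising an error.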
See the toy example below:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
Data = pd.DataFrame({'Company_share_code': ['A', 'B', 'C', 'B', 'B', 'A']})
# Label encoding: A -> 0, B -> 1, C -> 2
labelencoder = LabelEncoder()
Data['Company_share_code'] = labelencoder.fit_transform(Data['Company_share_code'])
# One-hot encoding: reshape the 1-D column to (n_samples, 1) before fitting
onehotencoder = OneHotEncoder()
h = onehotencoder.fit_transform(Data['Company_share_code'].values.reshape(-1, 1))
h.todense()
# Output
matrix([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 1., 0.],
        [0., 1., 0.],
        [1., 0., 0.]])
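For comparison, the pd.get_dummies call from the question works on a 1-D Series without any reshaping. The sketch below is a pandas-only alternative on a hypothetical toy frame; the "company" prefix is an illustrative choice, and dtype=int is passed because the default indicator dtype differs between pandas versions.

```python
import pandas as pd

# Hypothetical toy data reusing the column name from the question
Data = pd.DataFrame({'Company_share_code': ['A', 'B', 'C', 'B', 'B', 'A']})

# One indicator column per company code; dtype=int gives 0/1 integers
dummies = pd.get_dummies(Data['Company_share_code'], prefix='company', dtype=int)

# Replace the original text column with the indicator columns
encoded = pd.concat([Data.drop(columns='Company_share_code'), dummies], axis=1)
print(encoded)
```

Unlike a fitted OneHotEncoder, get_dummies does not remember the categories, so applying it separately to training and test data can produce different columns if a company code is missing from one of them.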