Encoding categorical data

Time: 2019-03-12 11:41:40

Tags: python machine-learning encoding scikit-learn

I am trying to encode the company names in my dataset. I tried pd.get_dummies(Data['Company_share_code']) as well as:

# X=data.iloc[:,0].values
from sklearn.preprocessing import LabelEncoder,OneHotEncoder

labelencoder=LabelEncoder()
Data['Company_share_code']=labelencoder.fit_transform(Data['Company_share_code'])

#One hot encoding

Onehotencoder=OneHotEncoder(categorical_features=[0])
Onehotencoder.fit_transform(Data['Company_share_code'])

But I get this error:

/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py in _handle_deprecations(self, X)
    392                     "use the ColumnTransformer instead.", DeprecationWarning)
    393                 # Set categories_ to empty list if no categorical columns exist
--> 394                 n_features = X.shape[1]
    395                 sel = np.zeros(n_features, dtype=bool)
    396                 sel[np.asarray(self.categorical_features)] = True

IndexError: tuple index out of range

1 answer:

Answer 0 (score: 0)

You have to reshape the column into a 2-D array before passing it to the encoder:

Onehotencoder=OneHotEncoder(categorical_features=[0])
Onehotencoder.fit_transform(Data['Company_share_code'].values.reshape(-1, 1))

This gives you a sparse matrix. You can convert it to a dense matrix with todense().
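As a minimal sketch of why the reshape matters: the traceback fails at X.shape[1], and a single pandas column has only one dimension, so that index does not exist. The `codes` series below is a hypothetical stand-in for the label-encoded column, and `categories='auto'` assumes scikit-learn 0.20 or later.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical label-encoded column, standing in for Data['Company_share_code'].
codes = pd.Series([0, 1, 2, 1, 1, 0])

print(codes.shape)   # one dimension only, so shape[1] raises IndexError

# reshape(-1, 1) yields a 2-D array: one row per sample, one feature column.
column = codes.values.reshape(-1, 1)
print(column.shape)

# The encoder accepts the 2-D array and returns a SciPy sparse matrix by default.
encoded = OneHotEncoder(categories='auto').fit_transform(column)

# toarray() returns a plain ndarray; todense() returns a numpy.matrix instead.
dense = encoded.toarray()
print(dense)
```

Either conversion works for inspection; toarray() is often the more convenient one because most downstream code expects an ndarray.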

See below for a toy example:

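import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

Data = pd.DataFrame({'Company_share_code': ['A', 'B', 'C', 'B', 'B', 'A']})

# Map the string codes to integers: A -> 0, B -> 1, C -> 2
labelencoder = LabelEncoder()
Data['Company_share_code'] = labelencoder.fit_transform(Data['Company_share_code'])

# One-hot encoding; reshape(-1, 1) turns the 1-D column into a 2-D (n_samples, 1) array
# categorical_features is deprecated, as the DeprecationWarning in the traceback notes
Onehotencoder = OneHotEncoder(categorical_features=[0])
h = Onehotencoder.fit_transform(Data['Company_share_code'].values.reshape(-1, 1))

h.todense()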

# Output
matrix([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 1., 0.],
        [0., 1., 0.],
        [1., 0., 0.]])
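Since the traceback flags categorical_features as deprecated, here is a hedged sketch of the same toy example without it. It assumes scikit-learn 0.20 or later, where OneHotEncoder can encode string columns directly, so the LabelEncoder step is skipped. It also shows the pd.get_dummies route mentioned in the question. This is one possible variant, not necessarily what the answer's author would recommend.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

Data = pd.DataFrame({'Company_share_code': ['A', 'B', 'C', 'B', 'B', 'A']})

# Double brackets select a one-column DataFrame (2-D), so no reshape is needed.
encoder = OneHotEncoder(categories='auto')
h = encoder.fit_transform(Data[['Company_share_code']])

print(encoder.categories_)   # categories found, in output column order
print(h.toarray())           # dense ndarray version of the sparse result

# pandas alternative: one indicator column per distinct value.
dummies = pd.get_dummies(Data['Company_share_code'])
print(dummies)
```

The scikit-learn encoder remembers the fitted categories and can be reused on new data with transform(), whereas get_dummies only looks at the values present in the frame it is given.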