Question

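I am trying to use OneHotEncoder on categorical values,

but it fails with the error below. What could be wrong? Please help; any comments are welcome.

Here is the code snippet: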


Judging by the output of the code, the problem seems to be with the array format.

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
print(X.shape)
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
print(X)
print(X.shape)
print(y)
#X = X.reshape(len(X[:, 0]), 7)
print(X.shape)
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
print(X.shape)
print(X)

Answer 1

You should apply OneHotEncoder only to the columns you want to encode:

from sklearn.preprocessing import OneHotEncoder

onehotencoder = OneHotEncoder()
# fit_transform expects a 2-D array, so select each column as X[:, [i]] rather than X[:, i]
X_0 = onehotencoder.fit_transform(X[:, [0]]).toarray()
X_1 = onehotencoder.fit_transform(X[:, [1]]).toarray()

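This returns two matrices, each with the same number of rows as X and as many columns as there are distinct values in X[:, 0] or X[:, 1], respectively.

After that, feel free to merge the matrices or do whatever else you need. If you want to know which column corresponds to which category, you can inspect onehotencoder.feature_indices_ (newer scikit-learn releases expose categories_ instead). Note that because the same encoder is fitted twice, the information for the first feature, X[:, 0], is lost after the second fit.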
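To avoid losing the fitted information for the first column, one option is to use a separate encoder per column. The following is a minimal sketch on made-up toy data (the array contents and the names encoder_0 and encoder_1 are illustrative assumptions, not from the question), and it assumes a scikit-learn release recent enough to encode string columns directly and to expose categories_:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): two categorical columns stored as strings.
X = np.array([
    ["France", "Yes"],
    ["Spain", "No"],
    ["Germany", "Yes"],
    ["Spain", "Yes"],
], dtype=object)

# One encoder per column, so each keeps its own fitted categories.
encoder_0 = OneHotEncoder()
encoder_1 = OneHotEncoder()

# X[:, [i]] keeps the selection 2-D, which fit_transform requires.
X_0 = encoder_0.fit_transform(X[:, [0]]).toarray()
X_1 = encoder_1.fit_transform(X[:, [1]]).toarray()

# Output column j of each matrix corresponds to categories_[0][j].
countries = list(encoder_0.categories_[0])
answers = list(encoder_1.categories_[0])

print(X_0.shape, countries)
print(X_1.shape, answers)
```

With two encoders, both category lists stay available after fitting, so each output column can still be traced back to its original value.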

I hope this helps.

Answer 2

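Even when you specify categorical_features = [0], OneHotEncoder still validates all of the data (in every column) for compatibility with scikit-learn, so it raises an error when any other column contains string data.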
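The error message itself can be reproduced without scikit-learn. The sketch below assumes that the validation step amounts to converting the whole array to a numeric dtype, which is a simplification of what the encoder does internally; the toy array is made up for illustration:

```python
import numpy as np

# Toy data (hypothetical): column 0 is already label-encoded,
# column 1 still holds strings.
X = np.array([[0, "Yes"], [1, "No"]], dtype=object)

message = None
try:
    # Converting the whole array to float touches every column,
    # including the one that still contains strings.
    X.astype(float)
except ValueError as exc:
    message = str(exc)

print(message)
```

The conversion fails on the first string it meets, even though the column that was meant to be encoded is already numeric.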

So what you can do here is pass only the data you want dummy-encoded:

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
print(X.shape)
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
print(X)
print(X.shape)
print(y)
#X = X.reshape(len(X[:, 0]), 7)
print(X.shape)

onehotencoder = OneHotEncoder()

categorical_features = [0]
# Send only the first column to onehotencoder
X_oneHotEncoded = onehotencoder.fit_transform(X[:, categorical_features]).toarray()

# Combine the two arrays back together
X_final = np.hstack((X_oneHotEncoded, X[:,1:]))
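In scikit-learn releases that ship sklearn.compose, the same "encode one column, keep the rest" step can also be written with ColumnTransformer. This is a different technique from the manual np.hstack above, shown here as a rough sketch on made-up toy data (the values and the transformer name "onehot" are illustrative assumptions):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): one categorical column, two numeric columns.
X = np.array([
    ["France", 44.0, 72000.0],
    ["Spain", 27.0, 48000.0],
    ["Germany", 30.0, 54000.0],
    ["Spain", 38.0, 61000.0],
], dtype=object)

ct = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), [0])],  # encode column 0 only
    remainder="passthrough",   # keep the other columns unchanged
    sparse_threshold=0.0,      # always return a dense array
)

# Encoded columns come first, followed by the passed-through columns.
X_final = ct.fit_transform(X).astype(float)

print(X_final.shape)
print(X_final[0])
```

The encoder only ever sees column 0 here, so string or numeric data in the remaining columns does not affect it.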

One hot encoding: ValueError: could not convert string to float: 'Yes'
