Question

I need to convert a one-hot encoding into categories represented by unique integers. The one-hot encoding was created with the following code:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)  
for x in [1,2,3]:
    print(enc.transform([[x]]).toarray())

Out:
[[ 1.  0.  0.]]
[[ 0.  1.  0.]]
[[ 0.  0.  1.]]

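Can this be converted back to a set of unique integers, for example:

[1,2,3] or [11,37,45] or any other set in which each integer uniquely represents one class?

Is this possible with scikit-learn or any other Python library?

*Update*

What I tried: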

labels = [[1],[2],[3], [4], [5],[6],[7]]
enc.fit(labels) 

lst = []
for x in [1,2,3,4,5,6,7]:
    lst.append(enc.transform([[x]]).toarray())
lst
Out:
[array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.]]),
 array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.]]),
 array([[ 0.,  0.,  1.,  0.,  0.,  0.,  0.]]),
 array([[ 0.,  0.,  0.,  1.,  0.,  0.,  0.]]),
 array([[ 0.,  0.,  0.,  0.,  1.,  0.,  0.]]),
 array([[ 0.,  0.,  0.,  0.,  0.,  1.,  0.]]),
 array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.]])]


import numpy as np
a = np.array(lst)
np.where(a==1)[1]
Out:
array([0, 0, 0, 0, 0, 0, 0], dtype=int64)

This is not what I need.

Answer 1

You can do this with np.where, as follows:

import numpy as np
a=np.array([[ 0.,  1.,  0.],
            [ 1.,  0.,  0.],
            [ 0.,  0.,  1.]])
np.where(a==1)[1]

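This prints array([1, 0, 2], dtype=int64). It works because np.where(a==1)[1] returns the column indices of the 1s, and those column indices are exactly the labels.

In addition, since a is a 0/1 matrix, you can replace np.where(a==1)[1] with np.where(a)[1].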
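If the labels should be arbitrary integers such as [11, 37, 45] rather than the raw column indices, a lookup array can be indexed with the result of np.where. This is only a minimal sketch: the `classes` array is a hypothetical mapping chosen for illustration, and it assumes every row contains exactly one 1.

```python
import numpy as np

# One-hot rows; the position of the 1 in each row is the class index.
a = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 1.]])

# Hypothetical label values, one per column (any unique integers would do).
classes = np.array([11, 37, 45])

col_idx = np.where(a)[1]     # column index of the 1 in each row
labels = classes[col_idx]    # look up the label for each column index
print(labels)                # [37 11 45]
```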

Update: the following solution works with your format:

l=[np.array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.]]),
 np.array([[ 0.,  0.,  1.,  0.,  0.,  0.,  0.]]),
 np.array([[ 0.,  1.,  0.,  0.,  0.,  0.,  0.]]),
 np.array([[ 0.,  0.,  0.,  0.,  1.,  0.,  0.]]),
 np.array([[ 0.,  0.,  0.,  0.,  1.,  0.,  0.]]),
 np.array([[ 0.,  0.,  0.,  0.,  0.,  1.,  0.]]),
 np.array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.]])]
a=np.array(l)

np.where(a)[2]

This prints

array([0, 2, 1, 4, 4, 5, 6], dtype=int64)

Alternatively, you can use the original solution together with @ml4294's comment.
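One way to reuse the original 2-D approach with a list of (1, n) arrays is to stack them into a single 2-D array first. This is a sketch under that shape assumption and is not necessarily what the comment mentioned above suggests; the shortened three-class data is for illustration only.

```python
import numpy as np

# Same kind of data as in the question: a list of (1, n) arrays.
l = [np.array([[1., 0., 0.]]),
     np.array([[0., 0., 1.]]),
     np.array([[0., 1., 0.]])]

a2d = np.vstack(l)           # shape (3, 3) instead of (3, 1, 3)
idx = np.where(a2d)[1]       # column index of the 1 in each row
print(a2d.shape, idx)
```

With a 2-D array the column indices are in position 1 of the np.where result, so the indexing matches the first snippet of this answer.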

Answer 2

You can use np.argmax():

from sklearn.preprocessing import OneHotEncoder
import numpy as np

enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)  
x = enc.transform(labels).toarray()


# x = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
xr = (np.argmax(x, axis=1)+1).reshape(-1, 1)
print(xr)

This should give array([[1], [2], [3]]). If you want array([[0], [1], [2]]) instead, just remove the +1 from the definition of xr.
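The +1 offset only recovers the labels when they are the consecutive integers 1..n. For other label values, one option is to index the fitted encoder's categories_ attribute with the argmax result. A minimal sketch, assuming a scikit-learn version that exposes categories_ (0.20 or later) and using illustrative label values:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
labels = [[11], [37], [45]]             # illustrative, non-consecutive labels
x = enc.fit_transform(labels).toarray()

idx = np.argmax(x, axis=1)              # 0-based column index per row
original = enc.categories_[0][idx]      # map column index back to the fitted category
print(original.reshape(-1, 1))
```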

Answer 3

Since you are using sklearn.preprocessing.OneHotEncoder to "encode" the data, you can use its .inverse_transform() method to "decode" it (I think this requires .__version__ = 0.20.1 or later):

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
encoder = enc.fit(labels)
encoded_labels = encoder.transform(labels)
decoded_labels = encoder.inverse_transform(encoded_labels)
decoded_labels
# array([[1],
#        [2],
#        [3]])

N.B. decoded_labels is a NumPy array, not a list (with the default settings, encoded_labels is a SciPy sparse matrix).

Source: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder.inverse_transform
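inverse_transform can also be applied to dense one-hot rows that were not produced by the same transform call, as long as the columns follow the fitted encoder's category order. A minimal sketch with illustrative labels; .ravel() is used here only to flatten the (n, 1) result into a 1-D array of integers:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
enc.fit([[11], [37], [45]])             # illustrative labels

# Dense one-hot rows; column order follows enc.categories_.
onehot = np.array([[0., 1., 0.],
                   [0., 0., 1.],
                   [1., 0., 0.]])

decoded = enc.inverse_transform(onehot).ravel()   # flatten (n, 1) to (n,)
print(decoded)
```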

Scikit: convert a one-hot encoding to an integer encoding
