Question

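I am trying to recreate something similar to sklearn.preprocessing.LabelEncoder.

However, I don't want to use sklearn or pandas. I only want to use numpy and the Python standard library. Here's what I'd like to achieve: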

import numpy as np
arr = np.array([['hi', 'there'],
                ['scott', 'james'],
                ['hi', 'scott'],
                ['please', 'there']])

# Output would look like
np.array([[0, 0],
          [1, 1],
          [0, 2],
          [2, 0]])

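It would also be nice to be able to map the codes back, so that the result looks exactly like the input again.

If this were in a spreadsheet, the input would look like this: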
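A minimal numpy-only sketch of what is asked above, assuming the codes should follow first-appearance order as in the expected output (the function names encode_first_seen and decode are hypothetical). It uses np.unique with return_index to find where each label first occurs, then re-ranks the sorted uniques by that position. Keeping the per-column label arrays makes mapping back a single indexing step.

```python
import numpy as np

def encode_first_seen(arr):
    """Encode each column of a 2-D array with integer codes assigned in order
    of first appearance. Also returns, per column, the labels in code order
    (labels[j][k] is the label that code k stands for in column j)."""
    codes = np.empty(arr.shape, dtype=np.intp)
    labels = []
    for j in range(arr.shape[1]):
        uniq, first_idx, inv = np.unique(arr[:, j], return_index=True, return_inverse=True)
        order = np.argsort(first_idx)        # sorted-unique positions, ordered by first appearance
        rank = np.empty_like(order)
        rank[order] = np.arange(len(order))  # sorted-unique position -> first-seen code
        codes[:, j] = rank[inv]
        labels.append(uniq[order])           # labels listed in first-seen code order
    return codes, labels

def decode(codes, labels):
    """Invert encode_first_seen: look each code up in its column's label array."""
    return np.column_stack([labels[j][codes[:, j]] for j in range(codes.shape[1])])

arr = np.array([['hi', 'there'], ['scott', 'james'],
                ['hi', 'scott'], ['please', 'there']])
codes, labels = encode_first_seen(arr)
print(codes)
print(decode(codes, labels))
```

For this input, codes.tolist() should equal [[0, 0], [1, 1], [0, 2], [2, 0]], and decode returns the original strings.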

Answer 1

This is a simple comprehension using the return_inverse result of np.unique, applied to each column (axis) in turn:

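import numpy as np

arr = np.array([['hi', 'there'], ['scott', 'james'],
                ['hi', 'scott'], ['please', 'there']])

# return_inverse=True gives each element's index into the sorted uniques of its column
np.column_stack([np.unique(arr[:, i], return_inverse=True)[1] for i in range(arr.shape[1])])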

array([[0, 2],
       [2, 0],
       [0, 1],
       [1, 2]], dtype=int64)
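Note that these codes follow sorted order ('hi' < 'please' < 'scott'), not first appearance. Keeping the sorted uniques that np.unique returns alongside the codes also gives the reverse mapping for free. Below is a minimal sketch of a LabelEncoder-like wrapper around that idea. ColumnLabelEncoder is a hypothetical name; its fit/transform/inverse_transform methods borrow sklearn's naming but this is not sklearn's implementation. It encodes each column independently and uses np.searchsorted to encode new data against the fitted classes.

```python
import numpy as np

class ColumnLabelEncoder:
    """Hypothetical numpy-only stand-in for a label encoder that encodes
    every column of a 2-D array independently, in sorted label order."""

    def fit(self, arr):
        # classes_[j] holds the sorted unique labels of column j
        self.classes_ = [np.unique(arr[:, j]) for j in range(arr.shape[1])]
        return self

    def transform(self, arr):
        codes = np.empty(arr.shape, dtype=np.intp)
        for j, classes in enumerate(self.classes_):
            col = arr[:, j]
            idx = np.searchsorted(classes, col)
            # searchsorted only returns insertion points, so reject labels not seen in fit
            safe = np.minimum(idx, len(classes) - 1)
            if np.any(classes[safe] != col):
                raise ValueError(f"column {j} contains labels not seen in fit")
            codes[:, j] = idx
        return codes

    def fit_transform(self, arr):
        return self.fit(arr).transform(arr)

    def inverse_transform(self, codes):
        # each column of codes indexes into that column's sorted classes
        return np.column_stack([classes[codes[:, j]] for j, classes in enumerate(self.classes_)])

arr = np.array([['hi', 'there'], ['scott', 'james'],
                ['hi', 'scott'], ['please', 'there']])
enc = ColumnLabelEncoder()
codes = enc.fit_transform(arr)
print(codes)
print(enc.inverse_transform(codes))
```

fit_transform should reproduce the codes from the comprehension above. Raising on unseen labels is a design choice of this sketch; an alternative would be to reserve a code for unknown values.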

Answer 2

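I was talking with @Scott Stoltzmann and spitballing ideas on how to reverse the accepted answer.

One can either carry the original arr along through their program, or record the mapping for each column. If you choose the latter, here's some simple but inefficient code: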

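import numpy as np

arr = np.array([['hi', 'there'], ['scott', 'james'],
                ['hi', 'scott'], ['please', 'there']])
# arr2 is the encoded array from the accepted answer
arr2 = np.column_stack([np.unique(arr[:, i], return_inverse=True)[1] for i in range(arr.shape[1])])

l = []  # one {code: label} dict per column
for real_column, encoded_column in zip(arr.T, arr2.T):
    d = {}
    for real_element, encoded_element in zip(real_column, encoded_column):
        d[int(encoded_element)] = str(real_element)  # plain int/str keep the printout readable
    l.append(d)
print(l)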

Running the above prints:

[{0: 'hi', 2: 'scott', 1: 'please'}, {2: 'there', 0: 'james', 1: 'scott'}]
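To go back the other way with these dicts: the codes in each column run from 0 to k-1 (np.unique's inverse guarantees this), so each dict can be turned into a small lookup array and a whole column decoded with one fancy-indexing step. A sketch, with the dicts and encoded array copied from above (the names mappings, lookups and restored are illustrative):

```python
import numpy as np

# Per-column {code: label} dicts, as printed above
mappings = [{0: 'hi', 2: 'scott', 1: 'please'}, {2: 'there', 0: 'james', 1: 'scott'}]
# Encoded array from the accepted answer
arr2 = np.array([[0, 2], [2, 0], [0, 1], [1, 2]])

# Codes are 0..k-1 per column, so each dict becomes an array indexed by code
lookups = [np.array([d[k] for k in range(len(d))]) for d in mappings]
# Decode each column by indexing its lookup array with that column's codes
restored = np.column_stack([lookup[arr2[:, j]] for j, lookup in enumerate(lookups)])
print(restored)
```

This avoids a Python-level loop over every element; the per-dict loop only runs once per distinct label.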

How to create a label encoder using only numpy (and not sklearn's LabelEncoder)?
