Question

I am trying to build a one-hot encoding of y_train from the MNIST dataset using TensorFlow, but I can't work out how to do it.

# unique values 0 - 9
y_train = array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In Keras, we would do something like this:

# this converts it into one hot encoding
one_hot_encoding = tf.keras.utils.to_categorical(y_train)

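What should I pass to tf.one_hot for its indices and depth parameters? And once the one-hot encoding is done, how do I convert the result from a 2D tensor back to a NumPy array?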

Answer 1

I'm not familiar with TensorFlow, but after some testing, this is what I found:

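tf.one_hot() takes an indices argument and a depth argument. indices holds the values to be converted into a one-hot encoding. depth is the length of each one-hot vector, so the largest value that can be represented is depth - 1.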

For example, take the following code:

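import tensorflow as tf  # written against the TensorFlow 1.x API (tf.Session)

y = [1, 2, 3, 2, 1]
tf.keras.utils.to_categorical(y)
sess = tf.Session()
with sess.as_default():
    # eval() runs the op in the default session and returns a NumPy array
    print(tf.one_hot(y, 2).eval())
    print(tf.one_hot(y, 4).eval())
    print(tf.one_hot(y, 6).eval())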

tf.keras.utils.to_categorical(y) returns the following:

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 1., 0.],
       [0., 1., 0., 0.]], dtype=float32)

In contrast, the tf.one_hot() calls (with depth 2, 4, and 6) produce the following:

[[0. 1.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 1.]]
[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]]
[[0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]]

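As shown here, to mimic tf.keras.utils.to_categorical() using tf.one_hot(), the depth parameter should equal the maximum value present in the array, plus 1 to account for 0. In this case the maximum value is 3, so there are four possible values in the encoding: 0, 1, 2, and 3. A depth of 4 is therefore needed to represent all of these values in the one-hot encoding. With a depth of 2, the values 2 and 3 do not fit and their rows come out as all zeros.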
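To make the role of depth concrete, here is a minimal pure-Python sketch of the behaviour seen in the outputs above. The one_hot helper is a hypothetical stand-in written for illustration; it is not TensorFlow's implementation.

```python
def one_hot(indices, depth):
    """Pure-Python illustration of one-hot rows (not TensorFlow's code)."""
    rows = []
    for i in indices:
        # A 1.0 goes at position i; an index outside [0, depth) leaves the row all zeros.
        rows.append([1.0 if j == i else 0.0 for j in range(depth)])
    return rows

y = [1, 2, 3, 2, 1]
depth = max(y) + 1  # 4: one column for each of 0, 1, 2, 3

encoded = one_hot(y, depth)  # same rows as the depth=4 output above
truncated = one_hot(y, 2)    # too shallow: labels 2 and 3 have no column
print(encoded)
print(truncated)
```

The truncated result reproduces the depth=2 output above, where only the rows for label 1 contain a 1.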

As for converting to NumPy: as shown above, inside a TensorFlow session, calling eval() on a tensor converts it to a NumPy array. For other ways of doing this, see How can I convert a tensor into a numpy array in TensorFlow?.
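The session-based code above targets TensorFlow 1.x. As a rough sketch, assuming TensorFlow 2.x with eager execution (the default there), the same conversion can be done without a session by calling .numpy() on the tensor. The short y_train array below is a stand-in for the real MNIST labels.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 2.x with eager execution enabled

# A few MNIST-style labels standing in for the full y_train array.
y_train = np.array([5, 0, 4, 5, 6, 8], dtype=np.uint8)

# depth=10 because MNIST digits run from 0 to 9.
one_hot_tensor = tf.one_hot(y_train, depth=10)

# In eager mode, .numpy() returns the tensor's values as a NumPy array.
one_hot_array = one_hot_tensor.numpy()

print(type(one_hot_array), one_hot_array.shape, one_hot_array.dtype)
```

Each row of one_hot_array has a single 1.0 at the column given by the corresponding label.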

I'm not that familiar with TensorFlow, but I hope this helps.

Note: for MNIST, a depth of 10 is sufficient.

Answer 2

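I'd like to counter what @Andrew Fan said. First, the y label list above does not start at index 0, which is what is required. Just look at the first column (i.e. index 0) in all of those examples: it is all zeros. This creates a redundant class during training and may cause problems. One-hot encoding creates a simple list with a 1 at the position of that index and zeros everywhere else. Your depth therefore has to equal the number of classes, and your labels also have to start at index 0.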
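One way to act on this, sketched in plain Python under the assumption that the labels are arbitrary integers, is to remap them to contiguous 0-based indices before encoding. The names classes and index_of are hypothetical and used only for illustration. MNIST labels already run from 0 to 9, so they need no remapping.

```python
labels = [1, 2, 3, 2, 1]

# Map each distinct label to a contiguous 0-based index.
classes = sorted(set(labels))                             # [1, 2, 3]
index_of = {label: i for i, label in enumerate(classes)}  # {1: 0, 2: 1, 3: 2}
indices = [index_of[label] for label in labels]           # [0, 1, 2, 1, 0]

# depth now equals the number of classes, with no unused column 0.
depth = len(classes)

one_hot_rows = [[1.0 if j == i else 0.0 for j in range(depth)] for i in indices]
print(one_hot_rows)
```

The remapped indices could equally be passed to tf.one_hot with depth=3.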

How do I compute a one-hot encoding using tf.one_hot?