Question

import numpy as np
from tflearn.data_utils import to_categorical

labels = np.array([['positive'],['negative'],['negative'],['positive']])
# output from pandas is similar to the above
values = (labels=='positive').astype(np.int_)
to_categorical(values,2)

Output:

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

If I strip the inner list that wraps each element, it seems to work fine:

labels = np.array([['positive'],['negative'],['negative'],['positive']])
values = (labels=='positive').astype(np.int_)
to_categorical(values.T[0],2)

Output:

array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])

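Why does it behave this way? I am following some tutorials, and they seem to get the correct output even for an array of arrays. Is this due to a recent upgrade?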

I am using tflearn (0.3.2) on py362.

Answer 1

Take a look at the source code of to_categorical:

def to_categorical(y, nb_classes):
    """ to_categorical.

    Convert class vector (integers from 0 to nb_classes)
    to binary class matrix, for use with categorical_crossentropy.

    Arguments:
        y: `array`. Class vector to convert.
        nb_classes: `int`. Total number of classes.

    """
    y = np.asarray(y, dtype='int32')
    if not nb_classes:
        nb_classes = np.max(y)+1
    Y = np.zeros((len(y), nb_classes))
    Y[np.arange(len(y)),y] = 1.
    return Y

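The core step is the advanced indexing Y[np.arange(len(y)), y] = 1., which treats the input vector y as column indices into the result array. So y needs to be a 1-D array for this to work properly; with an arbitrary 2-D array you will usually get a broadcasting error.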

For example:

to_categorical([[1,2,3],[2,3,4]], 2)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in ()
----> 1 to_categorical([[1,2,3],[2,3,4]], 2)

C:\anaconda3\envs\tensorflow\lib\site-packages\tflearn\data_utils.py in to_categorical(y, nb_classes)
     40         nb_classes = np.max(y)+1
     41     Y = np.zeros((len(y), nb_classes))
---> 42     Y[np.arange(len(y)),y] = 1.
     43     return Y
     44

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,3)
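The (4, 1) array from the question does not raise this error because its shape happens to be broadcast-compatible with np.arange(4): the index pair expands to shape (4, 4), so every row gets a 1 in both column 0 and column 1. A minimal sketch of that effect follows. one_hot_like_tflearn is a hypothetical stand-in that copies the indexing step quoted above, so the example runs without tflearn installed; it is not the library function itself.

```python
import numpy as np

def one_hot_like_tflearn(y, nb_classes):
    # Hypothetical stand-in mirroring the indexing step in the quoted source.
    y = np.asarray(y, dtype='int32')
    Y = np.zeros((len(y), nb_classes))
    Y[np.arange(len(y)), y] = 1.
    return Y

values = np.array([[1], [0], [0], [1]])  # shape (4, 1), as in the question

# Row indices have shape (4,), column indices have shape (4, 1);
# NumPy broadcasts the pair to shape (4, 4) instead of raising.
rows, cols = np.broadcast_arrays(np.arange(len(values)), values)
print(rows.shape)

# With the column vector, every row is assigned in both columns.
column_vector_result = one_hot_like_tflearn(values, 2)
print(column_vector_result)

# With a 1-D vector, each row gets exactly one 1.
flat_result = one_hot_like_tflearn(values[:, 0], 2)
print(flat_result)
```

The first result is all ones, matching the output in the question; the second is the expected one-hot matrix.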

Any of the following works fine:

to_categorical(values.ravel(), 2)
array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])

to_categorical(values.squeeze(), 2)
array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])

to_categorical(values[:,0], 2)
array([[ 0.,  1.],
       [ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])
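The three calls agree here because each one turns the (4, 1) column into a 1-D array of length 4. They can differ on edge cases, though. The sketch below uses plain NumPy and a made-up single-row array (single, an assumption for illustration only) to show one such case: squeeze() drops every length-1 axis, so a (1, 1) input becomes a 0-d array, on which len() would fail.

```python
import numpy as np

values = np.array([[1], [0], [0], [1]])  # shape (4, 1), as in the question

flat_ravel = values.ravel()      # flattens everything into 1-D
flat_squeeze = values.squeeze()  # drops all length-1 axes
flat_column = values[:, 0]       # selects the first column explicitly

print(flat_ravel.shape, flat_squeeze.shape, flat_column.shape)

# Hypothetical edge case: a frame with a single row.
single = np.array([[1]])         # shape (1, 1)
print(single.ravel().shape)      # still 1-D
print(single.squeeze().shape)    # 0-d: no axes left
print(single[:, 0].shape)        # still 1-D
```

If the input can have a single row, values[:, 0] or values.ravel() may be the safer choice. Note also that ravel() would silently flatten all columns of a multi-column array, whereas values[:, 0] always picks exactly one.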

tflearn to_categorical: handling data from pandas.df.values: array of arrays
