Input dimension mismatch with binary crossentropy in Lasagne and Theano

Date: 2016-02-12 06:58:38

Tags: theano dimension mismatch lasagne

I read all the posts online addressing the problem of people forgetting to change the target vector to a matrix, and since the problem remains after this change, I decided to ask my question here. A workaround is mentioned below, but new problems showed up, and I would be grateful for your suggestions!

Using a convolutional network setup and binary crossentropy with a sigmoid activation function, I get a dimension mismatch problem, but not during training; it only appears during validation/test set evaluation. For some strange reason, my validation set vector gets its dimensions switched, and I have no idea why. Training, as mentioned above, works fine. The code follows below; thanks a lot for any help (and sorry for hijacking the thread, but I saw no reason for creating a new one). Most of it is copied from the lasagne tutorial example.

Workaround and new problems:

  1. Removing "axis=1" in the valAcc definition helps, but then the validation accuracy stays at zero and the test classification always returns the same result, no matter how many nodes, layers, filters etc. I use. Even changing the training set size (I have around 350 samples per class, 48x64 grayscale images) does not change this. So something seems to be off.
  2. Network creation:

    def build_cnn(imgSet, input_var=None):
        # As a third model, we'll create a CNN of two convolution + pooling stages
        # and a fully-connected hidden layer in front of the output layer.

        # Input layer using shape information from training
        network = lasagne.layers.InputLayer(shape=(None, \
            imgSet.shape[1], imgSet.shape[2], imgSet.shape[3]), input_var=input_var)
        # This time we do not apply input dropout, as it tends to work less well
        # for convolutional layers.

        # Convolutional layer with 32 kernels of size 5x5. Strided and padded
        # convolutions are supported as well; see the docstring.
        network = lasagne.layers.Conv2DLayer(
                network, num_filters=32, filter_size=(5, 5),
                nonlinearity=lasagne.nonlinearities.rectify,
                W=lasagne.init.GlorotUniform())

        # Max-pooling layer of factor 2 in both dimensions:
        network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

        # Another convolution with 16 5x5 kernels, and another 2x2 pooling:
        network = lasagne.layers.Conv2DLayer(
                network, num_filters=16, filter_size=(5, 5),
                nonlinearity=lasagne.nonlinearities.rectify)

        network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

        # A fully-connected layer of 64 units with 25% dropout on its inputs:
        network = lasagne.layers.DenseLayer(
                lasagne.layers.dropout(network, p=.25),
                num_units=64,
                nonlinearity=lasagne.nonlinearities.rectify)

        # And, finally, the 1-unit output layer with 50% dropout on its inputs:
        network = lasagne.layers.DenseLayer(
                lasagne.layers.dropout(network, p=.5),
                num_units=1,
                nonlinearity=lasagne.nonlinearities.sigmoid)

        return network
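
As a quick sanity check on these layers, the feature-map sizes can be computed by hand, a minimal sketch in plain Python (no lasagne needed), assuming the 48x64 grayscale inputs mentioned in the question:

```python
# Back-of-the-envelope feature-map sizes for 48x64 inputs (assumed from the
# question): a 'valid' convolution shrinks each side by filter_size - 1,
# and 2x2 max-pooling floor-halves it.
def conv(h, w, f=5):
    return h - f + 1, w - f + 1

def pool(h, w, p=2):
    return h // p, w // p

shape = (48, 64)
for layer in (conv, pool, conv, pool):
    shape = layer(*shape)

print(shape)  # (9, 13): the 16 feature maps feeding the dense layer are 9x13
```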
    

The target matrices for all sets are created like this (training target vector as an example):

     targetsTrain = np.vstack((targetsTrain, [[targetClass]] * numTr))
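
For illustration, here is a small numpy sketch (with hypothetical class sizes) of what this construction produces: an (N, 1) column matrix of integer labels, which is what the `T.imatrix` target variable expects.

```python
import numpy as np

# Hypothetical example: stack per-class label blocks the way the question does.
targetsTrain = np.empty((0, 1), dtype='int8')    # start with an empty column
for targetClass, numTr in [(0, 3), (1, 2)]:      # hypothetical class sizes
    targetsTrain = np.vstack((targetsTrain, [[targetClass]] * numTr))

print(targetsTrain.shape)  # (5, 1): a column matrix, not a flat vector
```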
    

...and the theano variables themselves:

    inputVar = T.tensor4('inputs')
    targetVar = T.imatrix('targets')
    network = build_cnn(trainset, inputVar)
    predictions = lasagne.layers.get_output(network)
    loss = lasagne.objectives.binary_crossentropy(predictions, targetVar)
    loss = loss.mean()
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
    valPrediction = lasagne.layers.get_output(network, deterministic=True)
    valLoss = lasagne.objectives.binary_crossentropy(valPrediction, targetVar)
    valLoss = valLoss.mean()
    valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar), dtype=theano.config.floatX)
    train_fn = function([inputVar, targetVar], loss, updates=updates,  allow_input_downcast=True)
    val_fn = function([inputVar, targetVar], [valLoss, valAcc])
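
The shape clash hiding in `valAcc` can be previewed with a plain numpy sketch (hypothetical sizes): `argmax(..., axis=1)` yields a row-like result, while the targets form an (N, 1) column. numpy would silently broadcast such a comparison to (N, N); Theano instead refuses with a dimension mismatch of exactly this kind.

```python
import numpy as np

pred_row = np.arange(4).reshape(1, 4)     # like the (1, 52) argmax output in the error
targets_col = np.arange(4).reshape(4, 1)  # like the (52, 1) imatrix targets

# numpy broadcasts (1, 4) against (4, 1) into a (4, 4) comparison table --
# almost certainly not the element-wise accuracy one intended.
print(np.equal(pred_row, targets_col).shape)  # (4, 4)
```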
    

Finally, here are the two loops, training and testing. The first one is fine, the second one throws the error; excerpts below:

    # -- Neural network training itself -- #
    numIts = 100
    for itNr in range(0, numIts):
        train_err = 0
        train_batches = 0
        for batch in iterate_minibatches(trainset.astype('float32'), targetsTrain.astype('int8'), len(trainset)//4, shuffle=True):
            inputs, targets = batch
            print(inputs.shape)
            print(targets.shape)
            train_err += train_fn(inputs, targets)
            train_batches += 1

        # And a full pass over the validation data:
        val_err = 0
        val_acc = 0
        val_batches = 0

        for batch in iterate_minibatches(valset.astype('float32'), targetsVal.astype('int8'), len(valset)//3, shuffle=False):
            inputs, targets = batch
            err, acc = val_fn(inputs, targets)
            val_err += err
            val_acc += acc
            val_batches += 1
    

Error (excerpt):

    Exception "unhandled ValueError"
    Input dimension mis-match. (input[0].shape[1] = 52, input[1].shape[1] = 1)
    Apply node that caused the error: Elemwise{eq,no_inplace}(DimShuffle{x,0}.0, targets)
    Toposort index: 36
    Inputs types: [TensorType(int64, row), TensorType(int32, matrix)]
    Inputs shapes: [(1, 52), (52, 1)]
    Inputs strides: [(416, 8), (4, 4)]
    Inputs values: ['not shown', 'not shown']
    

Again, thank you for your help!

1 Answer:

Answer 0: (score: 3)

So it seems the error lies in how the validation accuracy is evaluated. When you remove "axis=1" from the calculation, the argmax runs over everything and returns only a single number. Broadcasting then steps in, which is why you see the same value across the whole set.

But from the error you posted, the "T.eq" op throws the error because it has to compare a 52 x 1 with a 1 x 52 vector (a matrix for theano/numpy). So, I suggest you try replacing the line with:

    valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))

I hope this fixes the error, though I have not tested it myself.

EDIT: The error lies in the argmax op that is called. Normally, argmax is there to determine which of the output units is activated the most. In your setting, however, you only have one output neuron, which means that the argmax over all output neurons will always return 0 (the first argument).

This is why your network gives you the impression of always producing 0 as output.
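
A one-line numpy sketch of this point (hypothetical values): with a single output column, argmax along axis 1 has only one candidate index.

```python
import numpy as np

valPrediction = np.array([[0.1], [0.9], [0.6]])  # (batch, 1) sigmoid outputs
print(np.argmax(valPrediction, axis=1))          # [0 0 0], regardless of the values
```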

Replace:

    valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))

with:

    binaryPrediction = valPrediction > .5
    valAcc = T.mean(T.eq(binaryPrediction, targetVar.T))

and you should get the desired result.

I am just not sure whether the transpose is still necessary.
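
A numpy sketch of the thresholding fix (hypothetical values): with both arrays shaped (N, 1), the element-wise comparison lines up directly, which suggests the transpose should not be needed here.

```python
import numpy as np

valPrediction = np.array([[0.2], [0.8], [0.6], [0.4]])  # (4, 1) sigmoid outputs
targets = np.array([[0], [1], [0], [1]], dtype='int8')  # (4, 1) labels

binaryPrediction = valPrediction > .5          # [[False], [True], [True], [False]]
valAcc = np.mean(binaryPrediction == targets)
print(valAcc)  # 0.5: two of the four samples classified correctly
```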