Torch MNIST: how to fix bad argument #2 to 'v' (3D or 4D input tensor expected but got: [199 x 784])

Posted: 2016-12-11 08:32:10

Tags: lua torch

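I am trying to modify the basic tutorial model here (http://rnduja.github.io/2015/10/13/torch-mnist/) to build a deeper model in Torch. Here is my code (it works when I use the same model as in the tutorial at the URL above):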

Load data

require 'torch'
require 'nn'
require 'optim'
mnist = require 'mnist'
fullset = mnist.traindataset()
testset = mnist.testdataset()
trainset = {
    size = 50000,
    data = fullset.data[{{1,50000}}]:double(),
    label = fullset.label[{{1,50000}}]
}
validationset = {
    size = 10000,
    data = fullset.data[{{50001,60000}}]:double(),
    label = fullset.label[{{50001,60000}}]
}

Define network

model = nn.Sequential()
model:add(nn.Reshape(28*28))
model:add(nn.SpatialConvolution(1, 32, 3, 3))
model:add(nn.ReLU())
model:add(nn.SpatialConvolution(32, 32, 3, 3))
model:add(nn.ReLU())
model:add(nn.SpatialMaxPooling(2, 2, 2, 2))
-- model:add(nn.SpatialDropout(0.25))  -- Ignored for simplicity. 

-- model:add(nn.Reshape(12*12*32))
model:add(nn.View(12*12*32))  
model:add(nn.Linear(12*12*32, 128))
model:add(nn.ReLU())
-- model:add(nn.SpatialDropout(0.5))  -- Ignored for simplicity. 
model:add(nn.Linear(128, 10))
model:add(nn.LogSoftMax())  -- ClassNLLCriterion expects log-probabilities
criterion = nn.ClassNLLCriterion()
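The 12*12*32 passed to nn.View and nn.Linear follows from the spatial sizes: with no padding and stride 1, each 3x3 SpatialConvolution shrinks a side by 2 (28 -> 26 -> 24), and SpatialMaxPooling(2, 2, 2, 2) halves it (24 -> 12). The plain-Lua sketch below redoes that arithmetic; `out_size` is a hypothetical helper using the usual floor((in - k) / stride) + 1 formula for unpadded windows. It assumes the first convolution actually receives a 28 x 28 image plane, which only holds if no layer flattens the input before it.

```lua
-- Output length along one spatial axis for an unpadded conv/pool window
-- (hypothetical helper; mirrors floor((in - kernel) / stride) + 1).
local function out_size(input, kernel, stride)
  stride = stride or 1
  return math.floor((input - kernel) / stride) + 1
end

local side = 28
side = out_size(side, 3)     -- nn.SpatialConvolution(1, 32, 3, 3): 28 -> 26
side = out_size(side, 3)     -- nn.SpatialConvolution(32, 32, 3, 3): 26 -> 24
side = out_size(side, 2, 2)  -- nn.SpatialMaxPooling(2, 2, 2, 2): 24 -> 12
local features = side * side * 32  -- 32 output planes from the last convolution
print(side, features)
```

If the kernel sizes or pooling change, recomputing `features` this way keeps the nn.View and nn.Linear arguments consistent.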

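A separate point, unrelated to the dimension error: nn.ClassNLLCriterion expects its input to contain log-probabilities, so it is normally paired with nn.LogSoftMax rather than nn.SoftMax. With SoftMax the criterion still runs, but the loss is then minus the probability rather than the usual negative log-likelihood. The plain-Lua sketch below shows what the LogSoftMax + ClassNLLCriterion pair computes for a single sample without class weights; `log_softmax` and `class_nll` are illustrative names, not Torch API.

```lua
-- Numerically stable log-softmax over a Lua array of scores.
local function log_softmax(scores)
  local max = -math.huge
  for _, v in ipairs(scores) do
    if v > max then max = v end
  end
  local sum = 0
  for _, v in ipairs(scores) do
    sum = sum + math.exp(v - max)
  end
  local log_sum = max + math.log(sum)
  local out = {}
  for i, v in ipairs(scores) do
    out[i] = v - log_sum
  end
  return out
end

-- Single-sample negative log-likelihood; target is a 1-based class index.
local function class_nll(log_probs, target)
  return -log_probs[target]
end

local lp = log_softmax({1.0, 2.0, 3.0})
print(class_nll(lp, 3))  -- smallest loss for the highest-scoring class
```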
Define descent algorithm

sgd_params = {
   learningRate = 1e-2,
   learningRateDecay = 1e-4,
   weightDecay = 1e-3,
   momentum = 1e-4
}
x, dl_dx = model:getParameters()
step = function(batch_size)
    local current_loss = 0
    local count = 0
    local shuffle = torch.randperm(trainset.size)
    batch_size = batch_size or 200

    for t = 1,trainset.size,batch_size do
        -- setup inputs and targets for this mini-batch
        local size = math.min(t + batch_size - 1, trainset.size) - t
        local inputs = torch.Tensor(size, 28, 28)
        local targets = torch.Tensor(size)
        for i = 1,size do
            local input = trainset.data[shuffle[i+t]]
            local target = trainset.label[shuffle[i+t]]
            -- if target == 0 then target = 10 end
            inputs[i] = input
            targets[i] = target
        end
        targets:add(1)

        local feval = function(x_new)
            -- reset data
            if x ~= x_new then x:copy(x_new) end
            dl_dx:zero()

            -- perform mini-batch gradient descent
            local loss = criterion:forward(model:forward(inputs), targets)
            model:backward(inputs, criterion:backward(model.output, targets))

            return loss, dl_dx
        end

        _, fs = optim.sgd(feval, x, sgd_params)
        -- fs is a table containing value of the loss function
        -- (just 1 value for the SGD optimization)
        count = count + 1
        current_loss = current_loss + fs[1]
    end

    -- normalize loss
    return current_loss / count
end
eval = function(dataset, batch_size)
    local count = 0
    batch_size = batch_size or 200

    for i = 1,dataset.size,batch_size do
        local size = math.min(i + batch_size - 1, dataset.size) - i
        local inputs = dataset.data[{{i,i+size-1}}]
        local targets = dataset.label[{{i,i+size-1}}]:long()
        local outputs = model:forward(inputs)
        local _, indices = torch.max(outputs, 2)
        indices:add(-1)
        local guessed_right = indices:eq(targets):sum()
        count = count + guessed_right
    end

    return count / dataset.size
end
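The 199 in `[199 x 784]` comes from the mini-batch size calculation in `step`: `math.min(t + batch_size - 1, trainset.size) - t` is one less than the number of rows in the range t..t + batch_size - 1, and the loop then reads `shuffle[i+t]` for i = 1..size, i.e. positions t + 1 to t + 199, skipping position t. `eval` inherits the same off-by-one, so one sample per batch is never evaluated. This is not the cause of the dimension error; it only explains the batch size in the message. The plain-Lua sketch below computes inclusive batch ranges; `batch_ranges` is a hypothetical helper name. Inside `step`, the matching indexing would be `shuffle[t + i - 1]` for i = 1, size.

```lua
-- Hypothetical helper: inclusive [first, last] index ranges for mini-batches
-- over n samples, so every index 1..n is used exactly once.
local function batch_ranges(n, batch_size)
  local ranges = {}
  for t = 1, n, batch_size do
    local last = math.min(t + batch_size - 1, n)
    ranges[#ranges + 1] = { first = t, last = last, size = last - t + 1 }
  end
  return ranges
end

local ranges = batch_ranges(50000, 200)
local total = 0
for _, r in ipairs(ranges) do total = total + r.size end
print(#ranges, ranges[1].size, total)
```

In `eval`, the same inclusive range would be sliced as `dataset.data[{{r.first, r.last}}]`.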

Train

max_iters = 30
do
    local last_accuracy = 0
    local decreasing = 0
    local threshold = 1 -- how many decreasing epochs we allow
    for i = 1,max_iters do
        local loss = step()
        print(string.format('Epoch: %d Current loss: %4f', i, loss))
        local accuracy = eval(validationset)
        print(string.format('Accuracy on the validation set: %4f', accuracy))
        if accuracy < last_accuracy then
            if decreasing > threshold then break end
            decreasing = decreasing + 1
        else
            decreasing = 0
        end
        last_accuracy = accuracy
    end
end
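The early-stopping rule in the training loop can be checked in isolation. The plain-Lua sketch below replays the same logic on a list of validation accuracies (the numbers are made up, not results from this model) and returns the epoch at which the loop would break; `stop_epoch` is a hypothetical name. Tracing it shows that with `threshold = 1` the loop tolerates two consecutive drops and breaks on the third, one more than the threshold comment suggests.

```lua
-- Replays the early-stopping rule from the training loop.
-- accuracies: validation accuracy per epoch (made-up values in the example)
-- returns the epoch at which the loop would break, or nil if it never does
local function stop_epoch(accuracies, threshold)
  local last_accuracy, decreasing = 0, 0
  for epoch, accuracy in ipairs(accuracies) do
    if accuracy < last_accuracy then
      if decreasing > threshold then return epoch end
      decreasing = decreasing + 1
    else
      decreasing = 0
    end
    last_accuracy = accuracy
  end
  return nil
end

print(stop_epoch({0.90, 0.80, 0.70, 0.60}, 1))  -- 4: breaks on the third consecutive drop
```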

Now I get the following error message when running the Train section:

...s/username/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 2 module of nn.Sequential:
/Users/username/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #2 to 'v' (3D or 4D input tensor expected but got: [199 x 784] at /tmp/luarocks_nn-scm-1-2325/nn/lib/THNN/generic/SpatialConvolutionMM.c:33)
stack traceback:
    [C]: in function 'v'
    /Users/username/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
    ...go/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:79: in function <...go/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:76>
    [C]: in function 'xpcall'
    ...s/username/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../username/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    [string "step = function(batch_size)..."]:27: in function 'opfunc'
    /Users/username/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
    [string "step = function(batch_size)..."]:33: in function 'step'
    [string "do..."]:6: in main chunk
    [C]: in function 'xpcall'
    ...rs/username/torch/install/share/lua/5.1/itorch/main.lua:210: in function <...rs/username/torch/install/share/lua/5.1/itorch/main.lua:174>
    ...rs/username/torch/install/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
    ...username/torch/install/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
    ...username/torch/install/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
    ...username/torch/install/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
    ...rs/username/torch/install/share/lua/5.1/itorch/main.lua:389: in main chunk
    [C]: in function 'require'
    (command line):1: in main chunk
    [C]: at 0x0100c03350

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    ...s/username/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    .../username/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    [string "step = function(batch_size)..."]:27: in function 'opfunc'
    /Users/username/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
    [string "step = function(batch_size)..."]:33: in function 'step'
    [string "do..."]:6: in main chunk
    [C]: in function 'xpcall'
    ...rs/username/torch/install/share/lua/5.1/itorch/main.lua:210: in function <...rs/username/torch/install/share/lua/5.1/itorch/main.lua:174>
    ...rs/username/torch/install/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
    ...username/torch/install/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
    ...username/torch/install/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
    ...username/torch/install/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
    ...rs/username/torch/install/share/lua/5.1/itorch/main.lua:389: in main chunk
    [C]: in function 'require'
    (command line):1: in main chunk
    [C]: at 0x0100c03350

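I tried to fix it following the solution here (Torch mnist simple), either by adding :view(1, 28, 28) to the input data or by changing inputs = torch.Tensor(size, 28, 28) to inputs = torch.Tensor(size, 1, 28, 28), but neither solved the problem.

I don't know what else to try, and I don't yet know how to debug this kind of error. Thanks for your help.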

1 Answer:

Answer 0 (score: 0)

I found the error in the model: this line is not needed: model:add(nn.Reshape(28*28)). After commenting it out, a different error appeared:

……./torch/install/share/lua/5.1/nn/THNN.lua:110: Need input of dimension 3 and input.size[0] == 1 but got input to be of shape: [199 x 28 x 28] at /tmp/luarocks_nn-scm-1-2325/nn/lib/THNN/generic/SpatialConvolutionMM.c:47
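Both error messages come from the input-dimension check in SpatialConvolution: it accepts a 3D tensor laid out as nInputPlane x height x width, or a 4D batch laid out as batch x nInputPlane x height x width. The plain-Lua function below is a hypothetical mimic of that check (not the actual THNN code) that works on shape tables instead of tensors. It shows why `[199 x 784]` (the output of nn.Reshape(28*28)) fails, why `[199 x 28 x 28]` fails (199 is read as the number of input planes), and why a batch shaped `[199 x 1 x 28 x 28]` passes. The name `check_conv_input` is made up for illustration.

```lua
-- Hypothetical mimic of nn.SpatialConvolution's input check (not THNN code).
-- shape: array of dimension sizes, e.g. {199, 1, 28, 28}
-- n_input_plane: first constructor argument of SpatialConvolution
local function check_conv_input(shape, n_input_plane)
  local dims = #shape
  if dims ~= 3 and dims ~= 4 then
    return false, "3D or 4D input tensor expected"
  end
  -- in a 4D batch the planes are dimension 2; in a single 3D sample, dimension 1
  local planes = (dims == 4) and shape[2] or shape[1]
  if planes ~= n_input_plane then
    return false, string.format("expected %d input plane(s), got %d", n_input_plane, planes)
  end
  return true
end

print(check_conv_input({199, 784}, 1))        -- after nn.Reshape(28*28)
print(check_conv_input({199, 28, 28}, 1))     -- Reshape removed, no channel dimension
print(check_conv_input({199, 1, 28, 28}, 1))  -- batch x 1 x 28 x 28
```

In Torch terms, the last shape is what `torch.Tensor(size, 1, 28, 28)` from the question allocates. Presumably that change alone did not help because nn.Reshape(28*28) flattened each sample back to 784 values before the first convolution; with the Reshape layer removed, the 4D batch should pass this check.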

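Then I found that Torch does not accept 0 as a classification target (ClassNLLCriterion expects class indices from 1 to the number of classes), so label 0 has to be replaced with 10. The following helped: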

https://github.com/torch/nn/issues/471

Torch7 ClassNLLCriterion()

http://nn.readthedocs.io/en/rtd/criterion/
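ClassNLLCriterion treats each target as a 1-based class index between 1 and the number of classes, so MNIST's digit 0 cannot be used directly. The question's `step` function already shifts every label by one with `targets:add(1)`, and `eval` undoes that with `indices:add(-1)`; the commented-out line in `step` maps only 0 to 10 instead. The plain-Lua sketch below contrasts the two mappings (all function names are hypothetical). Whichever one is used, `eval` has to invert the same mapping before comparing predictions with the original labels.

```lua
-- Option 1 (what targets:add(1) / indices:add(-1) do): shift every label.
local function shift_to_class(label) return label + 1 end
local function shift_from_class(index) return index - 1 end

-- Option 2 (the commented-out line in step): map only digit 0 to class 10.
local function zero_to_class(label)
  if label == 0 then return 10 end
  return label
end
local function zero_from_class(index)
  if index == 10 then return 0 end
  return index
end

-- Both map digits 0..9 onto classes 1..10 and round-trip correctly.
for digit = 0, 9 do
  local a, b = shift_to_class(digit), zero_to_class(digit)
  assert(a >= 1 and a <= 10 and shift_from_class(a) == digit)
  assert(b >= 1 and b <= 10 and zero_from_class(b) == digit)
end
print("both mappings produce valid ClassNLLCriterion targets")
```

Mixing them, for example training with 0 -> 10 but evaluating with `indices:add(-1)`, would silently misreport accuracy.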