Torch7 - Logistic Regression Using the GPU - CUDA / cudnn

Asked: 2016-10-07 14:53:45

Tags: torch cudnn

I have been using the Torch7 logistic regression example from their website, but that code does not use the GPU: https://github.com/torch/demos/blob/master/logistic-regression/example-logistic-regression.lua

My code is almost identical. The difference is that I am also trying to use my GPU. I have CUDA and the cudnn library installed correctly. I tried cudnn on a very simple example and it worked: https://github.com/soumith/cudnn.torch

But when I try to convert the logistic regression model, it does not work properly. I don't really understand the error: the model converts fine, but an error is thrown during optimization with SGD. I am new to Torch, so some help would be really great! Here is a snippet of my code:

require 'torch'
require 'math'
require 'svm'
require 'nn'
require 'optim'
require 'cudnn'

TRAIN_NAME = "sample_train.txt"
TEST_NAME = "sample_test.txt"


-- I have:
num_samples = 10000
num_features = 500
-- dataset_inputs: feature tensor (dimension: num_samples x num_features)
-- dataset_outputs: labels tensor (dim: num_samples)
num_labels = 2

-- create the model
linLayer = nn.Linear(num_features, num_labels)
softMaxLayer = nn.LogSoftMax()  -- the input and output are a single tensor
model = nn.Sequential()
model:add(linLayer)
model:add(softMaxLayer)
cudnn.convert(model, cudnn)  -- converts the model
print(model)

-- loss function to be minimized: negative log-likelihood
criterion = nn.ClassNLLCriterion()

----------------------------------------------------------------------
-- Train the model (Using SGD)

x, dl_dx = model:getParameters()

feval = function(x_new)
   if x ~= x_new then
      x:copy(x_new)
   end

   _nidx_ = (_nidx_ or 0) + 1
   if _nidx_ > (#dataset_inputs)[1] then _nidx_ = 1 end

   local inputs = dataset_inputs[_nidx_]
   local target = dataset_outputs[_nidx_]

   dl_dx:zero()

   -- evaluate the loss function and its derivative wrt x, for that sample
   local loss_x = criterion:forward(model:forward(inputs), target)
   model:backward(inputs, criterion:backward(model.output, target))

   -- return loss(x) and dloss/dx
   return loss_x, dl_dx
end

-- Parameters for training the model with SGD
sgd_params = {
   learningRate = 1e-3,
   learningRateDecay = 1e-4,
   weightDecay = 0,
   momentum = 0
}


epochs = 1e2  -- number of cycles/iterations over our training data

print('')
print('============================================================')
print('Training with SGD')
print('')

for i = 1,epochs do

   -- this variable is used to estimate the average loss
   current_loss = 0

   -- an epoch is a full loop over our training data
   for i = 1,(#dataset_inputs)[1] do

      _,fs = optim.sgd(feval,x,sgd_params) -- PROBLEM!! : this function call produces the error

      current_loss = current_loss + fs[1]
   end

   -- report average error on epoch
   current_loss = current_loss / (#dataset_inputs)[1]
   print('epoch = ' .. i .. ' of ' .. epochs .. ' current loss = ' .. current_loss)

end


-- Then I will use the "model" to predict on test samples


print("---- DONE -----")

Here is the error I get:

============================================================
Training with SGD

/home/s43moham/torch/install/bin/luajit: /home/s43moham/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
/home/s43moham/torch/install/share/lua/5.1/cudnn/init.lua:125: assertion failed!
stack traceback:
[C]: in function 'assert'
/home/s43moham/torch/install/share/lua/5.1/cudnn/init.lua:125: in function 'toDescriptor'
...ham/torch/install/share/lua/5.1/cudnn/SpatialSoftMax.lua:39: in function 'createIODescriptors'
...ham/torch/install/share/lua/5.1/cudnn/SpatialSoftMax.lua:57: in function <...ham/torch/install/share/lua/5.1/cudnn/SpatialSoftMax.lua:56>
[C]: in function 'xpcall'
/home/s43moham/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...e/s43moham/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
lr.lua:104: in function 'opfunc'
/home/s43moham/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
lr.lua:142: in main chunk
[C]: in function 'dofile'
...oham/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/s43moham/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...e/s43moham/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
lr.lua:104: in function 'opfunc'
/home/s43moham/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
lr.lua:142: in main chunk
[C]: in function 'dofile'
...oham/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

3 Answers:

Answer 0 (score: 1)

I modified your code and it seems to work. The trick is to make sure the dataset, the model, and the criterion are all converted with cuda() before you start training.

require 'torch'
require 'nn'
require 'optim'
require 'cutorch'
require 'cunn'
require 'cudnn'

-- dataset_inputs: feature tensor (dimension: num_samples X num_features)
-- dataset_outputs: labels tensor (dim: num_samples)
num_samples = 10000
num_features = 500
num_labels = 2

dataset_inputs = torch.rand(num_samples, num_features):cuda()
dataset_outputs = torch.Tensor(num_samples):random(1, 2):cuda()

-- create the model
model = nn.Sequential()
model:add(nn.Linear(num_features, num_labels))
model:add(nn.LogSoftMax())

-- convert model to cuda then cudnn
model:cuda()
cudnn.convert(model, cudnn)
print(model)

-- loss function to be minimized: negative log-likelihood
criterion = nn.ClassNLLCriterion():cuda()

----------------------------------------------------------------------
-- Train the model (Using SGD)

x, dl_dx = model:getParameters()

feval = function(x_new)
   if x ~= x_new then
      x:copy(x_new)
   end

   _nidx_ = (_nidx_ or 0) + 1
   if _nidx_ > (#dataset_inputs)[1] then _nidx_ = 1 end

   local inputs = dataset_inputs[_nidx_]
   local target = dataset_outputs[_nidx_]

   dl_dx:zero()

   -- evaluate the loss function and its derivative wrt x, for that sample
   local loss_x = criterion:forward(model:forward(inputs), target)
   model:backward(inputs, criterion:backward(model.output, target))

   -- return loss(x) and dloss/dx
   return loss_x, dl_dx
end

-- Parameters for training the model with SGD
sgd_params = {
   learningRate = 1e-3,
   learningRateDecay = 1e-4,
   weightDecay = 0,
   momentum = 0
}

epochs = 1e2  -- number of cycles/iterations over our training data

print('')
print('============================================================')
print('Training with SGD')
print('')

for i = 1,epochs do
   -- this variable is used to estimate the average loss
   current_loss = 0
   -- an epoch is a full loop over our training data
   for i = 1, (#dataset_inputs)[1] do
      _,fs = optim.sgd(feval, x, sgd_params)
      current_loss = current_loss + fs[1]
   end

   -- report average error on epoch
   current_loss = current_loss / (#dataset_inputs)[1]
   print('epoch = ' .. i .. ' of ' .. epochs .. ' current loss = ' .. current_loss)
end

-- Then I will use the "model" to predict on test samples
print("---- DONE -----")

Answer 1 (score: 0)

Make sure you convert the inputs to the CUDA data type. Try swapping

local loss_x = criterion:forward(model:forward(inputs), target)

for

local loss_x = criterion:forward(model:forward(inputs:cuda()), target)

But really you should convert all of your inputs before running the model, so that the conversion doesn't happen on the fly, i.e. put inputs = inputs:cuda() somewhere beforehand, as sketched below.
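For example, a one-time, up-front conversion (reusing the variable names from the question) would be:

-- move the whole dataset to the GPU once, before training,
-- so no per-sample conversion happens inside feval
dataset_inputs = dataset_inputs:cuda()
dataset_outputs = dataset_outputs:cuda()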

Answer 2 (score: 0)

If you are still looking for GPU acceleration for logistic regression, we (IBM) have a library called Snap ML that does exactly that. It currently integrates with scikit-learn, and TensorFlow integration is coming soon: https://medium.com/@sumitg_16893/ibm-research-cracks-code-on-accelerating-key-machine-learning-algorithms-647b5031b420

Sumit, IBM