Question

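I want to use the threads library (or perhaps parallel) to load and preprocess data into a queue, but I'm not entirely sure how it works. In summary:

Load data (tensors), preprocess the tensors (this takes time, which is why I'm here), and put them into a queue. I would like as many threads as possible doing this, so that the model does not wait, or at least does not wait long.
Take the tensor at the front of the queue, forward it through the model, and remove it from the queue.

I don't really understand the example at https://github.com/torch/threads. A hint or an example showing where I would load data into the queue and train would be great.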

EDIT 14/03/2016

In this example, "https://github.com/torch/threads/blob/master/test/test-low-level.lua", which uses low-level threads, does anyone know how to get data out of these threads and into the main thread?

Answer 1

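Take a look at this multi-threaded data provider: https://github.com/soumith/dcgan.torch/blob/master/data/data.lua

It runs this file in the threads: https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L18

by calling it here: https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L30-L43

Afterwards, if you want to queue a job into the threads, you provide two functions: https://github.com/soumith/dcgan.torch/blob/master/data/data.lua#L84 The first one runs inside the thread, and the second one runs in the main thread after the first one completes.

Hope this makes it a bit clearer.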
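
The two-function pattern described above can be sketched as follows. This is a minimal illustration rather than the code from data.lua: the thread count, the batch count, the `received` table, and the use of `torch.randn` as a stand-in for real loading and preprocessing are all assumptions made for the example.

```lua
require 'torch'
local threads = require 'threads'

local nThreads = 4   -- assumed number of worker threads
local nBatches = 8   -- assumed number of batches to prepare

-- The initializer runs once in each worker, which has its own Lua state,
-- so anything the jobs need must be required here.
local pool = threads.Threads(
  nThreads,
  function(threadId)
    require 'torch'
  end
)

local received = {}  -- filled by the main-thread callbacks

for i = 1, nBatches do
  pool:addjob(
    -- First function: runs inside a worker thread.
    function()
      -- Placeholder for real loading/preprocessing.
      local batch = torch.randn(4, 3)
      return batch
    end,
    -- Second function: runs on the main thread and receives
    -- whatever the first function returned.
    function(batch)
      -- A model forward/backward pass would go here.
      table.insert(received, batch)
    end
  )
end

-- Wait for all queued jobs and run their main-thread callbacks.
pool:synchronize()
pool:terminate()
```

In this sketch the pool's internal job queue plays the role of the queue from the question: workers preprocess in parallel while the main thread consumes finished batches one at a time in the second callback.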

Answer 2

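If Soumith's example from the previous answer is not easy to use, I suggest building your own pipeline from scratch. Here is an example of two synchronized threads: one that writes data and one that reads it: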

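require 'torch'
local t = require 'threads'
-- Share tensors and tds objects between threads instead of copying them.
t.Threads.serialization('threads.sharedserialize')
local tds = require 'tds'
-- Must be a local so the job closure captures it as an upvalue; use a table or tds.Hash().
local dict = tds.Hash()
dict[1] = torch.zeros(4)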

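-- m1 signals "new data is ready"; m2 signals "the reader is done with the slot".
local m1 = t.Mutex()
local m2 = t.Mutex()
-- Mutex ids are plain numbers, so the worker can rebuild the same mutexes from them.
local m1id  = m1:id()
local m2id  = m2:id()

-- Hold m1 up front so the reader blocks until the writer has produced something.
m1:lock()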

local pool = t.Threads(
  1,
  function(threadIdx)
  end
)

pool:addjob(
  function()
    local t = require 'threads'
    local m1 = t.Mutex(m1id)
    local m2 = t.Mutex(m2id)

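    while true do
      m2:lock()                  -- wait until the reader has consumed the previous value
      dict[1] = torch.randn(4)   -- stand-in for real loading/preprocessing
      m1:unlock()                -- tell the reader that new data is ready

      print('W ===> ')
      print(dict[1])
      collectgarbage()
      collectgarbage()
    end

    return __threadid  -- never reached: the loop above runs forever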
  end,
  function(id)
  end
)

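-- Code executing on the main thread:
local a = 1
while true do
  m1:lock()     -- wait for the writer to publish a value
  a = dict[1]   -- a model forward pass would consume the tensor here
  m2:unlock()   -- let the writer produce the next value

  print('R --> ')
  print(a)
end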

Torch - Multithreaded loading of tensors into a queue for training
