I have a simple question. I am trying to understand why a network gives very different responses on the GPU (CUDA) versus the CPU. Here is a minimal example:
require 'torch'
require 'nn'
require 'cunn'
require 'paths'
-- a small convnet
net = nn.Sequential()
net:add(nn.SpatialConvolution(3,16, 3,3))
net:add(nn.SpatialConvolution(16,8, 3,3))
net:add(nn.SpatialConvolution(8,1, 3,3))
-- randomize weights
local w = net:getParameters()
w:copy(torch.Tensor(w:nElement()):uniform(-1000,1000))
-- random input
x = torch.Tensor(3, 10, 10):uniform(-1,1)
-- network on gpu
net:cuda()
y = net:forward(x:cuda())
print(y)
-- network on cpu
y2 = net:clone():double():forward(x)
print(y2)
-- check difference (typically ~10000)
print("Mean Abs. Diff:")
print(torch.abs(y2-y:double()):sum()/y2:nElement())
Am I doing something wrong here, or is some difference between CPU and GPU computation expected?
Answer 0 (score: 0)
It turns out that even though the mean absolute difference is large, the mean percentage difference is tiny (around 1e-5%):
print("Mean Abs. % Diff:")
print(torch.abs(y2-y:double()):cdiv(torch.abs(y2)):sum() / y2:nElement())
So the mean absolute difference is large simply because the weights (and hence the activations) are large; the small relative error that remains comes from CUDA handling single-precision floating-point arithmetic slightly differently than the CPU.
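This relative-versus-absolute distinction is not specific to Torch. A minimal sketch in Python/NumPy (NumPy is my own choice here, purely for illustration, not part of the original example) shows that the absolute error of a single-precision computation scales with the magnitude of the operands, while the relative error stays near float32 machine epsilon (~1e-7), which is why weights drawn from uniform(-1000, 1000) produce large absolute diffs but tiny percentage diffs:

```python
import numpy as np

# Sum the same values once in float64 (reference) and once in float32,
# at two magnitude scales. The absolute error grows with the scale of
# the inputs, while the relative error stays near float32 epsilon.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100000)

for scale in (1.0, 1000.0):
    w = scale * x
    s64 = w.sum(dtype=np.float64)                            # double-precision reference
    s32 = float(w.astype(np.float32).sum(dtype=np.float32))  # single precision, like CUDA
    abs_err = abs(s64 - s32)
    rel_err = abs_err / abs(s64)
    print("scale=%g  abs_err=%.6g  rel_err=%.3g" % (scale, abs_err, rel_err))
```

The same effect appears in the network above: multiplying activations by weights in the thousands makes every intermediate value huge, so a relative error of ~1e-7 per operation shows up as an absolute difference in the thousands at the output.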