I think I messed something up in this simple neural network in PyTorch, because it runs much slower on CUDA than on the CPU. Can you find the error?
Using wrapper functions like
def backward(ctx, input):
    return backward_sigm(ctx, input)
seems to have no real impact on performance.
import torch
import torch.nn as nn
import torch.nn.functional as f

dname = 'cuda:0'
dname = 'cpu'

device = torch.device(dname)

print(torch.version.cuda)

def forward_sigm(ctx, input):
    sigm = 1 / (1 + torch.exp(-input))
    ctx.save_for_backward(sigm)
    return sigm

def forward_step(ctx, input):
    return torch.tensor(input > 0.5, dtype=torch.float32, device=device)

def backward_sigm(ctx, grad_output):
    sigm, = ctx.saved_tensors
    return grad_output * sigm * (1 - sigm)

def backward_step(ctx, grad_output):
    return grad_output

class StepAF(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return forward_sigm(ctx, input)

    @staticmethod
    def backward(ctx, input):
        return backward_sigm(ctx, input)
        #else return grad_output

class StepNN(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(StepNN, self).__init__()
        self.linear1 = torch.nn.Linear(input_size, hidden_size)
        #self.linear1.cuda()
        self.linear2 = torch.nn.Linear(hidden_size, output_size)
        #self.linear2.cuda()
        #self.StepAF = StepAF.apply

    def forward(self, x):
        h_line_1 = self.linear1(x)
        h_thrash_1 = StepAF.apply(h_line_1)
        h_line_2 = self.linear2(h_thrash_1)
        output = StepAF.apply(h_line_2)
        return output

inputs = torch.tensor([[1,0,1,0],[1,0,0,1],[0,1,0,1],[0,1,1,0],[1,0,0,0],[0,0,0,1],[1,1,0,1],[0,1,0,0],], dtype=torch.float32, device=device)
expected = torch.tensor([[1,0,0],[1,0,0],[0,1,0],[0,1,0],[1,0,0],[0,0,1],[0,1,0],[0,0,1],], dtype=torch.float32, device=device)

nn = StepNN(4, 8, 3)
#print(*(x for x in nn.parameters()))

criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(nn.parameters(), lr=1e-3)

steps = 50000
print_steps = steps // 20
good_loss = 1e-5

for t in range(steps):
    output = nn(inputs)
    loss = criterion(output, expected)

    if t % print_steps == 0:
        print('step ', t, ', loss :', loss.item())

    if loss < good_loss:
        print('step ', t, ', loss :', loss.item())
        break

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

test = torch.tensor([[0,1,0,1],[0,1,1,0],[1,0,1,0],[1,1,0,1],], dtype=torch.float32, device=device)
print(nn(test))
Answer (score: 1):
You won't see any performance improvement from the GPU unless you have a sufficiently large amount of data. The issue is that a GPU relies on parallel processing, so unless you feed it large amounts of data, the CPU can process the samples just about as fast as the GPU can.
As far as I can see in your example, you are using 8 samples of size (4, 1). I would imagine that once you have hundreds or thousands of samples, you would start to see the GPU pull ahead. In your case, the sample size is (4, 1) and the hidden layer size is 8, so the CPU can perform these computations very quickly.
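As a rough illustration (not part of the original answer), here is a minimal sketch of how you could time a forward/backward pass on the CPU and on the GPU at different batch sizes; the layer and batch sizes are arbitrary assumptions chosen to mirror the shape of the model above, and torch.cuda.synchronize() is used so asynchronous GPU execution does not distort the timing:

import time
import torch

def time_pass(device, batch_size, runs=100):
    # Tiny 2-layer MLP, roughly the same shape as the model in the question.
    model = torch.nn.Sequential(
        torch.nn.Linear(4, 8),
        torch.nn.Sigmoid(),
        torch.nn.Linear(8, 3),
    ).to(device)
    x = torch.randn(batch_size, 4, device=device)
    y = torch.randn(batch_size, 3, device=device)
    criterion = torch.nn.MSELoss(reduction='sum')

    start = time.time()
    for _ in range(runs):
        loss = criterion(model(x), y)
        model.zero_grad()
        loss.backward()
    if device.type == 'cuda':
        torch.cuda.synchronize()  # wait for queued GPU kernels before stopping the clock
    return time.time() - start

for batch_size in (8, 8192):
    cpu_t = time_pass(torch.device('cpu'), batch_size)
    print(f'batch {batch_size}: cpu {cpu_t:.3f}s')
    if torch.cuda.is_available():
        gpu_t = time_pass(torch.device('cuda:0'), batch_size)
        print(f'batch {batch_size}: gpu {gpu_t:.3f}s')

With a batch of 8 the GPU's launch overhead tends to dominate, which is the effect described above; the gap only closes (or reverses) as the batch grows.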
There are plenty of example notebooks online that use the MNIST data set (which contains around 60,000 training images), so you could load one in Google Colab, train it on the CPU and then on the GPU, and compare the training times. For example, you could try this link. It uses TensorFlow instead of PyTorch, but it will give you a feel for the performance improvement a GPU brings.
Note: if you have never used Google Colab before, you need to change the runtime type in the Runtime menu at the top (None for CPU, GPU for GPU).
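Once a GPU runtime is available (in Colab or locally), a common PyTorch pattern — shown here as a small sketch rather than something taken from that notebook — is to pick the device at run time instead of hard-coding it, and to make sure the model and the data end up on the same device:

import torch

# Fall back to the CPU automatically when no CUDA device is visible.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('using device:', device)

# Both the model parameters and the data must live on the same device.
model = torch.nn.Linear(4, 3).to(device)
x = torch.randn(8, 4, device=device)
print(model(x).shape)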
Also, I will post the results from that notebook here (look at the times mentioned in parentheses; if you run it yourself, you can see directly how fast it goes):
On the CPU:
INFO:tensorflow:loss = 294.3736, step = 1
INFO:tensorflow:loss = 28.285727, step = 101 (23.769 sec)
INFO:tensorflow:loss = 23.518856, step = 201 (24.128 sec)
On the GPU:
INFO:tensorflow:loss = 295.08328, step = 0
INFO:tensorflow:loss = 47.37291, step = 100 (4.709 sec)
INFO:tensorflow:loss = 23.31364, step = 200 (4.581 sec)
INFO:tensorflow:loss = 9.980572, step = 300 (4.572 sec)
INFO:tensorflow:loss = 17.769928, step = 400 (4.560 sec)
INFO:tensorflow:loss = 16.345463, step = 500 (4.531 sec)