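I need to sum the gradients in every iteration and then transfer those gradients to another process so that it can reproduce the learned network.
The key code is shown below. Method 1: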
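import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, state_dim, action_dim, max_action):
        super(Net, self).__init__()
        # Three fully connected layers: state_dim -> 40 -> 30 -> action_dim
        self.l1 = nn.Linear(state_dim, 40)
        self.l2 = nn.Linear(40, 30)
        self.l3 = nn.Linear(30, action_dim)
        self.max_action = max_action

    def forward(self, x):
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        # tanh bounds the output to [-max_action, max_action]
        x = self.max_action * torch.tanh(self.l3(x))
        return x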
def train(batches, state_dim, action_dim, max_action):
    actor = Net(state_dim, action_dim, max_action)
    critic = Net(state_dim, action_dim, max_action)
    for i in range(1000):
        ...
        # Compute critic loss
        critic_loss = F.mse_loss(current_Q, target_Q)
        # Optimize the critic
        critic_optimizer.zero_grad()
        critic_loss.backward()
        critic_optimizer.step()
        # Compute actor loss
        actor_loss = -critic(state, actor(state)).mean()
        # Optimize the actor
        actor_optimizer.zero_grad()
        actor_loss.backward()
        actor_optimizer.step()
    return net
...
net = train(batches, state_dim, action_dim, max_action)
Method 2:
...
def train(batches, state_dim, action_dim, max_action):
    net = Net(state_dim, action_dim, max_action)
    for i in range(1000):
        ...
        # Optimize the critic
        critic_optimizer.zero_grad()
        critic_loss.backward()
        sum_grads(critic)  # sum the gradient in critic
        # Load the summed gradients into net, then step its optimizer
        for g, p in zip(sum_grads, net.parameters()):
            p.grad = torch.from_numpy(g)
        net_optimizer.step()
    return net
...
net = train(batches, state_dim, action_dim, max_action)
I expected method 1 and method 2 to learn the same net parameters, but they do not. So my question is: why, and how can I make it work? Thanks in advance.
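For reference, here is a minimal sketch of the gradient transfer that method 2 describes. It rests on assumptions the post does not confirm: the receiving network starts from the same weights as the source (here via copy.deepcopy), both sides use plain SGD with the same learning rate, and the gradients are exported before the source optimizer steps. The names source, replica and exported are hypothetical, and a small nn.Linear stands in for Net. If any of these assumptions fails, for example when two networks are constructed separately and so get different random initial weights, the parameters would not be expected to match.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the trained network and its replica.
source = nn.Linear(4, 2)
replica = copy.deepcopy(source)  # assumption: identical initial weights

# Assumption: stateless SGD with the same learning rate on both sides.
opt_source = torch.optim.SGD(source.parameters(), lr=0.1)
opt_replica = torch.optim.SGD(replica.parameters(), lr=0.1)

for _ in range(5):
    x = torch.randn(8, 4)
    y = torch.randn(8, 2)

    # Ordinary training step on the source network.
    opt_source.zero_grad()
    loss = F.mse_loss(source(x), y)
    loss.backward()

    # Export the gradients as NumPy arrays before stepping.
    exported = [p.grad.detach().cpu().numpy().copy() for p in source.parameters()]
    opt_source.step()

    # Load the exported gradients into the replica and step it.
    for g, p in zip(exported, replica.parameters()):
        p.grad = torch.from_numpy(g).to(dtype=p.dtype, device=p.device)
    opt_replica.step()

# True when every parameter tensor of the replica matches the source.
params_match = all(
    torch.allclose(a, b) for a, b in zip(source.parameters(), replica.parameters())
)
```

With a stateful optimizer such as Adam, the replica would also need the same optimizer state from the first step onward for the updates to agree.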
Answer 0 (score: 0)
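You do not need to sum the gradients explicitly. PyTorch adds the result of every backward() call into each parameter's .grad until the gradients are zeroed, so call zero_grad() once, run the forward and backward passes inside the loop, and call step() only once per group of accumulated batches: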
model.zero_grad()                                   # Reset gradient tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i + 1) % accumulation_steps == 0:           # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradient tensors
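The accumulation behaviour the loop above relies on can be checked with a small sketch. It compares gradients summed by hand over several batches with the gradients left in .grad after repeated backward() calls without zeroing in between. The model, the batches list and the manual_sum name are illustrative assumptions, not part of the original post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Linear(3, 1)
# Hypothetical toy data: three small batches of (inputs, targets).
batches = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(3)]

# 1) Sum the per-batch gradients by hand, zeroing before each backward pass.
manual_sum = [torch.zeros_like(p) for p in model.parameters()]
for x, y in batches:
    model.zero_grad()
    F.mse_loss(model(x), y).backward()
    for acc, p in zip(manual_sum, model.parameters()):
        acc += p.grad

# 2) Zero once, then let backward() accumulate into .grad on its own.
model.zero_grad()
for x, y in batches:
    F.mse_loss(model(x), y).backward()

# True when both approaches produce the same gradients.
grads_match = all(
    torch.allclose(acc, p.grad, atol=1e-6)
    for acc, p in zip(manual_sum, model.parameters())
)
```

Because the parameters are not updated between the backward passes, both approaches evaluate the same per-batch gradients, which is why the sums agree.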