Looking at the code, the logic is:
# grads depend on total_loss
grads = optimizer.compute_gradients(
    total_loss,
    variables_to_train,
    gate_gradients=gate_gradients,
    aggregation_method=aggregation_method,
    colocate_gradients_with_ops=colocate_gradients_with_ops)
# grad_updates depend on grads, so they also depend on total_loss
grad_updates = optimizer.apply_gradients(grads, global_step=global_step)
# but total_loss (the returned train_op) depends on grad_updates
train_op = control_flow_ops.with_dependencies([grad_updates], total_loss)
Look at the comments: total_loss depends on grad_updates, while grad_updates in turn depends on total_loss. With with_dependencies hanging grad_updates off total_loss like this, what happens? Is it a circular dependency?
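
To make the dependency chain concrete, here is a minimal, self-contained sketch (a toy example of my own, not the slim code itself), assuming the TF1-style API the snippet comes from. with_dependencies([grad_updates], total_loss) essentially creates tf.identity(total_loss) inside a tf.control_dependencies([grad_updates]) block:

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# A toy variable and loss standing in for variables_to_train / total_loss.
x = tf.Variable(3.0)
total_loss = tf.square(x)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads = optimizer.compute_gradients(total_loss, [x])
grad_updates = optimizer.apply_gradients(grads)

# What with_dependencies([grad_updates], total_loss) boils down to:
# a *new* identity tensor that can only be evaluated after grad_updates runs,
# so there is no actual cycle in the graph.
with tf.control_dependencies([grad_updates]):
    train_op = tf.identity(total_loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Each run applies one gradient step and returns the loss value that was
    # computed *before* the update (square runs before apply_gradients).
    print(sess.run(train_op))  # 9.0: loss at x == 3.0, then x -> 2.4
    print(sess.run(train_op))  # 5.76: loss at x == 2.4, then x -> 1.92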