In the paper "Accurate Image Super-Resolution Using Very Deep Convolutional Networks", they define the loss function as the Euclidean distance between [high resolution image - low resolution image] == [true residual] and [network output] == [predicted residual]. The purpose of the loss function is to minimize the loss between the true residual and the predicted residual. More precisely, the goal is to minimize (r - f(x))^2 / 2, where r is the true residual and f(x) is the predicted residual.
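For reference, differentiating this expression with respect to the network output gives the gradient that the loss layer has to write back (a standard chain-rule step, not something quoted from the paper):

L = (r - f(x))^2 / 2
dL/df(x) = -(r - f(x)) = f(x) - r

So the per-pixel gradient should be the predicted residual minus the true residual.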
Here is my implementation of the loss function:
double compute_loss_value_and_gradient (
    const tensor& input_tensor,
    const_label_iterator truth,
    SUBNET& sub
) const
{
    DLIB_CASSERT(input_tensor.num_samples() != 0);
    auto const& output_tensor = sub.get_output();
    tensor& grad = sub.get_gradient_input();
    DLIB_CASSERT(input_tensor.num_samples() == output_tensor.num_samples());
    DLIB_CASSERT(grad.nc() == input_tensor.nc() && grad.nr() == input_tensor.nr());
    DLIB_CASSERT(input_tensor.nr() == output_tensor.nr() &&
                 input_tensor.nc() == output_tensor.nc() &&
                 input_tensor.k()  == output_tensor.k());

    double loss = 0;
    output_label_type update_gradient(grad.nr(), grad.nc());
    update_gradient = 0;
    for (long i = 0; i != output_tensor.num_samples(); ++i) {
        auto const predict_residual = image_plane(output_tensor, i);
        DLIB_CASSERT(predict_residual.nr() == truth->nr() &&
                     predict_residual.nc() == truth->nc());
        // I set the label (truth) as
        // [high_res_img - low_res_img] == [truth residual_img].
        auto const error_residual = *truth - predict_residual;
        auto const eu_dist = sum(pointwise_multiply(error_residual, error_residual));
        // After derivative, the gradient of the euclidean distance is
        // input - reference; since output_tensor is a mini batch,
        // I sum all of them together.
        update_gradient += error_residual;
        loss += eu_dist;
        ++truth;
    }
    // I take the average as the gradient value
    update_gradient /= output_tensor.num_samples();
    auto* grad_ptr = grad.host_write_only();
    std::copy(&update_gradient(0), &update_gradient(0) + update_gradient.size(),
              grad_ptr);

    return loss / 2.0 / output_tensor.num_samples();
}
However, this loss function does not work well (the loss hovers around 8.88~8.9 and does not seem to converge). I guess this is because the gradient update has a bug.
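One way to confirm a suspicion like this is a finite-difference check on a single pixel. This is a minimal standalone sketch of my own (not part of the dlib loss layer), comparing the analytic gradient f - r against a numeric estimate:

#include <cmath>
#include <iostream>

int main()
{
    // Scalar version of the per-pixel loss: L(f) = 0.5 * (r - f)^2
    const double r = 0.37;      // true residual value (arbitrary example)
    const double f = 0.12;      // predicted residual value (arbitrary example)
    const double eps = 1e-6;

    auto loss = [r](double f_) { return 0.5 * (r - f_) * (r - f_); };

    const double analytic = f - r;                                    // dL/df
    const double numeric  = (loss(f + eps) - loss(f - eps)) / (2 * eps);

    // The two values should agree to several decimal places; a sign flip
    // (r - f instead of f - r) shows up immediately here.
    std::cout << "analytic: " << analytic << ", numeric: " << numeric << "\n";
}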
If you need the complete source code, please tell me and I will put it on pastebin. The network architecture is:
template<typename SUBNET>
using con_net = relu<bn_con<con<64,3,3,1,1, SUBNET>>>;

using net_00 = loss_euclidean_dist<
                   relu<bn_con<con<1,3,3,1,1,
                   repeat<3, con_net,
                   con_net<
                   input<matrix<float>>
                   >>>>>>;
The paper implements the network with 20 convolutional layers and uses gradient clipping to avoid the exploding/vanishing gradient problem. I experiment with a 5-convolution-layer network instead, and prefer batch normalization to avoid exploding/vanishing gradients.
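For comparison, matching the paper's depth with the same building blocks would only need a larger repeat count. This is an untested sketch based on the con_net alias above (1 leading con_net + 18 repeated con_net blocks + 1 final conv = 20 conv layers); it is only meant to show how the repeat count controls depth:

// Hypothetical 20-conv-layer variant, mirroring the 5-layer net_00 above.
using net_20 = loss_euclidean_dist<
                   relu<bn_con<con<1,3,3,1,1,
                   repeat<18, con_net,
                   con_net<
                   input<matrix<float>>
                   >>>>>>;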
The parameters of the trainer and the optimizer:

dnn_trainer<vsdr::net_00, adam> trainer(net, adam(0.0001, 0.9, 0.999));
trainer.set_learning_rate(0.001);
trainer.set_min_learning_rate(1e-7);
trainer.set_mini_batch_size(128);
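For completeness, this is roughly how I drive the training (assuming the trainer and net configured above; training_images and training_residuals are placeholders for my own data, the dlib calls themselves are standard dnn_trainer usage):

// training_images: interpolated low-res patches,
// training_residuals: the corresponding (high_res - low_res) labels.
std::vector<matrix<float>> training_images;
std::vector<matrix<float>> training_residuals;
// ... load the patches and compute the residual labels here ...

trainer.be_verbose();
trainer.train(training_images, training_residuals);

net.clean();
serialize("vdsr_net.dat") << net;   // hypothetical file name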
Edit 0: The function to_label is never called during training.
Edit 1: I do not apply sqrt to the distance because this simplifies the derivative.
Edit 2: Found the problem of the Euclidean loss; the way I updated the gradient was wrong.
double compute_loss_value_and_gradient (
    const tensor& input_tensor,
    const_label_iterator truth,
    SUBNET& sub
) const
{
    DLIB_CASSERT(input_tensor.num_samples() != 0);
    auto const& output_tensor = sub.get_output();
    tensor& grad = sub.get_gradient_input();
    DLIB_CASSERT(input_tensor.num_samples() == output_tensor.num_samples());
    DLIB_CASSERT(grad.nc() == input_tensor.nc() && grad.nr() == input_tensor.nr());
    DLIB_CASSERT(input_tensor.nr() == output_tensor.nr() &&
                 input_tensor.nc() == output_tensor.nc() &&
                 input_tensor.k()  == output_tensor.k());

    double loss = 0;
    for (long i = 0; i != output_tensor.num_samples(); ++i) {
        auto const error_residual = image_plane(output_tensor, i) - *truth;
        auto const eu_dist = sum(pointwise_multiply(error_residual, error_residual));
        grad.set_sample(i, error_residual);
        loss += eu_dist;
        ++truth;
    }

    return loss / 2.0 / output_tensor.num_samples() / output_tensor.nc() / output_tensor.nr();
}
Another problem is the network architecture: it should not add a relu on top of the last conv layer.
using net_00 = loss_euclidean_dist<
                   bn_con<con<1,3,3,1,1,
                   repeat<3, con_net,
                   con_net<
                   input<matrix<float>>
                   >>>>>;
Now the training process looks much better, but the predicted residuals are bad and the PSNR of the enlarged image is even worse than bicubic. I tried removing batch normalization and using a very low learning rate (1e-5) to avoid gradient explosion, but this does not work either.
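For reference, this is how I measure the PSNR that I compare against bicubic. It is a small helper of my own (a sketch assuming the images are stored as matrix<float> with pixel values in the [0, 255] range):

#include <dlib/matrix.h>
#include <cmath>
#include <limits>

// PSNR between a reconstructed image and the ground-truth high-res image.
// Assumes both matrices have the same size and pixel values in [0, 255].
inline double psnr(const dlib::matrix<float>& reconstructed,
                   const dlib::matrix<float>& ground_truth)
{
    const dlib::matrix<float> diff = reconstructed - ground_truth;
    const double mse = dlib::sum(dlib::pointwise_multiply(diff, diff)) / diff.size();
    if (mse == 0)
        return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}

// Usage sketch: reconstructed = bicubic_upscaled + predicted_residual, then compare
// psnr(reconstructed, ground_truth) against psnr(bicubic_upscaled, ground_truth).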