How to design a euclidean loss layer for the dlib dnn?

Asked: 2016-11-20 01:46:56

Tags: c++ conv-neural-network dlib

In the paper "Accurate Image Super-Resolution Using Very Deep Convolutional Networks", the loss function is defined as the euclidean distance between [high resolution image - low resolution image] == [true residual] and [network output] == [predicted residual]. The purpose of the loss function is to minimize the loss between the true residual and the predicted residual.

More precisely, the goal is to minimize ((r - f(x))^2)/2, where r is the true residual and f(x) is the predicted residual.

Here is my implementation of the loss function:

double compute_loss_value_and_gradient (
        const tensor& input_tensor,
        const_label_iterator truth,
        SUBNET& sub
        ) const
{
    DLIB_CASSERT(input_tensor.num_samples() != 0);

    auto const &output_tensor = sub.get_output();
    tensor& grad = sub.get_gradient_input();
    DLIB_CASSERT(input_tensor.num_samples() == output_tensor.num_samples());
    DLIB_CASSERT(grad.nc() == input_tensor.nc() && grad.nr() == input_tensor.nr());
    DLIB_CASSERT(input_tensor.nr() == output_tensor.nr() &&
                 input_tensor.nc() == output_tensor.nc() &&
                 input_tensor.k() == output_tensor.k());

    double loss = 0;
    output_label_type update_gradient(grad.nr(), grad.nc());
    update_gradient = 0;
    for(long i = 0; i != output_tensor.num_samples(); ++i){
        auto const predict_residual = image_plane(output_tensor, i);
        DLIB_CASSERT(predict_residual.nr() == truth->nr() &&
                     predict_residual.nc() == truth->nc());
        //I set the label(truth) as
        //[high_res_img - low_res_img] == [truth residual_img].
        auto const error_residual = *truth - predict_residual;
        auto const eu_dist = sum(pointwise_multiply(error_residual, error_residual));
    //after taking the derivative, the gradient of the euclidean
    //distance is (input - reference); since output_tensor is a
    //mini batch, I sum all of them together
        update_gradient += error_residual;
        loss += eu_dist;
        ++truth;
    }
    //I take the average as the gradient value
    update_gradient /= output_tensor.num_samples();
    auto *grad_ptr = grad.host_write_only();
    std::copy(&update_gradient(0), &update_gradient(0) + update_gradient.size(),
              grad_ptr);

    return loss / 2.0 / output_tensor.num_samples();
}

However, this loss function does not work well (the loss stays around 8.88~8.9 and does not seem to converge). I guess this is because the gradient update has a bug.

The full source code is on pastebin if you need it. The network architecture is:

template<typename SUBNET>
using con_net = relu<bn_con<con<64,3,3,1,1, SUBNET>>>;

using net_00 = loss_euclidean_dist<
relu<bn_con<con<1,3,3,1,1,
repeat<3, con_net,
con_net<
input<matrix<float>>
>>>>>>;

The paper implements the network with 20 convolution layers and uses gradient clipping to avoid the exploding/vanishing gradient problem; my experiment uses a 5-convolution network and relies on batch normalization instead to avoid exploding/vanishing gradients.

The parameters of the trainer and optimizer:

    dnn_trainer<vsdr::net_00, adam> trainer(net, adam(0.0001, 0.9, 0.999));
    trainer.set_learning_rate(0.001);
    trainer.set_min_learning_rate(1e-7);
    trainer.set_mini_batch_size(128);

Edit 0: the function to_label is never called during training.

Edit 1: I do not apply sqrt to the distance, because this simplifies the derivative.

Edit 2: Found the problem with the euclidean loss; the way I updated the gradient was wrong. The corrected version is:

double compute_loss_value_and_gradient (
            const tensor& input_tensor,
            const_label_iterator truth,
            SUBNET& sub
            ) const
    {
        DLIB_CASSERT(input_tensor.num_samples() != 0);

        auto const &output_tensor = sub.get_output();
        tensor& grad = sub.get_gradient_input();
        DLIB_CASSERT(input_tensor.num_samples() == output_tensor.num_samples());
        DLIB_CASSERT(grad.nc() == input_tensor.nc() && grad.nr() == input_tensor.nr());
        DLIB_CASSERT(input_tensor.nr() == output_tensor.nr() &&
                     input_tensor.nc() == output_tensor.nc() &&
                     input_tensor.k() == output_tensor.k());

        double loss = 0;
        for(long i = 0; i != output_tensor.num_samples(); ++i){

            auto const error_residual = image_plane(output_tensor, i) - *truth;
            auto const eu_dist = sum(pointwise_multiply(error_residual, error_residual));
            grad.set_sample(i, error_residual);
            loss += eu_dist;
            ++truth;
        }

        return loss / 2.0 / output_tensor.num_samples() / output_tensor.nc()/ output_tensor.nr();
    }

Another problem is the network architecture: it should not add a relu on top of the last conv layer.

using net_00 = loss_euclidean_dist<
bn_con<con<1,3,3,1,1,
repeat<3, con_net,
con_net<
input<matrix<float>>
>>>>>;

Now the training process looks much better, but the predicted residuals are bad: the PSNR of the upscaled image is even worse than bicubic. I tried removing batch normalization and using a very low learning rate (1e-5) to avoid gradient explosion, but that did not work either.

0 Answers