Vanishing/exploding gradients in deeplearning4j

Time: 2017-03-05 18:45:57

Tags: deep-learning deeplearning4j

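How can I check for vanishing/exploding gradients in deeplearning4j, specifically for recurrent neural networks? That is, which parameters should I look at, and which methods should I call to get their values?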

1 answer:

Answer 0: (score: 1)

As mentioned above, you should look at the GUI, which is introduced here

DL4J GUI: "Overview" tab -> "Update: Parameter Ratio"

  

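The ratio of updates to parameters is specifically the ratio of the mean magnitudes of these values (i.e., log10(mean(abs(updates)) / mean(abs(parameters)))).

So values that are markedly high or low may indicate exploding or vanishing gradients, respectively.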
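
To make the charted quantity concrete, here is a minimal plain-Java sketch of the formula quoted above. The class and method names are hypothetical and are not part of DL4J; the arrays stand in for one layer's flattened parameters and the update applied to them in one iteration.

```java
import java.util.Locale;

// Illustrative only: hypothetical names, not DL4J API.
final class UpdateParamRatio {

    /** Mean of the absolute values; NaN for an empty array. */
    static double meanAbs(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += Math.abs(v);
        }
        return sum / values.length;
    }

    /** log10(mean(abs(updates)) / mean(abs(parameters))). */
    static double log10Ratio(double[] updates, double[] parameters) {
        return Math.log10(meanAbs(updates) / meanAbs(parameters));
    }

    public static void main(String[] args) {
        double[] parameters = {0.5, -0.25, 0.75, -0.5};       // mean abs = 0.5
        double[] updates = {5e-4, -2.5e-4, 7.5e-4, -5e-4};    // mean abs = 5e-4
        System.out.printf(Locale.ROOT, "log10 ratio = %.3f%n",
                log10Ratio(updates, parameters));
    }
}
```

In this made-up example the updates are about a thousand times smaller than the parameters, so the ratio is roughly -3 on the log10 scale. As far as I recall, the DL4J visualization guide cites about -3 as a rough starting target, but treat that as a rule of thumb rather than a hard threshold.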

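Programmatically

At the end of each iteration, the gradients are stored in the gradient field of ComputationGraph and MultiLayerNetwork. They can be accessed through the public gradient() method (it does not change any state; it is a simple getter), so you can analyze the gradients in your own code.

Below is a small snippet that prints, for each variable's gradient, the minimum, mean, and maximum absolute value, along with log10 of the minimum (its order of magnitude):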

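    // Assumes `net` is a MultiLayerNetwork or ComputationGraph that has run at least
    // one training iteration, and `log` is an SLF4J-style logger.
    StringBuilder gradSummary = new StringBuilder("--- Gradients ---\n");
    net.gradient().gradientForVariable().forEach((var, grad) -> {
        // amin/amean/amax work on absolute values, so signs are ignored.
        Number min = grad.aminNumber();
        Number max = grad.amaxNumber();
        Number mean = grad.ameanNumber();
        // Order of magnitude of the smallest gradient. The cast truncates toward
        // zero; a minimum of exactly 0 gives -Infinity, which casts to Integer.MIN_VALUE.
        int order = (int) Math.log10(min.doubleValue());
        gradSummary.append(var).append(": ")
            .append(min).append(",")
            .append(mean).append(",")
            .append(max).append(",")
            .append(" magnitude: ").append(order).append('\n');
    });
    gradSummary.append("-----------------");
    log.info(gradSummary.toString());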

It produces output similar to the following (note that the variables are named after the layer names):

2019-01-05 15:26:12 INFO  --- Gradients ---
lstm-1_W: 4.1305625586574024E-11,2.102349571941886E-5,5.235217977315187E-4, magnitude: -10
lstm-1_RW: 6.30961949354969E-11,1.7203132301801816E-5,1.335109118372202E-4, magnitude: -10
lstm-1_b: 2.9782620813989524E-10,3.226526814614772E-6,3.882131932186894E-5, magnitude: -9
lstm-2_W: 2.340811988688074E-10,2.496814886399079E-5,7.095998153090477E-4, magnitude: -9
lstm-2_RW: 8.640199666842818E-11,4.6048542571952567E-5,0.0015051497612148523, magnitude: -10
lstm-2_b: 6.85293555235944E-9,3.012867455254309E-5,4.262796137481928E-4, magnitude: -8
lstm-3_W: 1.141415850725025E-10,5.7301283959532157E-5,0.0024848710745573044, magnitude: -9
lstm-3_RW: 2.446540747769177E-10,3.4060700272675604E-5,0.002297096885740757, magnitude: -9
lstm-3_b: 1.5003001507807312E-8,2.131067230948247E-5,2.356997865717858E-4, magnitude: -7
norm-1_gamma: 4.6524661456714966E-8,2.8755117455148138E-5,1.543344114907086E-4, magnitude: -7
norm-1_beta: 5.754080234510184E-7,1.0409040987724438E-4,3.460813604760915E-4, magnitude: -6
norm-1_mean: 8.82148754044465E-7,0.0033756729681044817,0.048742543905973434, magnitude: -6
norm-1_var: 3.0532873451782905E-10,2.6078732844325714E-6,1.6723810404073447E-4, magnitude: -9
dense-1_W: 3.8744474295526743E-10,5.491946285474114E-5,6.59565266687423E-4, magnitude: -9
dense-1_b: 4.4111070565122645E-6,1.4454024494625628E-4,4.0868428186513484E-4, magnitude: -5
norm-2_gamma: 2.477656607879908E-6,9.73446512944065E-5,2.708708052523434E-4, magnitude: -5
norm-2_beta: 3.106115855189273E-6,4.934889730066061E-4,0.0012065295595675707, magnitude: -5
norm-2_mean: 2.7818930902867578E-5,0.004300051834434271,0.01411475520581007, magnitude: -4
norm-2_var: 1.806318869057577E-5,0.007471780758351088,0.020012110471725464, magnitude: -4
output_W: 7.830021786503494E-8,1.4970696065574884E-4,4.896917380392551E-4, magnitude: -7
output_b: 3.1583107193000615E-4,6.765704602003098E-4,0.0011031415779143572, magnitude: -3
-----------------
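
The magnitude column is log10 of the smallest absolute gradient with the fractional part dropped by the int cast, so 4.13E-11 is reported as -10. A gradient of exactly zero has no finite log10, and casting the resulting -Infinity to int yields Integer.MIN_VALUE. The following plain-Java sketch keeps the same truncation but makes that edge case explicit; the class and method names are hypothetical and not part of DL4J.

```java
import java.util.OptionalInt;

// Illustrative only: hypothetical helper, not DL4J API.
final class GradientMagnitude {

    /**
     * Order of magnitude of |value|, truncated toward zero like the
     * (int) cast in the snippet above. Empty for zero, NaN, or infinity.
     */
    static OptionalInt orderOfMagnitude(double value) {
        double abs = Math.abs(value);
        if (abs == 0.0 || Double.isNaN(abs) || Double.isInfinite(abs)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of((int) Math.log10(abs));
    }

    public static void main(String[] args) {
        // Minimum gradients taken from the lstm-1_W and output_b rows above.
        System.out.println(orderOfMagnitude(4.1305625586574024E-11));
        System.out.println(orderOfMagnitude(3.1583107193000615E-4));
        // A gradient that has vanished completely.
        System.out.println(orderOfMagnitude(0.0));
    }
}
```

An empty result is arguably the clearest sign of a dead gradient, since it means at least one entry for that variable is exactly zero.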

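You can even wrap this code in an iteration listener and print the summary every N iterations, to help you monitor and adjust the training process.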
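
The "every N iterations" part can be sketched without any DL4J types. The helper below is hypothetical and not part of DL4J: in a real project its iterationDone call would sit inside the iteration callback of whichever listener interface your DL4J version provides (the exact interface and signature vary between releases), and the supplier would build the gradient summary from the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Illustrative only: hypothetical helper, not DL4J API.
final class EveryNIterations {
    private final int frequency;
    private final Supplier<String> report;
    private final Consumer<String> sink;

    EveryNIterations(int frequency, Supplier<String> report, Consumer<String> sink) {
        if (frequency <= 0) {
            throw new IllegalArgumentException("frequency must be positive");
        }
        this.frequency = frequency;
        this.report = report;
        this.sink = sink;
    }

    /** Call once per training iteration; emits the report on every N-th call. */
    void iterationDone(int iteration) {
        if (iteration % frequency == 0) {
            // The report is only built when it is actually emitted.
            sink.accept("iteration " + iteration + "\n" + report.get());
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        EveryNIterations gate =
                new EveryNIterations(5, () -> "--- Gradients ---", lines::add);
        for (int i = 0; i < 12; i++) {
            gate.iterationDone(i);
        }
        lines.forEach(System.out::println);
    }
}
```

Passing the summary as a Supplier means the per-variable statistics are computed only on the iterations that are reported, which keeps the overhead on the other iterations negligible.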