I implemented mini-batch stochastic gradient descent in Python. The models are an SVM and a softmax classifier. Here is the optimizer's update step:
loss_val, grad = loss(X_batch, y_batch, reg)  # renamed so the scalar doesn't shadow the loss function
W -= learning_rate * grad
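For context, the update above is assumed to live inside a loop like the following minimal sketch (the signature `loss_fn(W, X_batch, y_batch, reg) -> (loss, grad)` and the sampling scheme are assumptions, since the full training code isn't shown):

```python
import numpy as np

def sgd_train(loss_fn, W, X, y, learning_rate=1e-7, reg=2.5e4,
              num_iters=1500, batch_size=256, seed=0):
    """Minimal mini-batch SGD loop (a sketch, not the asker's exact code).

    loss_fn(W, X_batch, y_batch, reg) is assumed to return (loss, grad).
    """
    rng = np.random.default_rng(seed)
    history = []
    for it in range(num_iters):
        # Sample a mini-batch with replacement.
        idx = rng.choice(X.shape[0], size=batch_size, replace=True)
        loss_val, grad = loss_fn(W, X[idx], y[idx], reg)
        W -= learning_rate * grad  # vanilla SGD update
        history.append(loss_val)
    return W, history
```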
For certain values of the learning rate and regularization strength it works perfectly (and it passes the gradient check):
iteration 0 / 1500: loss 794.507350
iteration 100 / 1500: loss 29.568095
iteration 200 / 1500: loss 33.458495
iteration 300 / 1500: loss 18.304720
iteration 400 / 1500: loss 35.882488
iteration 500 / 1500: loss 21.490048
iteration 600 / 1500: loss 25.785373
iteration 700 / 1500: loss 22.813580
iteration 800 / 1500: loss 25.500039
iteration 900 / 1500: loss 24.819413
iteration 1000 / 1500: loss 15.608862
iteration 1100 / 1500: loss 22.731321
iteration 1200 / 1500: loss 26.022119
iteration 1300 / 1500: loss 20.588262
iteration 1400 / 1500: loss 23.737331
That took 16.041453s
However, if I set a smaller learning rate / larger regularization coefficient, the loss easily blows up to infinity:
iteration 0 / 1500: loss 802.769833
iteration 100 / 1500: loss 402010684912951517714160259531836227584.000000
iteration 200 / 1500: loss 66449146544223774690669308194838812744571030707457444354059618517426110464.000000
iteration 300 / 1500: loss 10983511737783225854597341204492232891393029195591698817237023075664722703077635057235569132488177530617987072.000000
iteration 400 / 1500: loss 1815486524175816065194879215518721785434014355721054994732056447366189069461661764367553711338715612291351514721201526014636467439510904624906240.000000
iteration 500 / 1500: loss 300085382357794494911213874061735327359248402577239262036553372454999306473258608281908717386082743259637422311405675655718344736088310092775477346890221645805861568848900053270528.000000
iteration 600 / 1500: loss 49601710343569850904008263569832035647932127145181615050392357841588796262557064950546589884591619227102632956834873550818712373623620724060742653598312083408633957433671568944564242236660656086159625258558214373376.000000
iteration 700 / 1500: loss 8198765463603785409686242099308047610060840590000243139990894837982240050293849610436998843183936014375298887308342856825809426287886468401376324562159947921470840145036302068723165705243202367572424942890685882166121392632827736787223859487240617984.000000
iteration 800 / 1500: loss 1355190267867357287705764546778522627576836845623844275975557230456460933397253991581163373195663111196172764626006669963061476094064927997512717916334132304539774742926463546477992463418428065245444868204942968447224877690702413263146519713454691766809999132152180155000809790950604800.000000
RuntimeWarning: overflow encountered in double_scalars
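One plausible mechanism for this blow-up (an assumption, since the loss code isn't shown): with L2 regularization the gradient contains a `2 * reg * W` term, so each update multiplies `W` by `(1 - 2 * learning_rate * reg)`. When `learning_rate * reg` is large enough that this factor has magnitude greater than 1, `W` grows geometrically regardless of the data term. A toy sketch ignoring the data term:

```python
def reg_only_updates(w0, learning_rate, reg, steps):
    """SGD on the L2 penalty alone: grad = 2*reg*w, so each step
    multiplies w by (1 - 2*learning_rate*reg)."""
    w = w0
    trace = []
    for _ in range(steps):
        w -= learning_rate * (2 * reg * w)
        trace.append(abs(w))
    return trace

# |1 - 2*lr*reg| = 0.8 < 1: the weight shrinks toward zero.
stable = reg_only_updates(1.0, learning_rate=1e-3, reg=100.0, steps=50)

# |1 - 2*lr*reg| = 3 > 1: the weight grows geometrically, matching the
# exploding losses in the log above.
unstable = reg_only_updates(1.0, learning_rate=1e-3, reg=2000.0, steps=50)
```

This suggests the product `learning_rate * reg` matters, not either hyperparameter alone.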
Is this a bug in my code, or a general optimization problem? How can I avoid it?