How to code Adagrad in Python with Theano

Asked: 2015-03-31 09:37:57

Tags: python gradient theano

To simplify the problem: say a dimension (or feature) has already been updated n times; the next time I see that feature, I want to set its learning rate to 1/n.

I came up with the following code:

import numpy as np
import theano
import theano.tensor as T

def test_adagrad():
    # 20 rows of 10-dim embeddings; `times` counts how often each row has been updated
    embedding = theano.shared(value=np.random.randn(20, 10), borrow=True)
    times = theano.shared(value=np.ones((20, 1)))
    lr = T.dscalar()
    index_a = T.lvector()
    hist = times[index_a]
    cost = T.sum(theano.sparse_grad(embedding[index_a]))
    gradients = T.grad(cost, embedding)
    # intended: scale the learning rate by 1/n using the per-row counters
    updates = [(embedding, embedding + lr * (1.0 / hist) * gradients)]
    ### Here should also be some code to update `times`; omitted for brevity ###
    train = theano.function(inputs=[index_a, lr], outputs=cost, updates=updates)
    for i in range(10):
        print(train([1, 2, 3], 0.05))
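
Roughly, the omitted counter update would look something like this (only a sketch reusing the variables above, not the code I actually left out):

# Sketch of the omitted part: update only the indexed rows and bump their counters.
grad_rows = gradients[index_a]                      # gradient w.r.t. the selected rows
new_rows = embedding[index_a] - lr * (1.0 / hist) * grad_rows
updates = [
    (embedding, T.set_subtensor(embedding[index_a], new_rows)),
    (times, T.inc_subtensor(times[index_a], T.ones_like(hist))),  # n -> n + 1 for the touched rows
]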

Theano does not raise any error, but the training result sometimes contains NaN. Does anyone know how to fix this?

Thanks for your help.

PS: I suspect the operations in sparse space are causing the problem, so I tried replacing * with theano.sparse.mul. That gave the same results I mentioned above.

3 Answers:

Answer 0 (score: 8)

Perhaps you can use the following example for implementation of adadelta and use it to derive your own Adagrad version. Please update here if you succeed :-)
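
The core of Adagrad is just accumulating squared gradients and dividing each step by their square root. A minimal sketch (not the linked adadelta code; cost, params, lr and eps are assumed to come from your own model):

import theano
import theano.tensor as T

def adagrad_updates(cost, params, lr=0.01, eps=1e-6):
    # Sketch: keep a per-parameter running sum of squared gradients and
    # scale each step by 1 / sqrt(accumulated squares).
    updates = []
    for p in params:
        g = T.grad(cost, p)
        acc = theano.shared(p.get_value() * 0.)  # zeros with the same shape as p
        acc_new = acc + g ** 2
        updates.append((acc, acc_new))
        updates.append((p, p - lr * g / T.sqrt(acc_new + eps)))
    return updates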

Answer 1 (score: 1)

I was looking for the same thing and ended up implementing it myself, in the style of the resource zuuz already pointed to. So this may help anyone else looking for it here.

import numpy as np
import theano
import theano.tensor as T

def adagrad(lr, tparams, grads, inp, cost):
    # stores the current grads
    gshared = [theano.shared(np.zeros_like(p.get_value(),
                                           dtype=theano.config.floatX),
                             name='%s_grad' % k)
               for k, p in tparams.iteritems()]
    grads_updates = zip(gshared, grads)
    # stores the sum of all grads squared
    hist_gshared = [theano.shared(np.zeros_like(p.get_value(),
                                                dtype=theano.config.floatX),
                                  name='%s_hist_grad' % k)
                    for k, p in tparams.iteritems()]
    rgrads_updates = [(rg, rg + T.sqr(g)) for rg, g in zip(hist_gshared, grads)]

    # calculate cost and store grads
    f_grad_shared = theano.function(inp, cost,
                                    updates=grads_updates + rgrads_updates,
                                    on_unused_input='ignore')

    # apply actual update with the initial learning rate lr
    n = 1e-6
    updates = [(p, p - (lr/(T.sqrt(rg) + n))*g)
               for p, g, rg in zip(tparams.values(), gshared, hist_gshared)]

    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore')

    return f_grad_shared, f_update
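
Typical usage then looks roughly like this (a sketch; the tiny least-squares model, W, x, y and the constants are placeholders of mine, not part of the function above):

from collections import OrderedDict
import numpy as np
import theano
import theano.tensor as T

# Hypothetical toy model: a single weight matrix fitted by least squares.
W = theano.shared(np.zeros((5, 3), dtype=theano.config.floatX), name='W')
tparams = OrderedDict([('W', W)])
x = T.matrix('x')
y = T.matrix('y')
cost = T.mean((T.dot(x, W) - y) ** 2)
grads = T.grad(cost, wrt=list(tparams.values()))
lr = T.scalar('lr')

f_grad_shared, f_update = adagrad(lr, tparams, grads, [x, y], cost)

for i in range(10):
    xv = np.random.randn(4, 5).astype(theano.config.floatX)
    yv = np.random.randn(4, 3).astype(theano.config.floatX)
    c = f_grad_shared(xv, yv)  # computes the cost and stores the grads and their squares
    f_update(0.1)              # applies the Adagrad-scaled step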

Answer 2 (score: 1)

I find this implementation from Lasagne very concise and readable. You pretty much just need this:

for param, grad in zip(params, grads):
    value = param.get_value(borrow=True)
    accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                         broadcastable=param.broadcastable)
    accu_new = accu + grad ** 2
    updates[accu] = accu_new
    updates[param] = param - (learning_rate * grad /
                              T.sqrt(accu_new + epsilon))
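
If you have Lasagne installed you don't even need to copy the loop: the same code is exposed as lasagne.updates.adagrad. A small usage sketch (the toy weight matrix and least-squares cost are placeholders, not part of Lasagne):

import numpy as np
import theano
import theano.tensor as T
from lasagne.updates import adagrad

# Hypothetical toy model, just to show how the updates dict is consumed.
W = theano.shared(np.random.randn(5, 3).astype(theano.config.floatX), name='W')
x = T.matrix('x')
y = T.matrix('y')
cost = T.mean((T.dot(x, W) - y) ** 2)

updates = adagrad(cost, [W], learning_rate=0.01)  # builds the OrderedDict from the loop above
train = theano.function([x, y], cost, updates=updates)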