To simplify the problem: when a dimension (or feature) has already been updated n times, the next time I see that feature I want the learning rate to be 1/n.
I came up with this code:
import numpy as np
import theano
import theano.tensor as T

def test_adagrad():
    # 20 features, 10-dimensional embedding; `times` counts past updates per feature
    embedding = theano.shared(value=np.random.randn(20, 10), borrow=True)
    times = theano.shared(value=np.ones((20, 1)))
    lr = T.dscalar()
    index_a = T.lvector()
    hist = times[index_a]
    # the cost only involves the rows selected by index_a
    cost = T.sum(theano.sparse_grad(embedding[index_a]))
    gradients = T.grad(cost, embedding)
    # scale the update of each selected row by 1 / (number of past updates)
    updates = [(embedding, embedding + lr * (1.0 / hist) * gradients)]
    ### The code that also increments `times` is omitted here (one possible sketch is given below) ###
    train = theano.function(inputs=[index_a, lr], outputs=cost, updates=updates)
    for i in range(10):
        print train([1, 2, 3], 0.05)
Theano does not raise any error, but the training result sometimes gives NaN. Does anyone know how to fix this?
Thanks for your help.
PS: I suspect the operations in the sparse space cause the problem, so I tried replacing * with theano.sparse.mul. That gave the same results I mentioned before.
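For what it's worth, a minimal sketch of one way the omitted counter update could look (this is an assumption about what was intended, not the original code): T.inc_subtensor bumps only the rows selected by index_a, so the counters and the embedding get updated in the same call:

# Sketch (assumption): update the per-feature counters together with the embedding.
updates = [
    (embedding, embedding + lr * (1.0 / hist) * gradients),
    (times, T.inc_subtensor(times[index_a], 1.0)),
]
train = theano.function(inputs=[index_a, lr], outputs=cost, updates=updates)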
Answer 0 (score: 8)
Perhaps you can use this example for implementation of adadelta and adapt it to derive your own AdaGrad version. Please update if it works :-)
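For reference, the standard AdaGrad rule that such a derivation should end up with (a textbook formula, not taken from the linked example) divides the base learning rate by the accumulated squared gradients of each parameter:

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\sum_{\tau=1}^{t} g_\tau^2} + \epsilon}\, g_t

where g_t is the current gradient of that parameter and \epsilon is a small constant for numerical stability (some implementations place \epsilon inside the square root instead).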
Answer 1 (score: 1)
I was looking for the same thing and ended up implementing it myself, in the style of the resource zuuz already pointed to, so this may help anyone else searching for it.
import numpy as np
import theano
import theano.tensor as T

def adagrad(lr, tparams, grads, inp, cost):
    # shared variables that store the current gradients
    gshared = [theano.shared(np.zeros_like(p.get_value(),
                                           dtype=theano.config.floatX),
                             name='%s_grad' % k)
               for k, p in tparams.iteritems()]
    grads_updates = zip(gshared, grads)
    # shared variables that store the running sum of all squared gradients
    hist_gshared = [theano.shared(np.zeros_like(p.get_value(),
                                                dtype=theano.config.floatX),
                                  name='%s_hist_grad' % k)
                    for k, p in tparams.iteritems()]
    rgrads_updates = [(rg, rg + T.sqr(g)) for rg, g in zip(hist_gshared, grads)]

    # first function: compute the cost and store the (squared) gradients
    f_grad_shared = theano.function(inp, cost,
                                    updates=grads_updates + rgrads_updates,
                                    on_unused_input='ignore')

    # second function: apply the actual update, scaling the base learning rate lr
    # by 1 / (sqrt of the accumulated squared gradients + epsilon)
    n = 1e-6
    updates = [(p, p - (lr / (T.sqrt(rg) + n)) * g)
               for p, g, rg in zip(tparams.values(), gshared, hist_gshared)]
    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore')
    return f_grad_shared, f_update
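A minimal usage sketch, assuming the function above has been defined and numpy/theano are imported as above; the toy parameter W, the toy cost, and the random batch are hypothetical placeholders, not something from the answer:

from collections import OrderedDict

# Hypothetical toy setup: one parameter matrix and a simple quadratic cost.
tparams = OrderedDict()
tparams['W'] = theano.shared(np.random.randn(5, 5).astype(theano.config.floatX),
                             name='W')
x = T.matrix('x')
cost = T.sum(T.dot(x, tparams['W']) ** 2)
grads = T.grad(cost, wrt=list(tparams.values()))
lr = T.scalar('lr')

f_grad_shared, f_update = adagrad(lr, tparams, grads, [x], cost)

for epoch in range(10):
    batch = np.random.randn(4, 5).astype(theano.config.floatX)
    f_grad_shared(batch)  # computes the cost and stores the (squared) gradients
    f_update(0.01)        # applies the AdaGrad-scaled parameter update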
Answer 2 (score: 1)
I find this implementation from Lasagne very concise and readable. You can use it pretty much as-is:
# In the full Lasagne function, `updates` is an OrderedDict and `learning_rate`
# and `epsilon` (a small stability constant) are arguments of the enclosing function.
for param, grad in zip(params, grads):
    value = param.get_value(borrow=True)
    # per-parameter accumulator of squared gradients
    accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                         broadcastable=param.broadcastable)
    accu_new = accu + grad ** 2
    updates[accu] = accu_new
    updates[param] = param - (learning_rate * grad /
                              T.sqrt(accu_new + epsilon))
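If Lasagne itself is available you don't even need to copy the loop; a rough sketch of calling the built-in helper directly (the tiny softmax network and the learning rate below are made-up placeholders, not part of the answer):

# Sketch: let Lasagne build the AdaGrad updates for a toy softmax classifier.
import theano
import theano.tensor as T
import lasagne

x = T.matrix('x')
y = T.ivector('y')
network = lasagne.layers.DenseLayer(
    lasagne.layers.InputLayer((None, 100), input_var=x),
    num_units=10, nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, y).mean()
params = lasagne.layers.get_all_params(network, trainable=True)

# lasagne.updates.adagrad implements the same accumulator loop shown above
updates = lasagne.updates.adagrad(loss, params, learning_rate=0.01)
train_fn = theano.function([x, y], loss, updates=updates)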