Question

import numpy
import theano
import theano.tensor as T


class RNNSLU(object):
    ''' elman neural net model '''
    def __init__(self, nh, nc, ne, de, cs):
        '''
        nh :: dimension of the hidden layer
        nc :: number of classes
        ne :: number of word embeddings in the vocabulary
        de :: dimension of the word embeddings
        cs :: word window context size
        '''
        # parameters of the model
        self.emb = theano.shared(name='embeddings',
                                 value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                 (ne+1, de))
                                 # add one for padding at the end
                                 .astype(theano.config.floatX))
        self.wx = theano.shared(name='wx',
                                value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                (de * cs, nh))
                                .astype(theano.config.floatX))
        self.wh = theano.shared(name='wh',
                                value=0.2 * numpy.random.uniform(-1.0, 1.0,
                                (nh, nh))
                                .astype(theano.config.floatX))
        self.w = theano.shared(name='w',
                               value=0.2 * numpy.random.uniform(-1.0, 1.0,
                               (nh, nc))
                               .astype(theano.config.floatX))
        self.bh = theano.shared(name='bh',
                                value=numpy.zeros(nh,
                                dtype=theano.config.floatX))
        self.b = theano.shared(name='b',
                               value=numpy.zeros(nc,
                               dtype=theano.config.floatX))
        self.h0 = theano.shared(name='h0',
                                value=numpy.zeros(nh,
                                dtype=theano.config.floatX))

        # bundle
        self.params = [self.emb, self.wx, self.wh, self.w, self.bh, self.b, self.h0]

        # x, the symbolic input sequence, is defined in a part of the
        # tutorial that is omitted from this excerpt

        def recurrence(x_t, h_tm1):
            h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)
                                 + T.dot(h_tm1, self.wh) + self.bh)
            s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
            return [h_t, s_t]

        # scan returns (outputs, updates); the updates are not needed here
        [h, s], _ = theano.scan(fn=recurrence,
                                sequences=x,
                                outputs_info=[self.h0, None],
                                n_steps=x.shape[0])

I am following the Theano tutorial on RNNs (http://deeplearning.net/tutorial/rnnslu.html), but I have two questions. First, in this tutorial the recurrence function is as follows:

def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)
                         + T.dot(h_tm1, self.wh) + self.bh)
    s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
    return [h_t, s_t]

Why isn't h0 added in h_t? (i.e. h_t = T.nnet.sigmoid(T.dot(x_t, self.wx) + T.dot(h_tm1, self.wh) + self.bh + self.h0))

Second, why is outputs_info=[self.h0, None]? I know that outputs_info initializes the results, so I would have expected outputs_info=[self.bh+self.h0, T.nnet.softmax(T.dot(self.bh+self.h0, self.w_h2y) + self.b_h2y)]

Answer 1

def recurrence(x_t, h_tm1):
        h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)
                             + T.dot(h_tm1, self.wh) + self.bh)
        s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
        return [h_t, s_t]

So, first you ask why we don't use h0 inside the recurrence function. Let's break this part down:

   h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)+ T.dot(h_tm1, self.wh) + self.bh)

What we expect here is 3 terms.

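The first term is the input layer multiplied by a weight matrix: T.dot(x_t, self.wx).
The second term is the previous hidden layer multiplied by another weight matrix (this is what makes the network recurrent): T.dot(h_tm1, self.wh). Note that the weight matrix is required here; what you propose would essentially add self.h0 as a second bias.
The third term is the bias of the hidden layer: self.bh.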
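To make the three terms concrete, here is a minimal pure-Python sketch of a single recurrence step. It does not use Theano; the helpers `sigmoid`, `dot`, and `step` and the toy weights are made up for illustration and only mirror the shape of the tutorial's formula.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(vec, mat):
    """Row vector times matrix, with the matrix stored as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]

def step(x_t, h_tm1, wx, wh, bh):
    """One Elman update: sigmoid(x_t . wx + h_tm1 . wh + bh)."""
    input_term = dot(x_t, wx)        # term 1: input times its weight matrix
    recurrent_term = dot(h_tm1, wh)  # term 2: previous hidden state times wh
    # term 3 is the bias bh, added element-wise before the sigmoid
    return [sigmoid(a + b + c)
            for a, b, c in zip(input_term, recurrent_term, bh)]

# Hypothetical toy sizes: 2 inputs, 2 hidden units.
wx = [[0.1, -0.2], [0.3, 0.4]]
wh = [[0.5, 0.0], [0.0, 0.5]]
bh = [0.0, 0.0]
h0 = [0.0, 0.0]

h1 = step([1.0, 0.0], h0, wx, wh, bh)  # h0 enters only here, as h_tm1
h2 = step([0.0, 1.0], h1, wx, wh, bh)  # afterwards h_tm1 is the previous output
```

In this sketch h0 appears exactly once, as the h_tm1 of the first step, and it always passes through wh. Adding it directly inside the sigmoid at every step would turn it into a constant offset, which is the role bh already plays.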

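Now, at every iteration we need the hidden layer activation from the previous step. self.h0 supplies that activation only for the very first step, where no previous step exists yet. It does not hold the CURRENT activation; from the second step on, what the recurrence needs is the h_t computed one step earlier, and that is what arrives as h_tm1.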

[h, s], _ = theano.scan(fn=recurrence,
                            sequences=x,
                            outputs_info=[self.h0, None],
                            n_steps=x.shape[0])

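So, take another look at the scan function. You are right that outputs_info=[self.h0, None] initializes values, but its entries are also tied to the outputs. recurrence() has two outputs, namely [h_t, s_t].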

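So what outputs_info also does is this: after each iteration, h_t (the first return value) takes the place that self.h0 occupied at the first step and is passed on to the next step. The shared variable self.h0 itself is not modified by scan. The second element of outputs_info is None because s_t is never fed back into the recurrence, so it needs no initial value; it is still collected and returned as s. (The entries of outputs_info are matched to the return values of the recurrence function in order.)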

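In the next iteration, the value in the first slot of outputs_info is used as input again, so h_tm1 takes that value: self.h0 at the first step and the previous h_t afterwards. Since the recurrence must receive an argument for h_tm1, we have to initialize it. Since nothing needs to be initialized for the second slot, we leave it as None.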
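The way outputs_info lines up with the return values can be sketched with a small pure-Python loop. This is only a stand-in for the looping behaviour described above, not how theano.scan is implemented; `toy_scan` is a hypothetical helper, and the scalar `recurrence` below uses weights fixed to 1 and a placeholder in place of the softmax.

```python
import math

def toy_scan(fn, sequences, outputs_info):
    """Pure-Python stand-in for the looping behaviour of theano.scan.

    An outputs_info entry that is not None is an initial value and the
    matching output is fed back into fn on the next step; a None entry
    means the matching output is only collected.
    """
    carried = [init for init in outputs_info if init is not None]
    collected = [[] for _ in outputs_info]
    for x_t in sequences:
        outputs = fn(x_t, *carried)
        for slot, value in zip(collected, outputs):
            slot.append(value)
        # Only outputs whose outputs_info entry had an initial value recur.
        carried = [out for out, init in zip(outputs, outputs_info)
                   if init is not None]
    return collected

def recurrence(x_t, h_tm1):
    # Scalar toy version of the tutorial step: weights are 1, bias is 0.
    h_t = 1.0 / (1.0 + math.exp(-(x_t + h_tm1)))
    s_t = 2.0 * h_t  # placeholder for the softmax output; never fed back
    return [h_t, s_t]

h0 = 0.0
h, s = toy_scan(recurrence, sequences=[1.0, -1.0, 0.5],
                outputs_info=[h0, None])
```

Here h and s each hold one value per time step, h0 is read once and left unchanged, and only h_t travels from one step to the next.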

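Admittedly, the theano.scan() function is confusing at times, and I am fairly new to it myself. But this is what I understood while working through the same tutorial.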

Parameters of the RNN in the Theano tutorial
