Question

I have a few questions about best practices for generating sequences with recurrent networks in PyTorch.

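First, if I want to build a decoder network, should I use nn.GRU (or nn.LSTM) rather than nn.GRUCell (or nn.LSTMCell)? In my experience, computation with LSTMCell is much slower than with nn.LSTM (by up to 100 times). Perhaps this is related to the cuDNN optimization of the LSTM (and GRU) modules? Is there a way to speed up LSTMCell computation?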
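To make the comparison concrete, here is a minimal sketch (not a benchmark) of what the two APIs compute. It assumes a single-layer GRU on CPU with toy sizes; the variable names are illustrative and not part of my model. With identical weights, one call to nn.GRU over the whole sequence gives the same result as a Python loop over nn.GRUCell. The difference is that the layer runs the time loop inside one call, which on a GPU can use the fused cuDNN kernels, while the cell version pays Python and kernel-launch overhead at every step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, batch, steps = 5, 7, 3, 4  # toy sizes (assumption)

gru = nn.GRU(input_size, hidden_size, num_layers=1, batch_first=True)
cell = nn.GRUCell(input_size, hidden_size)

# Copy the layer's weights into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(gru.weight_ih_l0)
    cell.weight_hh.copy_(gru.weight_hh_l0)
    cell.bias_ih.copy_(gru.bias_ih_l0)
    cell.bias_hh.copy_(gru.bias_hh_l0)

x = torch.randn(batch, steps, input_size)

with torch.no_grad():
    # Whole sequence in a single call.
    out_full, h_full = gru(x)

    # Same computation, one time step per Python iteration.
    h = torch.zeros(batch, hidden_size)
    outs = []
    for t in range(steps):
        h = cell(x[:, t], h)
        outs.append(h)
    out_loop = torch.stack(outs, dim=1)

same = torch.allclose(out_full, out_loop, atol=1e-5)
print("outputs match:", same)
```

So the cell is not a different model, only a different way of driving the time loop.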

I am trying to build an autoencoder that accepts variable-length sequences. My autoencoder looks like this:

import torch
import torch.nn as nn

class SimpleAutoencoder(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=3):
        super(SimpleAutoencoder, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.gru_encoder = nn.GRU(input_size, hidden_size, n_layers, batch_first=True)
        self.gru_decoder = nn.GRU(input_size, hidden_size, n_layers, batch_first=True)
        self.h2o = nn.Linear(hidden_size, input_size)  # Hidden to output

    def encode(self, input):
        # Start from a zero hidden state; the final hidden state is the latent code
        output, hidden = self.gru_encoder(input, None)
        return output, hidden

    def decode(self, input, hidden):
        output, hidden = self.gru_decoder(input, hidden)
        return output, hidden

    def h2o_apply(self, input):
        return self.h2o(input)

My training loop looks like this:

from torch.autograd import Variable
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

one_hot_batch = list(map(lambda x: Variable(torch.FloatTensor(x)), one_hot_batch))

packed_one_hot_batch = pack_padded_sequence(
    pad_sequence(one_hot_batch, batch_first=True).cuda(), batch_lens, batch_first=True)

# Encode, then decode with the real sequence as the decoder input
_, latent = vae.encode(packed_one_hot_batch)
outputs, _ = vae.decode(packed_one_hot_batch, latent)
packed = pad_packed_sequence(outputs, batch_first=True)  # (padded outputs, lengths)

for string, length, index in zip(*packed, range(batch_size)):
    decoded_string_without_sos_symbol = vae.h2o_apply(string[1:length])
    loss += criterion(decoded_string_without_sos_symbol, real_strings_batch[index][1:])
loss /= len(batch)

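As far as I understand, training this way is teacher forcing, because at the decoding stage the network is fed the real input (outputs, _ = vae.decode(packed_one_hot_batch, latent)). For my task, however, this means that at test time the network generates sequences well only when I feed it the real symbols (as in training mode). If I feed it the output of the previous step instead, it produces garbage (it just repeats one particular symbol indefinitely).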

I tried another approach. I generated a "fake" input (just ones) so that the model generates from the hidden state alone.

one_hot_batch_fake = list(map(lambda x: torch.ones_like(x).cuda(), one_hot_batch))
packed_one_hot_batch_fake = pack_padded_sequence(
    pad_sequence(one_hot_batch_fake, batch_first=True).cuda(), batch_lens, batch_first=True)

_, latent = vae.encode(packed_one_hot_batch)
outputs, _ = vae.decode(packed_one_hot_batch_fake, latent)
packed = pad_packed_sequence(outputs, batch_first=True)

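It works, but it is very inefficient and the reconstruction quality is very low. So, the second question: what is the right way to generate sequences from a latent representation?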

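I think a good idea would be to apply teacher forcing with some probability, but how can I use the nn.GRU layer so that the output of the previous step becomes the input of the next step?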
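For illustration, here is a sketch of what that could look like: nn.GRU is called on one time step at a time (a sequence of length 1), and a coin flip decides whether the next input is the real symbol or the model's own prediction. This is only a sketch under assumptions, not a confirmed solution. The function name decode_with_sampling, the teacher_forcing_ratio parameter, the greedy argmax feedback, and the use of padded (unpacked) one-hot targets are all choices made for this example.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def decode_with_sampling(gru, h2o, targets, hidden, teacher_forcing_ratio=0.5):
    """Hypothetical helper. targets: (batch, steps, vocab) one-hot, with
    targets[:, 0] the SOS symbol. Returns logits of shape (batch, steps - 1, vocab)."""
    batch, steps, vocab = targets.shape
    step_input = targets[:, :1]                  # (batch, 1, vocab): start with SOS
    logits = []
    for t in range(1, steps):
        out, hidden = gru(step_input, hidden)    # out: (batch, 1, hidden_size)
        step_logits = h2o(out)                   # (batch, 1, vocab)
        logits.append(step_logits)
        if random.random() < teacher_forcing_ratio:
            step_input = targets[:, t:t + 1]     # feed the real symbol
        else:
            pred = step_logits.argmax(dim=-1)    # (batch, 1), greedy choice
            step_input = F.one_hot(pred, vocab).float()  # feed own prediction
    return torch.cat(logits, dim=1)

# Toy usage with random data (sizes are assumptions).
torch.manual_seed(0)
random.seed(0)
vocab, hidden_size, n_layers = 6, 8, 3
gru = nn.GRU(vocab, hidden_size, n_layers, batch_first=True)
h2o = nn.Linear(hidden_size, vocab)

ids = torch.randint(0, vocab, (2, 5))            # (batch, steps) symbol indices
targets = F.one_hot(ids, vocab).float()
hidden = torch.zeros(n_layers, 2, hidden_size)   # stands in for the encoder's latent

logits = decode_with_sampling(gru, h2o, targets, hidden)
loss = F.cross_entropy(logits.reshape(-1, vocab), ids[:, 1:].reshape(-1))
loss.backward()
```

Two caveats. The argmax is not differentiable, so no gradient flows through the fed-back symbol, only through the hidden state. With variable-length batches, the padded positions would also need to be masked out of the loss, for example with the ignore_index argument of the cross-entropy loss. Setting teacher_forcing_ratio to 0 gives the free-running mode used at test time.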

Generating sequences from latent space [pytorch]
