Question

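I want to build an email classifier that, for each email, must guess the correct category (the email's topic). I am using an RNN, specifically an embedding layer, LSTM blocks, and dropout.

My network structure is: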

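import numpy as np
from keras import optimizers
from keras.layers import Dense, Dropout, Input, LSTM
from keras.models import Model, Sequential

sentence_indices = Input(shape=input_shape, dtype=np.int32)
emb_dim = 300  # 300-dimensional embeddings of Italian words
# Create the embedding layer pretrained with GloVe vectors
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index, emb_dim)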

# Propagate sentence_indices through the embedding layer to get back the embeddings
embeddings = embedding_layer(sentence_indices)

# Propagate the embeddings through an LSTM layer with a 512-dimensional hidden state.
# The returned output must be a batch of sequences so the next LSTM can consume it.
X = LSTM(512, return_sequences=True)(embeddings)
# Add dropout with a rate of 0.15
X = Dropout(0.15)(X)
# Propagate X through another LSTM layer with a 256-dimensional hidden state.
# The returned output must be a single hidden state, not a batch of sequences.
X = LSTM(256)(X)
# Add dropout with a rate of 0.15
X = Dropout(0.15)(X)
# Propagate X through a Dense layer with softmax activation to get back a
# batch of num_activation-dimensional vectors (one probability per category).
X = Dense(num_activation, activation='softmax')(X)
# A separate Activation('softmax') layer is not needed; the Dense layer already applies softmax.

# Create Model instance which converts sentence_indices into X.
model = Model(sentence_indices, X)

sequentialModel = Sequential(model.layers)

RMS = optimizers.RMSprop(lr=0.01, rho=0.9, epsilon=None, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=RMS, metrics=['accuracy'])

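1) I use 300 words from each email as input, and because I train in batches I have to pad. At the moment I use right padding rather than left padding, but performance seems similar. The average length is 410 words, the maximum is 1000, and the minimum is 70. Strangely, when I increase the input size, accuracy gets worse. Is there an explanation? Why would giving the network more information make it perform worse?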
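To make the padding question concrete, here is a minimal sketch of right ("post") and left ("pre") padding with truncation to a fixed length. The helper name `pad_or_truncate` is hypothetical, and keeping the first `max_len` tokens of a long email is an assumption, since the post does not say which 300 words are kept. Keras ships a `pad_sequences` utility with `padding` and `truncating` options that serves the same purpose.

```python
def pad_or_truncate(indices, max_len, padding="post", pad_value=0):
    """Return a list of exactly max_len word indices."""
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    # Assumption: a long email keeps only its first max_len tokens.
    clipped = list(indices[:max_len])
    pad = [pad_value] * (max_len - len(clipped))
    # 'post' puts the padding on the right, 'pre' puts it on the left.
    return clipped + pad if padding == "post" else pad + clipped


short_email = [12, 7, 93]          # three word indices
long_email = list(range(1, 11))    # ten word indices

right_padded = pad_or_truncate(short_email, 5, padding="post")
left_padded = pad_or_truncate(short_email, 5, padding="pre")
truncated = pad_or_truncate(long_email, 5)
```

With right padding, the LSTM reads the padding values after the real words, so its final hidden state is computed after a run of padding steps. With left padding, the real words come last.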

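2) I am trying an LSTM layer of size 512 followed by an LSTM layer of size 256, and I get worse performance than in my earlier test with a first LSTM of size 256 and a second of size 128. Why does performance drop? Could it be caused by a poor choice of learning rate (I am using lr = 0.01)?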
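One measurable difference between the two configurations is the number of trainable weights in the LSTM layers. The sketch below assumes the standard LSTM parameter count, 4 * ((input_dim + units) * units + units), and the 300-dimensional embeddings from the code above. The helper names are hypothetical.

```python
def lstm_param_count(input_dim, units):
    """Trainable weights of one standard LSTM layer with bias."""
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((input_dim + units) * units + units)


def stacked_lstm_params(emb_dim, first_units, second_units):
    """Total LSTM weights for two stacked layers fed by emb_dim embeddings."""
    first = lstm_param_count(emb_dim, first_units)
    # The second layer receives the first layer's hidden state as input.
    second = lstm_param_count(first_units, second_units)
    return first + second


large = stacked_lstm_params(300, 512, 256)  # the 512 -> 256 configuration
small = stacked_lstm_params(300, 256, 128)  # the 256 -> 128 configuration
```

Under these assumptions the larger stack has roughly three times as many LSTM weights as the smaller one, which may matter when the training set is small.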

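* edit1: I read about a rule of thumb for choosing the number of nodes: (#input + #output) * 2/3. Is this rule valid for every hidden layer or only for the first one? Is there a similar rule for the other layers?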
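As a worked example of the rule of thumb, the sketch below applies it once. It assumes that "#input" means the embedding dimension (300) and that "#output" is 5 categories, the figure mentioned in the original code comments; both readings are assumptions, and the function name is hypothetical.

```python
def rule_of_thumb_units(n_inputs, n_outputs):
    """Hidden-layer size suggested by (#input + #output) * 2/3, rounded."""
    return round((n_inputs + n_outputs) * 2 / 3)


# Assumed values: 300-dimensional embeddings in, 5 categories out.
suggested_units = rule_of_thumb_units(300, 5)
```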

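3) I use word embeddings because that way I need fewer samples per category. However, if I have very few samples for a particular category, should I reuse the same samples multiple times, or is that useless? Is there a way to compensate for the lack of samples?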
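One common alternative to repeating samples is to weight the loss by class frequency. The sketch below computes inverse-frequency ("balanced") weights in plain Python. The helper name and the toy label list are hypothetical. A dictionary like this can be passed to Keras `model.fit` through its `class_weight` argument, although whether it helps for this dataset is not established here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes get weights above 1, frequent classes below 1.
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy example: class 2 has a single sample out of ten.
labels = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(labels)
```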

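4) Should I use the return_sequences and return_state options of the LSTM to get better performance? Is it useful to set these parameters in my LSTM layers?

Thanks in advance.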
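For reference, the two flags change what an LSTM layer returns rather than how it learns. The sketch below only models the output shapes in plain Python; the function name is hypothetical, and the batch size of 32 is an arbitrary example value.

```python
def lstm_output_shapes(batch, timesteps, units,
                       return_sequences=False, return_state=False):
    """Shapes of the tensors a Keras-style LSTM layer would return."""
    # Main output: the hidden state at every time step, or only the last one.
    output = (batch, timesteps, units) if return_sequences else (batch, units)
    if not return_state:
        return [output]
    # return_state adds the final hidden state h and the final cell state c.
    state = (batch, units)
    return [output, state, state]


# First LSTM in the post: must return sequences to feed the second LSTM.
first_layer = lstm_output_shapes(32, 300, 512, return_sequences=True)
# Second LSTM in the post: a single vector per email for the Dense layer.
second_layer = lstm_output_shapes(32, 300, 256)
with_state = lstm_output_shapes(32, 300, 256, return_state=True)
```

In this stacked setup, `return_sequences=True` is required on every LSTM that feeds another LSTM, while `return_state` is mainly useful when the final states are passed elsewhere, for example to a decoder.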

How to choose the input size and the number of nodes of a hidden layer for an RNN used for text classification?

0 answers: