I want to do classification, and my input is a 4-dimensional tensor. Instead of feeding word vectors into the LSTM cell for classification, I am trying to feed in whole sentences, so the model can learn the relationships between sentences. Example input:
input = ['this is first sentence',
'this is second sentence']
As is well known, text is usually converted into word vectors. An example of word vectors is below (assuming an embedding size of 2):
word_embedding = [
[ [0.21, 0.43], [0.55, 0.87], [0.73, 0.51], [0.64, 0.98] ],
[ [0.21, 0.43], [0.55, 0.87], [0.12, 0.29], [0.64, 0.98] ]
]
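For concreteness, the lookup from token ids to word vectors can be sketched in NumPy. The vocabulary and embedding table here are made-up illustrations chosen so the result reproduces the example above; they are not part of the original question:

```python
import numpy as np

# Hypothetical vocabulary and a tiny embedding table (embedding_size = 2).
vocab = {'this': 0, 'is': 1, 'first': 2, 'second': 3, 'sentence': 4}
embedding_matrix = np.array([
    [0.21, 0.43],  # this
    [0.55, 0.87],  # is
    [0.73, 0.51],  # first
    [0.12, 0.29],  # second
    [0.64, 0.98],  # sentence
])

sentences = ['this is first sentence', 'this is second sentence']
ids = np.array([[vocab[w] for w in s.split()] for s in sentences])

# Fancy indexing turns the 2-D id array into a 3-D tensor:
# (num_sentences, words_per_sentence, embedding_size)
word_embedding = embedding_matrix[ids]
print(word_embedding.shape)  # (2, 4, 2)
```

Stacking several such sentence sets along a new leading axis is exactly what produces the 4-D tensor discussed below.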
Now, a set of sentences represented by word vectors as above is a 3-dimensional tensor, but if I have multiple sets of sentences, it becomes 4-dimensional.
To be able to feed in whole sentences, I have to reduce the dimensionality to 3, which means each sentence should be represented as a single vector rather than a collection of vectors. At first I did this by averaging the word vectors in each sentence. Example (using the word_embedding above):
mean_word_embedding = [
[0.5325, 0.6975],
[0.38 , 0.6425]
]
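The averaging step is just a mean over the word axis; a quick NumPy check of the numbers above:

```python
import numpy as np

word_embedding = np.array([
    [[0.21, 0.43], [0.55, 0.87], [0.73, 0.51], [0.64, 0.98]],
    [[0.21, 0.43], [0.55, 0.87], [0.12, 0.29], [0.64, 0.98]],
])

# Average over the word axis (axis=1 here, because this array is
# only 3-D: sentences x words x embedding).
mean_word_embedding = word_embedding.mean(axis=1)
print(mean_word_embedding)
# [[0.5325 0.6975]
#  [0.38   0.6425]]
```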
But the accuracy of this approach is poor. The other approach I want to try is to feed each word vector of a sentence through an LSTM and use the last output as the sentence's representation vector. I just can't figure out how. Here is a summary of my code (the code of some functions is irrelevant, so it is not shown):
# Create a 3-dimension tensor, with
# the 1st dimension being the number of samples,
# the 2nd being the number of sentences in a sample, and
# the 3rd being the number of words in a sentence
input = tf.placeholder(tf.int32, shape=[None, sentences_length, words_length])
# Convert each word into a vector with size embedding_size.
# A this point, the input tensor has been turned into a 4-dimension tensor,
# with the 4th dimension being the embedding size
# (based on the examples above, embedding size is 2)
word_embedding = convert_word_embedding(input, embedding_size)
# Reduce the 4-dimension tensor into a 3-dimension tensor
# so each sentence is only represented by a single vector.
# With shape [samples, sentences, words, embedding], the word axis is 2.
mean_word_embedding = tf.reduce_mean(word_embedding, axis=2) # This is the average approach
sentence_embedding = recurrent_words(word_embedding, sentences_length, words_length, embedding_size) # This is the alternate approach
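I'm not sure of the exact TF graph code, but the idea behind recurrent_words() can be sketched framework-free: flatten the 4-D tensor into a batch of word sequences, run an LSTM over each sequence, keep only the final hidden state, and reshape back to 3-D. A minimal NumPy sketch of that data flow (the weights are random placeholders, not a trained model, and the gate layout is one common convention):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_state(seq, W, U, b, hidden_size):
    """Run a single-layer LSTM over seq (words x embedding_size)
    and return the final hidden state h of shape (hidden_size,)."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in seq:
        z = W @ x + U @ h + b              # all four gates at once
        i, f, o, g = np.split(z, 4)        # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

rng = np.random.default_rng(0)
samples, sentences_length, words_length, embedding_size = 3, 2, 4, 2
hidden_size = 5

# 4-D input: (samples, sentences, words, embedding)
word_embedding = rng.normal(size=(samples, sentences_length,
                                  words_length, embedding_size))

# Random placeholder weights for the 4 stacked gates.
W = rng.normal(size=(4 * hidden_size, embedding_size))
U = rng.normal(size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)

# Flatten to a batch of word sequences, encode each, reshape back to 3-D.
flat = word_embedding.reshape(-1, words_length, embedding_size)
encoded = np.stack([lstm_last_state(s, W, U, b, hidden_size) for s in flat])
sentence_embedding = encoded.reshape(samples, sentences_length, hidden_size)
print(sentence_embedding.shape)  # (3, 2, 5)
```

In TensorFlow 1.x the same reshape-run-reshape pattern would use tf.reshape plus tf.nn.dynamic_rnn, taking the h part of the final LSTMStateTuple; the NumPy version above is only meant to show the shapes and data flow, not to replace the graph code.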
What should I do inside recurrent_words()?
Any help would be greatly appreciated. Thanks.