How to use pre-trained word embeddings at the sentence level?

Date: 2017-06-12 16:52:11

Tags: word2vec word-embedding

I am trying to do text classification using pre-trained GloVe word embeddings at the sentence level. I am currently using a very naive approach: averaging the word vectors to represent the sentence.

My question is: what should I do if none of the words in a sentence appear in the pre-trained vocabulary? Should I just ignore the sentence, or randomly assign some values to its sentence vector? I cannot find a reference that deals with this problem; most papers just say they averaged pre-trained word embeddings to generate sentence embeddings.

1 answer:

Answer 0 (score: 0)

If a sentence contains no words that you know anything about, any attempt to classify it will be a random guess.

Such no-information sentences cannot improve your classifier, so it is better not to give them completely random features.
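
As an illustration of this advice, here is a minimal sketch of the averaging approach that skips sentences with no known words instead of assigning them random vectors. The helper names (`sentence_vector`, `build_dataset`) and the toy 2-dimensional vectors are assumptions made for the example; real GloVe vectors would be loaded from a file into the same kind of dictionary.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def sentence_vector(tokens: Sequence[str],
                    embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of the known tokens.

    Returns None when no token is in the vocabulary, so the caller
    can decide to drop the sentence rather than use random features.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def build_dataset(sentences: Sequence[Sequence[str]],
                  labels: Sequence[int],
                  embeddings: Dict[str, Vector]) -> Tuple[List[Vector], List[int]]:
    """Keep only the sentences that have at least one known word."""
    features: List[Vector] = []
    kept_labels: List[int] = []
    for tokens, label in zip(sentences, labels):
        vec = sentence_vector(tokens, embeddings)
        if vec is None:
            continue  # no-information sentence: leave it out
        features.append(vec)
        kept_labels.append(label)
    return features, kept_labels


# Toy vectors standing in for real pre-trained embeddings (hypothetical values).
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
sentences = [["good", "movie"], ["zzz", "qqq"], ["good", "unknownword"]]
labels = [1, 0, 1]

features, kept_labels = build_dataset(sentences, labels, toy_embeddings)
```

Here the second sentence is dropped because none of its tokens are in the vocabulary, while the third is kept and represented by its single known word. At prediction time, a sentence for which `sentence_vector` returns `None` would need a separate fallback, such as predicting the majority class.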

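(For languages with subword morphemes, there are word-embedding techniques that can guess better-than-random vectors for previously unknown words. See Facebook's 'FastText' tool, for example. But unless a large share of your text is dominated by unknown words, you can defer investigating such techniques until you have verified that your general approach works on simpler text.)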
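
To make the subword idea concrete, the sketch below builds a vector for an unknown word from character n-grams, in the spirit of FastText but not its actual implementation. The function names, the n-gram length range of 3 to 6, and the `ngram_vectors` table are assumptions for illustration; in practice the n-gram vectors would come from a trained subword model.

```python
from typing import Dict, List, Optional


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """List the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def subword_vector(word: str,
                   ngram_vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of the word's known n-grams, or None if none are known."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical n-gram vectors; a trained model would supply real ones.
toy_ngram_vectors = {"<ca": [1.0, 0.0], "cat": [0.0, 1.0]}

# "cats" was never seen as a whole word, but it shares n-grams with "cat".
guess = subword_vector("cats", toy_ngram_vectors)
```

A word that shares no n-grams with the table still yields `None`, so the skip-the-sentence fallback remains necessary.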