I am trying to do text classification using pre-trained GloVe word embeddings at the sentence level. I am currently using a very naive approach: averaging the word vectors to represent the sentence.
My question is: what should I do if none of the words in a sentence appear in the pre-trained vocabulary? Should I just ignore the sentence, or randomly assign some values to its sentence vector? I cannot find a reference that deals with this problem; most papers just say they averaged pre-trained word embeddings to generate the sentence embedding.
Answer 0 (score: 0)
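If a sentence contains no words that you know anything about, any attempt at classifying it will be a random guess.
Such no-information sentences are unlikely to improve your classifier, so it is best not to use them with totally random features.
(For languages with subword morphemes, there are word-embedding techniques that can guess better-than-random vectors for previously unknown words. See Facebook's FastText tool, for example. But unless a large share of your text is dominated by unknown words, you can defer investigating such techniques until you have verified that your general approach works on easier texts.)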
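As a rough illustration of that advice, here is a minimal sketch of averaging that skips out-of-vocabulary words and drops sentences with no known words at all. The function names `average_sentence_vector` and `build_features`, and the plain-dict `embeddings` mapping, are hypothetical choices for this example; in practice the mapping would be loaded from your GloVe file.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def average_sentence_vector(
    tokens: Sequence[str], embeddings: Dict[str, Vector]
) -> Optional[Vector]:
    """Average the vectors of the known tokens; return None if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No word in this sentence has a pre-trained vector.
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def build_features(
    sentences: Sequence[Sequence[str]],
    labels: Sequence[int],
    embeddings: Dict[str, Vector],
) -> Tuple[List[Vector], List[int]]:
    """Keep only (vector, label) pairs for sentences with at least one known word."""
    features: List[Vector] = []
    kept_labels: List[int] = []
    for tokens, label in zip(sentences, labels):
        vec = average_sentence_vector(tokens, embeddings)
        if vec is None:
            continue  # drop the sentence instead of inventing random features
        features.append(vec)
        kept_labels.append(label)
    return features, kept_labels


# Toy 2-dimensional embeddings, made up purely for illustration.
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
```

At prediction time a sentence that yields `None` still needs some answer; falling back to the majority class is one simple option, though that choice is an assumption of this sketch and not something the thread prescribes.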