Building a vocabulary and generating vectors

Posted: 2019-01-09 20:43:01

Tags: python nltk

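I build a vocabulary for the document and extract the words from each sentence. I want to compare each sentence against the generated word list, but I get the following error: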

import numpy

def generate_bow(messages):
    vocab = tokenize(messages)
    print("Word List for Document \n{0} \n".format(vocab))

for sentence in messages:
    words = word_extraction(sentence)
    bag_vector = numpy.zeros(len(vocab))
    for w in words:
        for i, word in enumerate(vocab):
            if word == w:
                bag_vector[i] += 1

    print("{0}\n{1}\n".format(sentence, numpy.array(bag_vector)))

NameError                                 Traceback (most recent call last)
<ipython-input-37-34430e8c4ee8> in <module>()
      4 for sentence in messages:
      5         words = word_extraction(sentence)
----> 6         bag_vector = numpy.zeros(len(vocab))
      7         for w in words:
      8             for i,word in enumerate(vocab):

NameError: name 'vocab' is not defined
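The traceback reports `in <module>()`, which means line 4 (`for sentence in messages:`) ran at the top level of the notebook cell, not inside `generate_bow`. A name assigned inside a function is local to that function, so top-level code cannot see it. A minimal sketch of the same scoping rule, using a hypothetical stripped-down `build_vocab` rather than the asker's code:

```python
def build_vocab(messages):
    # vocab is local to build_vocab and is discarded when the call returns.
    vocab = sorted(set(messages))

build_vocab(["a", "b"])

# Top-level code, like the dedented for-loop in the question, reads a name
# that only ever existed inside the function.
try:
    len(vocab)
except NameError as exc:
    message = str(exc)

print(message)  # name 'vocab' is not defined
```

Changing the `dtype` passed to `numpy.zeros` cannot help here, because the failure happens while evaluating `len(vocab)`, before `numpy.zeros` is called.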

I have already imported NumPy and also tried adding dtype=float, but the same problem remains.
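A minimal sketch of the function with the loop indented back inside `generate_bow`, so that it runs in the same scope as `vocab`. The question does not show `tokenize` or `word_extraction`, so the versions below are hypothetical stand-ins (lowercase the text, keep alphabetic runs, build a sorted list of unique words); substitute your own helpers. The sketch also returns the vectors so they can be reused.

```python
import re

import numpy


def word_extraction(sentence):
    # Hypothetical stand-in for the asker's helper: lowercase, keep letter runs.
    return re.findall(r"[a-z]+", sentence.lower())


def tokenize(messages):
    # Hypothetical stand-in: sorted unique words across all messages.
    return sorted({w for sentence in messages for w in word_extraction(sentence)})


def generate_bow(messages):
    vocab = tokenize(messages)
    print("Word List for Document \n{0} \n".format(vocab))
    vectors = []
    # Indented inside generate_bow, so the local name vocab is visible here.
    for sentence in messages:
        words = word_extraction(sentence)
        bag_vector = numpy.zeros(len(vocab))
        for w in words:
            for i, word in enumerate(vocab):
                if word == w:
                    bag_vector[i] += 1
        print("{0}\n{1}\n".format(sentence, bag_vector))
        vectors.append(bag_vector)
    return vectors


vectors = generate_bow(["the cat sat", "the cat saw the dog"])
```

With these stand-in helpers the vocabulary is `['cat', 'dog', 'sat', 'saw', 'the']`, and each vector counts how many times each vocabulary word occurs in the corresponding sentence.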

0 answers:

No answers yet.