Question

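I build a vocabulary for a document and extract the words from each sentence. I want to compare each sentence against the generated word list, but I get the following error: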

def generate_bow(messages):    
   vocab = tokenize(messages)
   print("Word List for Document \n{0} \n".format(vocab));
for sentence in messages:
       words = word_extraction(sentence)
       bag_vector = numpy.zeros(len(vocab))
       for w in words:
           for i,word in enumerate(vocab):
               if word == w: 
                   bag_vector[i] += 1

       print("{0}\n{1}\n".format(sentence,numpy.array(bag_vector)))

NameError                                 Traceback (most recent call last)
<ipython-input-37-34430e8c4ee8> in <module>()
       4 for sentence in messages:
       5         words = word_extraction(sentence)
 ----> 6         bag_vector = numpy.zeros(len(vocab))
       7         for w in words:
       8             for i,word in enumerate(vocab):

 NameError: name 'vocab' is not defined

I have already imported NumPy and also tried adding dtype=float, but the same problem persists.
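
The traceback reports the failing line as running `in <module>()`, and the `for sentence in messages:` line in the snippet starts at column 0. That suggests the loop is executing at module level rather than inside `generate_bow`, where `vocab` is a local variable, so `dtype` would not be the cause. Below is a minimal sketch with the loop indented into the function body. `tokenize` and `word_extraction` are not shown in the question, so the versions here are hypothetical stand-ins, and returning the vectors instead of only printing them is an added assumption.

```python
import re

import numpy


def word_extraction(sentence):
    # Hypothetical stand-in: lowercase the sentence and split on non-word characters.
    return [w for w in re.split(r"\W+", sentence.lower()) if w]


def tokenize(messages):
    # Hypothetical stand-in: sorted list of the unique words across all sentences.
    words = []
    for sentence in messages:
        words.extend(word_extraction(sentence))
    return sorted(set(words))


def generate_bow(messages):
    vocab = tokenize(messages)
    print("Word List for Document \n{0} \n".format(vocab))

    vectors = []
    # The loop is indented so it runs inside the function, where vocab is defined.
    for sentence in messages:
        words = word_extraction(sentence)
        bag_vector = numpy.zeros(len(vocab))
        for w in words:
            for i, word in enumerate(vocab):
                if word == w:
                    bag_vector[i] += 1
        print("{0}\n{1}\n".format(sentence, bag_vector))
        vectors.append(bag_vector)
    return vocab, vectors
```

Calling `generate_bow(["the cat sat", "the cat saw the dog"])` prints the vocabulary followed by one count vector per sentence.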

Build the vocabulary and generate vectors

0 answers: