TF-IDF calculation raises an error

Time: 2017-11-14 15:36:52

Tags: python python-3.x python-2.7 tf-idf

I'm currently working on a small project and have drawn a blank. I have the following code to calculate term frequency:

from Bag import *

words = ['the','new','the','shiny','new','car','went','through','the','tunnel']
carDoc = Bag()
for word in words:
    carDoc.add(word)

def tf(word, carDoc):
    if word != "" and carDoc.size() > 0:
        return carDoc.count(word)/carDoc.size()

I also have the following code for inverse document frequency:

from Bag import *
from math import log

carDoc1 = Bag()
for word in ['the', 'car']:
    carDoc1.add(word)

carDoc2 = Bag()
for word in ['the', 'shiny', 'new']:
    carDoc2.add(word)

allCarDocs = [carDoc1, carDoc2]

def idf(word, carDocs):
    total = len(allCarDocs)
    wordIsIn = 0
    for docs in allCarDocs:
        if docs.contains(word):
            wordIsIn = wordIsIn + 1
    return log(total / (1 + wordIsIn))

carDoc1 = Bag()
for word in ['the', 'car']:
    carDoc1.add(word)
carDoc2 = Bag()
for word in ['the', 'shiny', 'new']:
    carDoc2.add(word)

allCarDocs = [carDoc1, carDoc2]

def tf_idf(word, documents):
    return tf(word, carDoc) * idf (word, allCarDocs)

The error I get is that carDoc is not defined.

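The tf and idf functions both work as I intended, but whenever I implement the tf_idf function I keep getting this error. Any help with computing the tf-idf for this example would be appreciated.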

1 answer:

Answer 0 (score: 0)

def tf_idf(word, documents):
    return tf(word, carDoc) * idf(word, allCarDocs)

If your function takes (word, documents), where do you expect carDoc and allCarDocs to come from?
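One way to act on that point is to have each function use only what is passed in, so nothing depends on a global that may live in another file. The following is a minimal sketch, not the asker's actual code: the Bag class here is a hypothetical stand-in built on collections.Counter (the real Bag module is not shown in the question), and make_bag and the three-argument tf_idf signature are assumptions made for illustration.

```python
from collections import Counter
from math import log


class Bag:
    """Hypothetical stand-in for the asker's Bag class (a multiset of words)."""

    def __init__(self):
        self._counts = Counter()

    def add(self, item):
        self._counts[item] += 1

    def count(self, item):
        return self._counts[item]

    def size(self):
        return sum(self._counts.values())

    def contains(self, item):
        return self._counts[item] > 0


def make_bag(words):
    """Illustrative helper: build a Bag from a list of words."""
    bag = Bag()
    for word in words:
        bag.add(word)
    return bag


def tf(word, document):
    """Term frequency of word within the single document passed in."""
    if word == "" or document.size() == 0:
        return 0.0
    # float() keeps the division fractional on Python 2.7 as well as 3.x.
    return float(document.count(word)) / document.size()


def idf(word, documents):
    """Inverse document frequency, using the question's log(N / (1 + df)) form."""
    containing = sum(1 for doc in documents if doc.contains(word))
    return log(float(len(documents)) / (1 + containing))


def tf_idf(word, document, documents):
    """Uses only its parameters, so no global carDoc or allCarDocs is needed."""
    return tf(word, document) * idf(word, documents)


car_doc = make_bag(['the', 'new', 'the', 'shiny', 'new', 'car',
                    'went', 'through', 'the', 'tunnel'])
car_doc1 = make_bag(['the', 'car'])
car_doc2 = make_bag(['the', 'shiny', 'new'])
all_car_docs = [car_doc1, car_doc2]

score = tf_idf('tunnel', car_doc, all_car_docs)
print(score)
```

Note that with the question's log(N / (1 + df)) formula, a word that appears in every document of the collection (such as 'the' here) gets a negative idf, because the denominator exceeds the number of documents.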