sklearn TF-IDF score does not match the formula

Time: 2019-04-24 15:07:01

Tags: python scikit-learn tfidfvectorizer

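I am trying to understand how TF-IDF is calculated, using the code below. The TF-IDF score of "good" in the first document is reported as (0, 0) 0.5797386715376657, but according to the TF-IDF formula it should be tf * idf = tf * log(2/2) = 0, since "good" appears in both of the two documents. Why does sklearn give a TF-IDF score of 0.57 here?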

from sklearn.feature_extraction.text import TfidfVectorizer
vec = TfidfVectorizer()
res = vec.fit_transform(["good movie","he is good person"])
print(vec.vocabulary_)
print(res)

{'good': 0, 'movie': 3, 'he': 1, 'is': 2, 'person': 4}
  (0, 0)	0.5797386715376657
  (0, 3)	0.8148024746671689
  (1, 0)	0.37997836159100784
  (1, 1)	0.534046329052269
  (1, 2)	0.534046329052269
  (1, 4)	0.534046329052269
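
For reference, the printed values appear consistent with TfidfVectorizer's documented defaults rather than the textbook formula: with smooth_idf=True the IDF is ln((1 + n) / (1 + df)) + 1, and with norm='l2' each document row is then scaled to unit length. The following is a minimal sketch that recomputes the scores by hand under those assumptions. It uses plain whitespace splitting in place of the vectorizer's tokenizer, and the names `tfidf_row`, `idf`, and `rows` are illustrative, not part of scikit-learn.

```python
import math

docs = ["good movie", "he is good person"]

# Assumption: whitespace splitting stands in for the vectorizer's tokenizer
# (equivalent for these two lowercase documents).
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term.
vocab = sorted({t for doc in tokenized for t in doc})
df = {t: sum(t in doc for doc in tokenized) for t in vocab}

# Smoothed IDF, as documented for smooth_idf=True:
# idf(t) = ln((1 + n) / (1 + df(t))) + 1
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}


def tfidf_row(tokens):
    """Return L2-normalized tf * idf weights for one tokenized document."""
    raw = {t: tokens.count(t) * idf[t] for t in set(tokens)}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}


rows = [tfidf_row(doc) for doc in tokenized]
for i, row in enumerate(rows):
    for term in sorted(row):
        print(i, term, round(row[term], 6))
```

Under these assumptions, "good" gets idf = ln(3/3) + 1 = 1 instead of 0, and "movie" gets ln(3/2) + 1, about 1.405. Dividing the first row [1, 1.405] by its Euclidean length, about 1.725, gives roughly 0.5797 and 0.8148, which is what the output above shows.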

0 answers:

No answers yet