How do I compute the tf-idf value of the words in a string?

Asked: 2019-01-24 23:41:02

Tags: python tf-idf

My input is:

He is a boy. She is a girl.
He is mad. She is brainsick.
He made himself the king.Queen create her life luxurious.
Their Royal livelihood is good.

I am vectorizing the words and getting a matrix like this:

     boy      girl       mad ...   livelihood      good     
0   0.000000  0.000000  0.496683 ...     0.000000  0.000000  0.0
1   0.627543  0.000000  0.000000 ...     0.000000  0.000000  0.0
2   0.000000  0.000000  0.000000 ...     0.000000  0.000000  0.0
3   0.000000  0.000000  0.543939 ...     0.254394  0.424127  0.0
4   0.000000  0.000000  0.000000 ...     0.232293  0.000000  0.0

I tried this formula:

TF(t) = (number of times term t appears in the document) / (total number of terms in the document)
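
For reference, the formula above can be sketched in a few lines of Python. The tokenizer here (lowercase, runs of letters) and the function name `term_frequency` are assumptions made for illustration; a different tokenizer or stop-word list would change the counts.

```python
import re
from collections import Counter

def term_frequency(document):
    """TF(t) = occurrences of t in the document / total terms in the document."""
    # Assumed tokenizer: lowercase the text and keep runs of ASCII letters.
    tokens = re.findall(r"[a-z]+", document.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # An empty document yields an empty dict, so no division by zero occurs.
    return {term: count / total for term, count in counts.items()}

tf = term_frequency("He is a boy. She is a girl.")
print(tf["boy"])  # 1 occurrence out of 8 tokens -> 0.125
```

Note that this is plain term frequency only. It involves no IDF factor and no normalization.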

I expected the value to be 0.627543, but when I work the formula out on paper I get 0.34... something.
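
One possible reason for the gap, assuming the matrix came from a vectorizer such as scikit-learn's `TfidfVectorizer` with default settings (the question does not say which tool was used): the stored values are not plain TF. With those defaults, each raw count is multiplied by a smoothed IDF, ln((1 + n) / (1 + df)) + 1, and every row is then scaled to unit L2 length. The sketch below reimplements that scheme in plain Python. The tokenizer and the names `tokenize` and `tfidf_rows` are illustrative, and no stop words are removed, so it is not expected to reproduce 0.627543 exactly.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, runs of ASCII letters, no stop-word removal.
    return re.findall(r"[a-z]+", text.lower())

def tfidf_rows(documents):
    """Return one {term: weight} dict per document.

    weight = raw count * smoothed idf, then each row is L2-normalized.
    """
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))

    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows

documents = [
    "He is a boy. She is a girl.",
    "He is mad. She is brainsick.",
    "He made himself the king.Queen create her life luxurious.",
    "Their Royal livelihood is good.",
]
rows = tfidf_rows(documents)
print(rows[0].get("boy"))
```

Because of the IDF factor and the row normalization, a value in such a matrix will generally not match count / total computed by hand.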

0 answers