How to map TF-IDF values back to the original words

Time: 2019-05-13 17:33:36

Tags: scala apache-spark apache-spark-sql apache-spark-mllib tf-idf

I followed this example to compute the TF-IDF of each word in a document. However, my final output looks like this (which is apparently expected, since I am using HashingTF):

(262144,[24856,31066,96984,119418,143328,176968,193347,223999,243191,245270,250475],[2.3513752571634776,1.9459101490553132,1.9459101490553132,2.3513752571634776,1.4350845252893227,2.3513752571634776,2.3513752571634776,1.9459101490553132,3.8918202981106265,1.9459101490553132,2.3513752571634776])
(262144,[21028,31066,71524,72609,116873,140075,142830,155149,222394,226568,245044],[1.9459101490553132,1.9459101490553132,1.6582280766035324,2.3513752571634776,2.3513752571634776,1.9459101490553132,1.9459101490553132,2.3513752571634776,1.9459101490553132,1.252762968495368,1.9459101490553132])
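For reference, each of those rows is a Spark SparseVector printed as (size, [indices], [values]): 262144 is HashingTF's default numFeatures (2^18), the indices are hash buckets, and the values are the TF-IDF weights. A minimal sketch of unpacking them, assuming the final DataFrame from the linked example is called `rescaledData` with the TF-IDF column named "features" (both names are my assumption):

```scala
import org.apache.spark.ml.linalg.Vector

// Iterate over the non-zero entries of each TF-IDF vector.
rescaledData.select("features").collect().foreach { row =>
  val vec = row.getAs[Vector](0)
  vec.foreachActive { (bucket, tfidf) =>
    // The original word cannot be recovered from the bucket index alone,
    // because HashingTF stores only the hash of the term, not the term itself.
    println(s"bucket $bucket -> $tfidf")
  }
}
```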

Is there an API to match each word to its TF-IDF value?
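One common workaround (since the hashing in HashingTF is one-way) is to replace HashingTF with CountVectorizer, which learns an explicit vocabulary; `cvModel.vocabulary(i)` then gives the word behind feature index i. A minimal, hedged sketch of that approach, with toy documents and DataFrame names of my own choosing rather than the linked example's:

```scala
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel, IDF, Tokenizer}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.SparkSession

object TfIdfWithVocabulary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tfidf-with-vocabulary")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy documents standing in for the real corpus.
    val docs = Seq(
      "spark makes distributed computing simple",
      "hashing tf loses the original words",
      "count vectorizer keeps an explicit vocabulary"
    ).toDF("sentence")

    val words = new Tokenizer()
      .setInputCol("sentence").setOutputCol("words")
      .transform(docs)

    // CountVectorizer builds a vocabulary, so every feature index
    // can be mapped back to a concrete word via cvModel.vocabulary.
    val cvModel: CountVectorizerModel = new CountVectorizer()
      .setInputCol("words").setOutputCol("rawFeatures")
      .fit(words)

    val tf = cvModel.transform(words)

    val idfModel = new IDF()
      .setInputCol("rawFeatures").setOutputCol("features")
      .fit(tf)
    val tfidf = idfModel.transform(tf)

    // Print each word with its TF-IDF weight, per document.
    val vocab = cvModel.vocabulary
    tfidf.select("features").collect().foreach { row =>
      row.getAs[Vector](0).foreachActive { (idx, weight) =>
        println(s"${vocab(idx)} -> $weight")
      }
    }

    spark.stop()
  }
}
```

The trade-off is that CountVectorizer must build and store the vocabulary (it can be capped with setVocabSize), whereas HashingTF is stateless but cannot map feature indices back to words.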

0 Answers:

There are no answers yet.