Map words to vector in Spark ML CountVectorizerModel

Time: 2017-12-17 20:57:40

Tags: apache-spark apache-spark-mllib

I am using CountVectorizerModel in Spark ML and computing the TF-IDF of the data.

The output column of the df looks like this:

(63709,[0,1,2,3,6,7,8,10,11,13],[0.6095235999680518,0.9946971867717818,0.5151611294911758,0.4371112749198506,3.4968901993588046,0.06806241719930584,1.1156025996012633,3.0425756717399217,0.3760235829400124])

I want to get the top n words corresponding to this ranking, i.e. the words with the highest TF-IDF weights.
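A minimal sketch of one way to do the mapping, assuming the pipeline is CountVectorizer followed by IDF. IDF rescales each feature but keeps the feature indices, so index i in the TF-IDF vector still refers to the i-th term of the fitted CountVectorizerModel's `vocabulary`. The helper `top_n_terms` is hypothetical: it sorts the active entries of one sparse row by weight and looks each index up in the vocabulary. It is plain Python, so it can be checked without a Spark cluster; the names `cv_model` and the column `features` in the comment are assumptions, not taken from the question.

```python
from typing import List, Sequence, Tuple


def top_n_terms(vocabulary: Sequence[str],
                indices: Sequence[int],
                values: Sequence[float],
                n: int) -> List[Tuple[str, float]]:
    """Return up to n (term, weight) pairs with the highest weights.

    vocabulary[i] is the term for feature index i (as in
    CountVectorizerModel.vocabulary); indices/values are the active
    entries of one sparse TF-IDF row.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    # Sort the (index, weight) pairs by weight, largest first.
    ranked = sorted(zip(indices, values), key=lambda p: p[1], reverse=True)
    # Map each surviving index back to its vocabulary term.
    return [(vocabulary[i], w) for i, w in ranked[:n]]


# Assumed PySpark usage (hypothetical names cv_model and "features"):
#   vocab = cv_model.vocabulary
#   vec = df.select("features").first()["features"]  # a SparseVector
#   top_n_terms(vocab, vec.indices.tolist(), vec.values.tolist(), 3)
```

For example, with a made-up vocabulary `["spark", "ml", "vector", "word", "idf"]`, calling `top_n_terms(vocab, [0, 2, 4], [0.5, 3.0, 1.2], 2)` returns `[("vector", 3.0), ("idf", 1.2)]`. To apply this to every row, the same logic could be wrapped in a UDF, with the vocabulary broadcast to the executors.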
