I am using CountVectorizerModel in Spark ML and have computed the TF-IDF of my data.
The output column of the DataFrame looks like this:
(63709,[0,1,2,3,6,7,8,10,11,13],[0.6095235999680518,0.9946971867717818,0.5151611294911758,0.4371112749198506,3.4968901993588046,0.06806241719930584,1.1156025996012633,3.0425756717399217,0.3760235829400124])
I want to get the top n words that these indices map to, ranked by their TF-IDF weight.
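
To make the goal concrete, here is a minimal sketch of the mapping in plain Python. It assumes the sparse vector's indices refer to positions in the fitted model's vocabulary (in PySpark, `CountVectorizerModel.vocabulary`), and that the indices and weights are available as two parallel sequences (`SparseVector.indices` and `SparseVector.values`). The function name `top_n_terms` and the toy vocabulary are hypothetical, chosen only for illustration.

```python
import heapq

def top_n_terms(indices, values, vocabulary, n):
    """Return the n (term, weight) pairs with the largest weights.

    indices    -- positions of the non-zero entries in the sparse vector
    values     -- TF-IDF weights, parallel to indices
    vocabulary -- list of terms; position i holds the term for index i
    """
    # Pair each index with its weight and keep the n heaviest pairs.
    best = heapq.nlargest(n, zip(indices, values), key=lambda pair: pair[1])
    # Translate each index back into its vocabulary term.
    return [(vocabulary[int(i)], float(w)) for i, w in best]

# Hypothetical toy data standing in for a real vocabulary and one row's vector.
vocabulary = ["spark", "ml", "vector", "count", "idf"]
indices = [0, 2, 4]
values = [0.6, 3.5, 1.1]

top = top_n_terms(indices, values, vocabulary, 2)
print(top)  # [('vector', 3.5), ('idf', 1.1)]
```

In a Spark job, the same function could presumably be wrapped in a UDF and applied to the TF-IDF column, passing `row_vector.indices`, `row_vector.values`, and the model's `vocabulary` for each row.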