I created an LDA model with PySpark's ML library and am on the final step, reviewing the topics. I need some help converting Scala syntax to Python.
Scala code:
val topicIndices = ldaModel.describeTopics(maxTermsPerTopic = 5)
val vocabList = vectorizer.vocabulary
My Python equivalent:
topics=ldamodel.describeTopics(5)
vocablist=cv_tmp_model.vocabulary
I need help converting the following Scala to Python:
val topics = topicIndices.map { case (terms, termWeights) =>
  terms.map(vocabList(_)).zip(termWeights)
}
println(s"$numTopics topics:")
topics.zipWithIndex.foreach { case (topic, i) =>
  println(s"TOPIC $i")
  topic.foreach { case (term, weight) => println(s"$term\t$weight") }
  println(s"==========")
}
Here is the data my Python code above produces (the describeTopics output, followed by the vocabulary list):
+-----+--------------------+-------------------------------------------------------------------------------------------------------------------+
|topic|termIndices |termWeights |
+-----+--------------------+-------------------------------------------------------------------------------------------------------------------+
|0 |[4, 12, 1, 590, 852]|[0.0028631659100828368, 0.0012554108491237852, 0.0011644723252479093, 0.0011327750159178295, 0.0010764396870585554]|
|1 |[0, 1, 613, 3, 10] |[0.002252579749817043, 0.001250966869617955, 0.0011384445968439065, 0.0010844339010670746, 0.0010755506920364175] |
|2 |[0, 13, 22, 19, 16] |[0.001434960731297773, 9.736198742891527E-4, 9.12508054803329E-4, 9.011478948492135E-4, 8.853188856650885E-4] |
+-----+--------------------+-------------------------------------------------------------------------------------------------------------------+
only showing top 3 rows
['one',
'peopl',
'govern',
'think',
'econom',
'rate',
'tax',
'polici',
'year',
'like',
'make',
'demand',
'critic',
'bad',
'fall',
'probabl',
'help',
'larg',
'libertarian',
'agre',
'littl',
'suppli']
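
One possible way to express the Scala snippet in Python, as a minimal sketch rather than a definitive answer: look up each term index in the vocabulary, pair it with its weight, then print topic by topic. The helper names `topics_with_terms` and `format_topics` are hypothetical, and the sample rows and vocabulary are made-up stand-ins so the block runs without Spark. It also assumes the topic count can be taken from the number of rows, in place of Scala's `numTopics`.

```python
def topics_with_terms(topic_rows, vocab):
    """For each topic row, pair every vocabulary term with its weight.

    Each row must allow row["termIndices"] and row["termWeights"],
    which holds for plain dicts and for PySpark Row objects.
    """
    return [
        list(zip((vocab[i] for i in row["termIndices"]), row["termWeights"]))
        for row in topic_rows
    ]


def format_topics(topics):
    """Build the same text layout that the Scala println calls produce."""
    lines = [f"{len(topics)} topics:"]  # assumption: row count stands in for numTopics
    for i, topic in enumerate(topics):
        lines.append(f"TOPIC {i}")
        for term, weight in topic:
            lines.append(f"{term}\t{weight}")
        lines.append("==========")
    return "\n".join(lines)


# Hypothetical stand-in data, shaped like the describeTopics output
sample_vocab = ["one", "peopl", "govern", "think", "econom"]
sample_rows = [
    {"topic": 0, "termIndices": [4, 1], "termWeights": [0.0029, 0.0013]},
    {"topic": 1, "termIndices": [0, 3], "termWeights": [0.0023, 0.0011]},
]

print(format_topics(topics_with_terms(sample_rows, sample_vocab)))
```

With the variables from the question, this would presumably be called as `topics_with_terms(topics.collect(), vocablist)`, since `collect()` returns `Row` objects that support lookup by column name. Collecting is reasonable here because `describeTopics` yields only one row per topic.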