Converting Scala code to Python for LDA

Time: 2018-06-30 19:46:02

Tags: scala pyspark lda databricks

I created an LDA model with pyspark's ML library. I am on the last step, reviewing the topics, and I need some help converting the Scala syntax to Python.

Scala code

val topicIndices = ldaModel.describeTopics(maxTermsPerTopic = 5)
val vocabList = vectorizer.vocabulary

My Python equivalent

topics = ldamodel.describeTopics(5)
vocablist = cv_tmp_model.vocabulary

I need help converting the following Scala to Python

val topics = topicIndices.map { case (terms, termWeights) =>
  terms.map(vocabList(_)).zip(termWeights)
}
println(s"$numTopics topics:")
topics.zipWithIndex.foreach { case (topic, i) =>
  println(s"TOPIC $i")
  topic.foreach { case (term, weight) => println(s"$term\t$weight") }
  println(s"==========")
}

The data produced by my Python code above

+-----+--------------------+-------------------------------------------------------------------------------------------------------------------+
|topic|termIndices         |termWeights                                                                                                        |
+-----+--------------------+-------------------------------------------------------------------------------------------------------------------+
|0    |[4, 12, 1, 590, 852]|[0.0028631659100828368, 0.0012554108491237852, 0.0011644723252479093, 0.0011327750159178295, 0.0010764396870585554]|
|1    |[0, 1, 613, 3, 10]  |[0.002252579749817043, 0.001250966869617955, 0.0011384445968439065, 0.0010844339010670746, 0.0010755506920364175]  |
|2    |[0, 13, 22, 19, 16] |[0.001434960731297773, 9.736198742891527E-4, 9.12508054803329E-4, 9.011478948492135E-4, 8.853188856650885E-4]      |
+-----+--------------------+-------------------------------------------------------------------------------------------------------------------+
only showing top 3 rows




['one',
 'peopl',
 'govern',
 'think',
 'econom',
 'rate',
 'tax',
 'polici',
 'year',
 'like',
 'make',

 'demand',
 'critic',
 'bad',
 'fall',
 'probabl',
 'help',
 'larg',
 'libertarian',
 'agre',
 'littl',
 'suppli']
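
For reference, a minimal sketch of how the Scala map/zip and the printing loop might look in Python. It assumes the rows of `topics` have been collected to the driver, so each row unpacks as (topic, termIndices, termWeights), the column order shown in the table above. The helper names `topics_to_terms` and `print_topics` and the sample rows are hypothetical and only stand in for the real model output; the Scala variable `numTopics` is replaced here by the number of rows.

```python
def topics_to_terms(rows, vocab):
    """Pair each term index with its vocabulary word and weight.

    Mirrors the Scala `terms.map(vocabList(_)).zip(termWeights)`.
    `rows` is any sequence of (topic, termIndices, termWeights) tuples.
    """
    return [
        [(vocab[idx], weight) for idx, weight in zip(term_indices, term_weights)]
        for _topic, term_indices, term_weights in rows
    ]


def print_topics(topics):
    """Print every topic in the same layout as the Scala foreach loop."""
    # The Scala code prints a separate numTopics variable; len(topics) is used here.
    print(f"{len(topics)} topics:")
    for i, topic in enumerate(topics):
        print(f"TOPIC {i}")
        for term, weight in topic:
            print(f"{term}\t{weight}")
        print("==========")


# Hypothetical stand-in data: the first five vocabulary words from the question
# and made-up rows, NOT real model output.
sample_vocab = ["one", "peopl", "govern", "think", "econom"]
sample_rows = [
    (0, [4, 1], [0.5, 0.25]),
    (1, [0, 3], [0.4, 0.1]),
]

sample_topics = topics_to_terms(sample_rows, sample_vocab)
print_topics(sample_topics)
```

With the real model, the rows would presumably come from `ldamodel.describeTopics(5).collect()` and the vocabulary from `vocablist`, i.e. `print_topics(topics_to_terms(ldamodel.describeTopics(5).collect(), vocablist))`. Collecting is reasonable here because the result holds only a few terms per topic.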

0 answers:

No answers