I'm using PySpark (version 2.3.1) and trying to reproduce the same result with the following code:
from pyspark.ml.clustering import LDA

lda = LDA(k=10, seed=5, optimizer="em", featuresCol="features")
ldamodel = lda.fit(rescaledData)
ldatopics = ldamodel.describeTopics()
ldatopics.show(10)
Output 1:
+-----+--------------------+--------------------+
|topic| termIndices| termWeights|
+-----+--------------------+--------------------+
| 0|[0, 199, 2, 35, 1...|[0.02179604286102...|
| 1|[267, 142, 76, 50...|[0.01640698273265...|
| 2|[14, 6, 12, 29, 7...|[0.01542644578135...|
| 3|[279, 193, 21, 74...|[0.01304181652577...|
| 4|[12, 70, 252, 151...|[0.01104580800704...|
| 5|[9, 75, 474, 255,...|[0.01606660426132...|
| 6|[13, 4, 88, 3, 27...|[0.02825736583107...|
| 7|[42, 146, 26, 700...|[0.01156411695149...|
| 8|[89, 2, 82, 403, ...|[0.01666772169015...|
| 9|[1, 303, 411, 83,...|[0.02547416776649...|
+-----+--------------------+--------------------+
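Even though I set a seed, I get different results every time I restart the application (i.e. after shutting down and reopening my laptop). Here is the second output: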
+-----+--------------------+--------------------+
|topic| termIndices| termWeights|
+-----+--------------------+--------------------+
| 0|[403, 199, 414, 1...|[0.01236421045802...|
| 1|[75, 109, 251, 5,...|[0.01551907510059...|
| 2|[12, 188, 6, 314,...|[0.01206780033644...|
| 3|[91, 76, 23, 82, ...|[0.01244511461388...|
| 4|[162, 127, 12, 14...|[0.01380643020451...|
| 5|[4, 46, 7, 220, 2...|[0.01591219626409...|
| 6|[89, 71, 272, 279...|[0.02027028435250...|
| 7|[1, 3, 13, 57, 27...|[0.02192425215634...|
| 8|[2, 0, 35, 87, 65...|[0.02033711369900...|
| 9|[194, 15, 37, 42,...|[0.01436615776405...|
+-----+--------------------+--------------------+
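Because LDA topic numbers are arbitrary labels, two runs can find similar topics under different numbers. For example, topic 0 in output 1 (`[0, 199, 2, 35, ...]`) shares several leading indices with topic 8 in output 2 (`[2, 0, 35, ...]`). A label-independent comparison makes "different results" easier to quantify. The following is a minimal pure-Python sketch, not part of the Spark API. It assumes each run has been collected into a list of `termIndices` lists ordered by topic. The helper names `jaccard` and `match_topics` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard overlap between two lists of term indices (order ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def match_topics(run_a: List[Sequence[int]],
                 run_b: List[Sequence[int]]) -> Dict[int, Tuple[int, float]]:
    """For each topic in run_a, return (best-matching topic in run_b, overlap).

    Greedy matching: two topics in run_a may map to the same topic in run_b.
    """
    result = {}
    for i, terms_a in enumerate(run_a):
        best_j, best_score = max(
            ((j, jaccard(terms_a, terms_b)) for j, terms_b in enumerate(run_b)),
            key=lambda pair: pair[1],
        )
        result[i] = (best_j, best_score)
    return result
```

In PySpark, each run's input could be built with something like `[row.termIndices for row in ldamodel.describeTopics(maxTermsPerTopic=10).orderBy("topic").collect()]`. If every topic matches with an overlap near 1.0, the runs differ only in topic numbering. Lower scores indicate genuinely different models.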
Note that I run into the same problem in the .transform stage (again, even with a seed). This is the code I used:
paramMap = {ldamodel.seed: 5}
ldaResults = ldamodel.transform(rescaledData, params=paramMap)
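To compare `.transform` results across restarts, match rows by a stable document key rather than by row order, because row order is not guaranteed to be the same between runs. The sketch below is plain Python and assumes each run's `topicDistribution` values have been collected into a dict keyed by a hypothetical document `id` column. The helper name `max_abs_diff` is also hypothetical.

```python
from typing import Dict, Sequence


def max_abs_diff(run_a: Dict[int, Sequence[float]],
                 run_b: Dict[int, Sequence[float]]) -> float:
    """Largest absolute difference between the topic distributions of the
    same document in two runs. Documents are matched by id, not row order."""
    if run_a.keys() != run_b.keys():
        raise ValueError("the two runs contain different document ids")
    worst = 0.0
    for doc_id, dist_a in run_a.items():
        dist_b = run_b[doc_id]
        if len(dist_a) != len(dist_b):
            raise ValueError(f"document {doc_id}: vectors differ in length")
        for x, y in zip(dist_a, dist_b):
            worst = max(worst, abs(x - y))
    return worst
```

Each dict could be built with something like `{row.id: row.topicDistribution.toArray().tolist() for row in ldaResults.select("id", "topicDistribution").collect()}`, assuming the input DataFrame has such an `id` column. If the underlying models differ, the topic labels may also be permuted, so a large value here does not by itself tell you which stage introduced the difference.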
Do you have any hints that could help me?
Many thanks, Lorenzo