Spark: Loading a Scala ML model into PySpark

Time: 2017-12-02 22:46:52

Tags: apache-spark pyspark apache-spark-mllib apache-spark-ml

I trained an LDA model in Scala Spark:

import org.apache.spark.ml.clustering.LDA  // DataFrame-based spark.ml API

val lda = new LDA().setK(k).setMaxIter(iter).setFeaturesCol(colnames).fit(data)

lda.save(path)

I checked the saved model; it contains two folders, metadata and data.
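A quick way to confirm what was actually written is to peek at the metadata folder: Spark ML's persistence layer stores a one-line JSON record whose "class" field names the class that saved the model. A minimal sketch, assuming the model sits on the local filesystem and the usual part-* file naming (model_path is a hypothetical placeholder):

import glob
import json

# Hypothetical path; point this at the saved model directory.
model_path = "/path/to/output_K20_topic/lda"

# Spark ML writes the metadata as a single-line JSON file, typically part-00000.
with open(glob.glob(model_path + "/metadata/part-*")[0]) as f:
    meta = json.loads(f.readline())

print(meta["class"])  # e.g. "org.apache.spark.ml.clustering.LocalLDAModel"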

However, when I try to load this model into PySpark, I get an error:

from pyspark.mllib.clustering import LDAModel  # RDD-based mllib API

model = LDAModel.load(sc, path=path)


File "/Users/hongbowang/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-
0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling 
o33.loadLDAModel.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not 
exist:file:/Users/hongbowang/Personal/Spark%20Program/Spark%20Project/
T1/output_K20_topic/lda/metadata

Does anyone know how I can fix this? Many thanks~!

1 Answer:

Answer 0 (score: 1):

You saved an ml.clustering.LDAModel, but tried to read it with mllib.clustering.LDAModel. You should import the correct LDAModel. For a local model:

from pyspark.ml.clustering import LocalLDAModel

LocalLDAModel.load(path)

and for a distributed model:

from pyspark.ml.clustering import DistributedLDAModel

DistributedLDAModel.load(path)
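Putting it together, here is a minimal end-to-end sketch. The Scala snippet above never sets an optimizer, and the ml LDA default ("online") produces a local model, so LocalLDAModel is assumed here; swap in DistributedLDAModel if the metadata says otherwise:

from pyspark.sql import SparkSession
from pyspark.ml.clustering import LocalLDAModel

spark = SparkSession.builder.appName("load-lda").getOrCreate()

# ml-style loaders take only a path; unlike mllib there is no SparkContext argument.
model = LocalLDAModel.load(path)  # the directory that holds metadata/ and data/

# Sanity check: show the top 5 weighted terms for each topic.
model.describeTopics(5).show(truncate=False)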