How to resolve "AttributeError: module '__main__' has no attribute" when calling PipelineModel.load() in Databricks

Asked: 2019-02-19 02:54:31

Tags: python apache-spark pipeline databricks

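I saved a PipelineModel in a training notebook and want to load it in a prediction notebook. Some of the model's stages are my own custom classes.

I tried moving the classes into a package and importing them from there, but that did not solve the problem.

In the training notebook, I have: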

from pyspark.ml import Pipeline

pipeline_index = Pipeline(stages=[
  ArrayStringIndex(inputCol="tokens", outputCol="tokens_idx"),
  ArrayStringIndex(inputCol="sentiment", outputCol="label_idx")
])

transformer_pipeline_index = pipeline_index.fit(df_pre)

# Save Pipeline Model
transformer_pipeline_index.write().overwrite().save("/tmp/pipelinemodel")

df_training = transformer_pipeline_index.transform(df_pre)
df_training = df_training.select(df_training.id, df_training.tokens_idx, df_training.label_idx)
df_training.show()

In the prediction notebook, I have:

from pyspark.ml import Pipeline, PipelineModel
from myclass.pipeline.arraystring import ArrayStringIndex, ArrayStringIndexModel

# Load Pipeline Model
transformer_pipeline_index = PipelineModel.load("/tmp/pipelinemodel")

df_predict = transformer_pipeline_index.transform(df_pre)
df_predict = df_predict.select(df_predict.id, df_predict.tokens_idx, df_predict.label_idx)
df_predict.show()

I expected PipelineModel.load() to load the PipelineModel created earlier, but instead I get AttributeError: module '__main__' has no attribute 'ArrayStringIndexModel'
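One possible reading of the error, offered as an unverified sketch rather than a confirmed diagnosis: if the saved model records each stage under a module-qualified class name, and the custom class lived in the notebook's own module when the model was saved, then the loader would later look for it under '__main__' in whichever notebook does the loading. The pure-Python simulation below does not use Spark at all. The helpers saved_class_name, load_class, and make_module and the module names fake_main and fake_pkg are hypothetical stand-ins introduced only for illustration.

```python
import importlib
import sys
import types


def saved_class_name(instance):
    # Module-qualified name, as a loader might record it at save time (assumption).
    cls = type(instance)
    return f"{cls.__module__}.{cls.__name__}"


def load_class(qualified_name):
    # Import the module part, then look the class up as an attribute of that module.
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def make_module(name):
    # Register an empty, hypothetical module so importlib can find it by name.
    module = types.ModuleType(name)
    sys.modules[name] = module
    return module


# "Training notebook": the class is defined in the notebook's own module.
training_main = make_module("fake_main")
ArrayStringIndexModel = type("ArrayStringIndexModel", (), {"__module__": "fake_main"})
training_main.ArrayStringIndexModel = ArrayStringIndexModel
recorded = saved_class_name(ArrayStringIndexModel())

# "Prediction notebook": a fresh main module that never defined the class.
make_module("fake_main")
try:
    load_class(recorded)
    lookup_error = None
except AttributeError as exc:
    lookup_error = str(exc)

# Same class name, but defined in a package module at save time.
pkg = make_module("fake_pkg")
PackagedModel = type("ArrayStringIndexModel", (), {"__module__": "fake_pkg"})
pkg.ArrayStringIndexModel = PackagedModel
resolved = load_class(saved_class_name(PackagedModel()))

print(recorded)
print(lookup_error)
print(resolved is PackagedModel)
```

If this reading applies, importing the classes only in the prediction notebook would not change the name already stored with the model; the model would presumably need to be fitted and saved again from a training notebook that also imports the classes from the package. That is an assumption I have not verified against Databricks.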

0 answers:

No answers yet.