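I saved a PipelineModel in my training notebook, and I want to load it in my prediction notebook. The model's stages are my own custom classes.
I tried moving the classes into a package and importing them from there, but that still doesn't solve the problem.
In the training notebook I have: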
from pyspark.ml import Pipeline
# ArrayStringIndex is my custom Estimator; fitting it produces an ArrayStringIndexModel
pipeline_index = Pipeline(stages=[
ArrayStringIndex(inputCol="tokens", outputCol="tokens_idx"),
ArrayStringIndex(inputCol="sentiment", outputCol="label_idx")
])
transformer_pipeline_index = pipeline_index.fit(df_pre)
# Save Pipeline Model
transformer_pipeline_index.write().overwrite().save("/tmp/pipelinemodel")
df_training = transformer_pipeline_index.transform(df_pre)
df_training = df_training.select(df_training.id, df_training.tokens_idx, df_training.label_idx)
df_training.show()
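To check which class names the save above actually recorded, the saved metadata can be read directly. The helper below is a plain-Python sketch (the name `saved_stage_classes` is hypothetical). It assumes Spark's usual ML persistence layout for a PipelineModel, `<path>/stages/<index>_<uid>/metadata/part-*` with one JSON object per line, and assumes the path is reachable through the local filesystem (on DBFS or HDFS a different path or API may be needed).

```python
import glob
import json
import os


def saved_stage_classes(model_dir):
    """Return the "class" field recorded for each saved pipeline stage.

    Assumed layout: <model_dir>/stages/<idx>_<uid>/metadata/part-*,
    each part file holding one JSON object per line.
    """
    classes = []
    pattern = os.path.join(model_dir, "stages", "*", "metadata", "part-*")
    # "part-*" skips the _SUCCESS marker and hidden .crc checksum files
    for part_file in sorted(glob.glob(pattern)):
        with open(part_file) as fh:
            for line in fh:
                line = line.strip()
                if line:
                    classes.append(json.loads(line)["class"])
    return classes
```

For example, `saved_stage_classes("/tmp/pipelinemodel")`. If the custom classes were defined in the training notebook itself, the entries would presumably start with `__main__.`, which matches the module named in the error below.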
In the prediction notebook I have:
from pyspark.ml import Pipeline, PipelineModel
# The same custom classes, now imported from my package
from myclass.pipeline.arraystring import ArrayStringIndex, ArrayStringIndexModel
# Load Pipeline Model
transformer_pipeline_index = PipelineModel.load("/tmp/pipelinemodel")
df_predict = transformer_pipeline_index.transform(df_pre)
df_predict = df_predict.select(df_predict.id, df_predict.tokens_idx, df_predict.label_idx)
df_predict.show()
I expected PipelineModel.load() to work and load the PipelineModel I created earlier, but instead I get AttributeError: module '__main__' has no attribute 'ArrayStringIndexModel'
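The error message suggests the loader looks the stage class up from a `module.ClassName` string. The plain-Python sketch below imitates that mechanism in simplified form. It assumes, without reproducing PySpark's actual code, that the writer stores `type(stage).__module__ + "." + type(stage).__name__` and that the reader imports the module part and calls `getattr` for the class. The helpers `qualified_name` and `resolve_class` and the module `arraystring_demo` are hypothetical, and no Spark is needed.

```python
import importlib
import sys
import types


def qualified_name(instance):
    # Simplified: the "module.ClassName" string recorded for a stage
    cls = type(instance)
    return cls.__module__ + "." + cls.__name__


def resolve_class(qualified):
    # Simplified: import the module part, then look the class up by name
    module_name, _, class_name = qualified.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Case 1: class defined directly in a notebook, so its module is __main__
NotebookModel = type("ArrayStringIndexModel", (), {"__module__": "__main__"})
saved_name = qualified_name(NotebookModel())
print(saved_name)  # __main__.ArrayStringIndexModel
try:
    resolve_class(saved_name)
except AttributeError as exc:
    # Fails unless the loading process's __main__ has this attribute
    print("load fails:", exc)

# Case 2: class lives in an importable module (a fake one is registered
# here so the sketch is self-contained; the module name is hypothetical)
pkg = types.ModuleType("arraystring_demo")
PackagedModel = type("ArrayStringIndexModel", (), {"__module__": "arraystring_demo"})
pkg.ArrayStringIndexModel = PackagedModel
sys.modules["arraystring_demo"] = pkg

saved_name = qualified_name(PackagedModel())
print(saved_name)  # arraystring_demo.ArrayStringIndexModel
print(resolve_class(saved_name) is PackagedModel)  # True
```

Under these assumptions, a class defined directly in a notebook is recorded under `__main__`, and loading only succeeds if the `__main__` module of the loading process has an attribute with that name; a class that lives in an importable module is recorded under that module's name instead.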