I am trying to load a model that I created with PySpark. I created the model with the following code:
import pandas as pd
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import TrainValidationSplit, ParamGridBuilder
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)
data = pd.read_csv('matrix-out-small.csv')
df = spark.createDataFrame(data)
(training, test) = df.randomSplit([0.8, 0.2])
als = ALS(userCol="CustomerID", itemCol="ProductID", ratingCol="Rating", coldStartStrategy="drop", nonnegative=True)
# Tune model using param grid builder
param_grid = ParamGridBuilder().addGrid(als.rank, [12, 13, 14]).addGrid(als.maxIter, [18, 19, 20]).addGrid(als.regParam, [.17, .18, .19]).build()
evaluator = RegressionEvaluator(metricName="rmse", labelCol="Rating", predictionCol="prediction")
tvs = TrainValidationSplit(estimator=als, estimatorParamMaps=param_grid, evaluator=evaluator)
# fit model to training data
model = tvs.fit(training)
# extract best
best_model = model.bestModel
best_model.save("modelSaveOut")
This creates a directory named "modelSaveOut" containing "itemFactors", "metadata", and "userFactors".
When I try to load the model with ALS.load, I get the following:
model = ALS.load("modelSaveOut")
py4j.protocol.Py4JJavaError: An error occurred while calling o26.load. : java.lang.NoSuchMethodException: org.apache.spark.ml.recommendation.ALSModel.<init>(java.lang.String)
I also tried TrainValidationSplit.load:
model = TrainValidationSplit.load("modelSaveOut")
py4j.protocol.Py4JJavaError: An error occurred while calling o26.load. : java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.tuning.TrainValidationSplit but found class name org.apache.spark.ml.recommendation.ALSModel
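It seems I am not using the correct object/method to load the model. Is it possible to save just the "bestModel", or do I need to save the whole model some other way?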
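Answer 0 (score: 1)
If you read the exception trace:
but found class name org.apache.spark.ml.recommendation.ALSModel
it tells you exactly what to do. bestModel is a fitted ALSModel, not an ALS estimator or a TrainValidationSplit, so it has to be loaded with the ALSModel class: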
from pyspark.ml.recommendation import ALS, ALSModel
ALSModel.load("modelSaveOut")
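If you are ever unsure which class a saved directory belongs to, you can check before loading. The sketch below is not part of the original answer. It assumes the "metadata" subdirectory holds a one-line JSON text file named part-* with a "class" field, which is how Spark ML's default writer lays it out as far as I know; saved_model_class is a hypothetical helper name, not a PySpark API.

```python
import glob
import json
import os


def saved_model_class(model_dir):
    """Return the JVM class name recorded in a saved Spark ML model's metadata.

    Assumption: the metadata is stored as a one-line JSON document in
    <model_dir>/metadata/part-*, with the class name under the "class" key.
    """
    pattern = os.path.join(model_dir, "metadata", "part-*")
    for part in sorted(glob.glob(pattern)):
        with open(part, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    # The first non-empty line is expected to be the JSON metadata.
                    return json.loads(line)["class"]
    raise FileNotFoundError(f"no metadata part file found under {model_dir!r}")


# Hypothetical usage with the directory from the question:
# print(saved_model_class("modelSaveOut"))
```

For the directory in the question this should report org.apache.spark.ml.recommendation.ALSModel, matching the class name in the second error message. If you had saved the whole tuned model with model.save(...) instead of model.bestModel.save(...), the matching loader would be TrainValidationSplitModel.load from pyspark.ml.tuning, assuming your Spark version supports persisting it from Python.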