Error when trying to save and load a Logistic Regression model in PySpark

Asked: 2018-07-16 12:49:03

Tags: apache-spark model pyspark

I have split my input data into train_df, test_df and val_df. I have trained the model on the train_df data, and I want to save it and load it back.

My code:

from pyspark.ml.classification import LogisticRegression, LogisticRegressionModel
from pyspark.ml.evaluation import BinaryClassificationEvaluator

lr = LogisticRegression(maxIter=100)
lrModel = lr.fit(train_df)

predictions = lrModel.transform(val_df)

evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction")
print("Prediction : \n")
print(evaluator.evaluate(predictions))

accuracy = predictions.filter(predictions.label == predictions.prediction).count() / float(val_df.count())
print("Accuracy : \n")
print(accuracy)

lrModel.write().save("/home/vijay18/spark-2.1.0-bin-hadoop2.7/python/lrModel")
model = LogisticRegressionModel()
model.load("/home/vijay18/spark-2.1.0-bin-hadoop2.7/python/lrModel")

This is the error I get in the terminal. The first three lines of the error come from saving the model; the rest come from loading it.

Error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
18/07/17 20:04:01 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

1 Answer:

Answer 0 (score: 1)

load cannot be called on an instance. It should be:

from pyspark.ml.classification import LogisticRegressionModel

LogisticRegressionModel.load(path)