Load the data you want to score. The data is stored in libsvm format, like this: label index1:value1 index2:value2 ... (indices are one-based and in ascending order). Here is a sample line:
100 10:1 11:1 208:1 400:1 1830:1
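For reference, here is a minimal sketch (plain Scala; the object and method names are hypothetical, not part of Spark) of how a libsvm line like the one above is interpreted, and why the feature count gets inferred from the highest index when it is not given explicitly:

```scala
// Hypothetical helper: parse one libsvm line the way MLUtils.loadLibSVMFile
// does, converting one-based file indices to the zero-based indices used
// internally.
object LibsvmLine {
  def parse(line: String): (Double, Array[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val features = tokens.tail.map { t =>
      val Array(idx, value) = t.split(":")
      (idx.toInt - 1, value.toDouble) // shift to zero-based
    }
    (label, features)
  }

  // Without an explicit numFeatures argument, the feature count is inferred
  // as the highest index seen in the file -- the source of the mismatch below.
  def inferredNumFeatures(line: String): Int =
    parse(line)._2.map(_._1).max + 1
}
```

For the sample line above, the inferred feature count would be 1830, not the 176894 the model expects.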
val unseendata: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc,unseendatafileName)
val scores_path = results_base + run_id + "/" + "-scores"
// Load the saved model
val lrm = LogisticRegressionModel.load(sc,"logisticregressionmodels/mymodel")
// I saved the model after training using the save method. Here is the metadata for that model, from LogisticRegressionModel/mymodel/metadata/part-00000:
{"class":"org.apache.spark.mllib.classification.LogisticRegressionModel","version":"1.0","numFeatures":176894,"numClasses":2}
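The numFeatures field in that metadata is worth extracting, since the scoring job needs to load its libsvm file with the same dimension. A minimal sketch (plain Scala, hypothetical helper; a real job would read .../metadata/part-00000 and parse the JSON properly):

```scala
// Hypothetical helper: pull numFeatures out of the saved metadata JSON
// with a regex so it can be passed to MLUtils.loadLibSVMFile.
object ModelMetadata {
  private val NumFeatures = """"numFeatures"\s*:\s*(\d+)""".r

  def numFeatures(metadataJson: String): Option[Int] =
    NumFeatures.findFirstMatchIn(metadataJson).map(_.group(1).toInt)
}
```

Alternatively, since the model is already loaded, its dimension can be recovered from the public weights vector, e.g. lrm.weights.size, and passed to the loader directly.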
// Evaluate model on unseen data
val valuesAndPreds = unseendata.map { point =>
  val prediction = lrm.predict(point.features)
  (point.label, prediction)
}
// Store the scores
valuesAndPreds.saveAsTextFile(scores_path)
Here is the error message I get:
16/04/28 10:22:07 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5): java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:221)
    at org.apache.spark.mllib.classification.LogisticRegressionModel.predictPoint(LogisticRegression.scala:105)
    at org.apache.spark.mllib.regression.GeneralizedLinearModel.predict(GeneralizedLinearAlgorithm.scala:76)
Answer (score: 1)
The code that throws the exception is require(dataMatrix.size == numFeatures). My guess is that the model was fit with 176894 features (see "numFeatures":176894 in the model metadata), while the libsvm file only has 1830 features. The numbers must match.
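To illustrate the failure, here is a minimal sketch (plain Scala, hypothetical types, not Spark's actual classes) of the dimension check that predictPoint performs before scoring a point:

```scala
// Stand-in for a sparse feature vector: only the declared size matters here.
final case class SparseVec(size: Int)

class Model(val numFeatures: Int) {
  def predict(dataMatrix: SparseVec): Double = {
    // This is the requirement that fails when the libsvm file was loaded
    // with fewer features (1830) than the model was trained on (176894).
    require(dataMatrix.size == numFeatures)
    0.0 // dummy score; the real model computes a dot product here
  }
}
```

Calling predict with a 1830-dimensional vector against a model expecting 176894 features raises exactly the IllegalArgumentException: requirement failed seen in the stack trace.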
Change the line that loads the libsvm file to:
val unseendata = MLUtils.loadLibSVMFile(sc, unseendatafileName, 176894)