Why does LogisticRegressionModel fail when scoring libsvm data?

Date: 2016-04-28 17:38:00

Tags: apache-spark apache-spark-mllib apache-spark-ml

  

I load the data I want to score. The data is stored in libsvm format, i.e. label index1:value1 index2:value2 ... (indices are one-based and in ascending order). Here is a sample line:
  100 10:1 11:1 208:1 400:1 1830:1

    val unseendata: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, unseendatafileName)
    val scores_path = results_base + run_id + "/" + "-scores"

    // Load the saved model
    val lrm = LogisticRegressionModel.load(sc, "logisticregressionmodels/mymodel")

    // I had saved the model after training using the save method. Here is the
    // metadata for that model (LogisticRegressionModel/mymodel/metadata/part-00000):
    // {"class":"org.apache.spark.mllib.classification.LogisticRegressionModel","version":"1.0","numFeatures":176894,"numClasses":2}

    // Evaluate the model on unseen data
    val valuesAndPreds = unseendata.map { point =>
      val prediction = lrm.predict(point.features)
      (point.label, prediction)
    }

    // Store the scores
    valuesAndPreds.saveAsTextFile(scores_path)

Here is the error message I receive:

  

  16/04/28 10:22:07 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5): java.lang.IllegalArgumentException: requirement failed
      at scala.Predef$.require(Predef.scala:221)
      at org.apache.spark.mllib.classification.LogisticRegressionModel.predictPoint(LogisticRegression.scala:105)
      at org.apache.spark.mllib.regression.GeneralizedLinearModel.predict(GeneralizedLinearAlgorithm.scala:76)

1 Answer:

Answer 0 (score: 1)

The code that throws the exception is require(dataMatrix.size == numFeatures).

My guess is that the model was fit with 176894 features (see "numFeatures":176894 in the model metadata), while loadLibSVMFile inferred only 1830 features from your file, since 1830 is the largest feature index it contains. The two numbers must match.

Change the line that loads the libsvm file to:

val unseendata = MLUtils.loadLibSVMFile(sc, unseendatafileName, 176894)
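To see why the explicit feature count matters, here is a minimal sketch in plain Scala (no Spark; SparseVec, parseLine, and predictOrFail are hypothetical stand-ins, not Spark APIs) of how a libsvm loader that is not given a feature count can only infer dimensionality from the largest index it happens to see:

```scala
// A bare-bones sparse vector, standing in for Spark's SparseVector.
case class SparseVec(size: Int, indices: Array[Int], values: Array[Double])

// Parse one libsvm line: "label i1:v1 i2:v2 ..." (indices are one-based).
// When numFeatures <= 0, the size is inferred from the largest index seen,
// which mimics what a loader must do when no feature count is supplied.
def parseLine(line: String, numFeatures: Int = -1): (Double, SparseVec) = {
  val parts = line.trim.split("\\s+")
  val label = parts.head.toDouble
  val (idx, vals) = parts.tail.map { token =>
    val kv = token.split(":")
    (kv(0).toInt - 1, kv(1).toDouble) // convert one-based index to zero-based
  }.unzip
  val size = if (numFeatures > 0) numFeatures else idx.max + 1
  (label, SparseVec(size, idx, vals))
}

// The same dimension check that predictPoint performs before scoring.
def predictOrFail(modelNumFeatures: Int, v: SparseVec): Unit =
  require(v.size == modelNumFeatures,
    s"requirement failed: vector has ${v.size} features, model expects $modelNumFeatures")

val line = "100 10:1 11:1 208:1 400:1 1830:1"
val inferred = parseLine(line)._2          // size inferred as 1830
val explicit = parseLine(line, 176894)._2  // size forced to match the model
```

With the inferred vector, predictOrFail(176894, inferred) fails exactly like the stack trace above, while the explicitly sized vector passes the check; that is what the extra 176894 argument to loadLibSVMFile accomplishes.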