Load the data you want to score. The data is stored in libsvm format, like this: label index1:value1 index2:value2 ... (indices are one-based and in ascending order). Here is a sample line:
100 10:1 11:1 208:1 400:1 1830:1
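For reference, here is a minimal sketch (plain Scala; the object and method names are hypothetical, not part of Spark) of how a libsvm line like the one above is interpreted, and why the feature count gets inferred from the highest index when it is not given explicitly:

```scala
// Hypothetical helper: parse one libsvm line the way MLUtils.loadLibSVMFile
// does, converting one-based file indices to the zero-based indices used
// internally.
object LibsvmLine {
  def parse(line: String): (Double, Array[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val features = tokens.tail.map { t =>
      val Array(idx, value) = t.split(":")
      (idx.toInt - 1, value.toDouble) // shift to zero-based
    }
    (label, features)
  }

  // Without an explicit numFeatures argument, the feature count is inferred
  // as the highest index seen in the file -- the source of the mismatch below.
  def inferredNumFeatures(line: String): Int =
    parse(line)._2.map(_._1).max + 1
}
```

For the sample line above, the inferred feature count would be 1830, not the 176894 the model expects.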
val unseendata: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc,unseendatafileName)
val scores_path = results_base + run_id + "/" + "-scores"
// Load the saved model
val lrm = LogisticRegressionModel.load(sc,"logisticregressionmodels/mymodel")
// I saved the model after training using the save method. Here is the metadata for that model, from LogisticRegressionModel/mymodel/metadata/part-00000:
{"class":"org.apache.spark.mllib.classification.LogisticRegressionModel","version":"1.0","numFeatures":176894,"numClasses":2}
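The numFeatures field in that metadata is worth extracting, since the scoring job needs to load its libsvm file with the same dimension. A minimal sketch (plain Scala, hypothetical helper; a real job would read .../metadata/part-00000 and parse the JSON properly):

```scala
// Hypothetical helper: pull numFeatures out of the saved metadata JSON
// with a regex so it can be passed to MLUtils.loadLibSVMFile.
object ModelMetadata {
  private val NumFeatures = """"numFeatures"\s*:\s*(\d+)""".r

  def numFeatures(metadataJson: String): Option[Int] =
    NumFeatures.findFirstMatchIn(metadataJson).map(_.group(1).toInt)
}
```

Alternatively, since the model is already loaded, its dimension can be recovered from the public weights vector, e.g. lrm.weights.size, and passed to the loader directly.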
// Evaluate model on unseen data
val valuesAndPreds = unseendata.map { point =>
  val prediction = lrm.predict(point.features)
  (point.label, prediction)
}
// Store the scores
valuesAndPreds.saveAsTextFile(scores_path)
Here is the error message I get:
16/04/28 10:22:07 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5): java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:221)
    at org.apache.spark.mllib.classification.LogisticRegressionModel.predictPoint(LogisticRegression.scala:105)
    at org.apache.spark.mllib.regression.GeneralizedLinearModel.predict(GeneralizedLinearAlgorithm.scala:76)
Answer (score: 1)
The code that throws the exception is require(dataMatrix.size == numFeatures). My guess is that the model was fit with 176894 features (see "numFeatures":176894 in the model metadata), while the libsvm file only has 1830 features. The numbers must match.
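To illustrate the failure, here is a minimal sketch (plain Scala, hypothetical types, not Spark's actual classes) of the dimension check that predictPoint performs before scoring a point:

```scala
// Stand-in for a sparse feature vector: only the declared size matters here.
final case class SparseVec(size: Int)

class Model(val numFeatures: Int) {
  def predict(dataMatrix: SparseVec): Double = {
    // This is the requirement that fails when the libsvm file was loaded
    // with fewer features (1830) than the model was trained on (176894).
    require(dataMatrix.size == numFeatures)
    0.0 // dummy score; the real model computes a dot product here
  }
}
```

Calling predict with a 1830-dimensional vector against a model expecting 176894 features raises exactly the IllegalArgumentException: requirement failed seen in the stack trace.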
Change the line that loads the libsvm file to:
val unseendata = MLUtils.loadLibSVMFile(sc, unseendatafileName, 176894)