Logistic regression scoring: java.lang.NumberFormatException

Date: 2016-06-05 21:03:01

Tags: apache-spark apache-spark-mllib

I am using Spark 1.5, and I want to score a new dataset with a logistic regression model that I saved during the training phase.

Here is sample data in the libsvm file format:

    1132106-2011-05-10 52:1 64:1 207:1 232:1 353:1 597:1

The first column is a userid-purchase-date key, followed by index:value pairs.

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.rdd.RDD

    val unseendata: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, unseendatafileName, 800)
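For context, the stack trace below shows `MLUtils.loadLibSVMFile` calling `String.toDouble` (via `StringOps.toDouble` at MLUtils.scala:80) on the first whitespace-separated token of each line, treating it as the numeric label. A minimal illustration of the conversion that fails:

    // loadLibSVMFile parses the first token of each line as a Double label,
    // so a non-numeric key like this one throws immediately:
    "1132106-2011-05-10".toDouble
    // java.lang.NumberFormatException: For input string: "1132106-2011-05-10"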

Here is my scoring code:

    // Note: toDF on an RDD requires `import sqlContext.implicits._` in Spark 1.5.
    val resulting_scores = unseendata.map { case LabeledPoint(label, features) =>
      val prediction = lrm.predict(features)
      (label.toString, prediction)
    }.toDF("id_date", "score")

I get this error: java.lang.NumberFormatException: For input string: "1132106-2011-05-10"

How can I fix the handling of the first column so it fits into the LabeledPoint? I want the string to be kept as-is.
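For what it's worth, one workaround would be to skip `loadLibSVMFile` entirely and parse the file by hand, carrying the first column along as a string key instead of forcing it into the numeric label field. This is only a sketch: the names `parsed`, `idDate`, and `scores` are mine, and it assumes the same 800-dimensional feature space, 1-based libsvm indices, and that `sqlContext.implicits._` is in scope for `toDF`:

    import org.apache.spark.mllib.linalg.Vectors

    // Parse each line manually: keep the id-date key as a string and build a
    // sparse vector from the index:value pairs (libsvm indices are 1-based).
    val parsed = sc.textFile(unseendatafileName).map { line =>
      val tokens = line.trim.split("\\s+")
      val idDate = tokens.head                      // e.g. "1132106-2011-05-10"
      val (indices, values) = tokens.tail.map { t =>
        val Array(i, v) = t.split(':')
        (i.toInt - 1, v.toDouble)                   // convert to 0-based index
      }.unzip
      (idDate, Vectors.sparse(800, indices, values))
    }

    // Score against the string key directly, with no LabeledPoint involved.
    val scores = parsed.map { case (id, features) =>
      (id, lrm.predict(features))
    }.toDF("id_date", "score")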

Here is the detailed error message:


    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)
    at scala.collection.immutable.StringOps.toDouble(StringOps.scala:31)
    at org.apache.spark.mllib.util.MLUtils$$anonfun$4.apply(MLUtils.scala:80)
    at org.apache.spark.mllib.util.MLUtils$$anonfun$4.apply(MLUtils.scala:78)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:74)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

0 Answers:

There are no answers yet.