How do I resolve a SparkException from a user-defined function?

Time: 2019-09-16 08:31:40

Tags: scala apache-spark linear-regression

I want to apply linear regression to a dataset:

import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.LinearRegression

// Assemble the input columns into a single feature vector
val featureCols = Array("molecule_id", "group_id", "atom_id", "atom_id2", "mweight")
val assembler = new VectorAssembler()
    .setInputCols(featureCols).setOutputCol("features")
val df2 = assembler.transform(df)

// Index the "logp" column to produce the label
val labelIndexer = new StringIndexer().setInputCol("logp").setOutputCol("label")
val df3 = labelIndexer.fit(df2).transform(df2)

val Array(trainingData, testData) = df3.randomSplit(Array(0.8, 0.2))

val linearRegression = new LinearRegression()
    .setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
val linearRegressionModel = linearRegression.fit(trainingData)

Running this code produces the following error:

19/09/16 13:09:54 ERROR Executor: Exception in task 0.0 in stage 29.0 (TID 29)
org.apache.spark.SparkException: Failed to execute user defined function($anonfun$9: (string) => double)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage5.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:214)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1334)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1139)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1139)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1140)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1140)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:800)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
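
The trace ends before any "Caused by" line, so the root cause is not shown. The `(string) => double` signature does look consistent with the lookup function that a fitted `StringIndexer` applies during `transform`. With the default `handleInvalid = "error"`, that function throws on null input. One way to narrow this down, assuming the failure comes from null values in `logp` (an assumption the trace cannot confirm), is to count the nulls and then cast `logp` to a double label rather than indexing it, since linear regression expects a continuous target. The helper names `countNullLabels` and `withNumericLabel` below are hypothetical and not part of Spark; this is a minimal sketch, not a confirmed fix.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Hypothetical helper: count rows whose label column is null.
// Assumption: null labels are what make the indexer's function throw.
def countNullLabels(df: DataFrame, labelCol: String = "logp"): Long =
  df.filter(col(labelCol).isNull).count()

// Hypothetical helper: build a numeric "label" column by casting the
// source column to double instead of indexing it as a category.
// Rows whose label is null (or cannot be parsed as a number) are dropped.
def withNumericLabel(df: DataFrame, labelCol: String = "logp"): DataFrame = {
  val casted = df.withColumn("label", col(labelCol).cast(DoubleType))
  casted.na.drop(Seq("label"))
}
```

Under these assumptions, `countNullLabels(df)` reports how many rows have no `logp` value, and `withNumericLabel(df2)` could stand in for the `labelIndexer` step before the `randomSplit` call.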

0 answers:

No answers