org.apache.spark.SparkException: Failed to execute user defined function($anonfun$5: (string) => double)

Date: 2018-01-31 09:52:48

Tags: apache-spark apache-spark-mllib sparkr

Here is my R code:
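(For context: the script runs in an active SparkR session, started roughly like this; the local master is just for illustration.)

library(SparkR)
sparkR.session(master = "local[*]")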

train_feature   <- loadDF("./resultFile/train_feature",
                          "csv",
                          header = TRUE)

predict_feature <- loadDF("./resultFile/predict_feature", 
                          "csv",
                          header = TRUE)

model <- spark.randomForest( 
    train_feature,
    forward_count ~ max_forward + sum_forward,
    type          = "regression", 
    maxDepth      = 16,
    maxBins       = 32,
    maxMemoryInMB = 512
)

predict_result <- predict(model, predict_feature)

prediction     <- select(predict_result, "prediction")

prediction$prediction <- cast(prediction$prediction, "integer")

head(prediction, 200)

The error is:

Error in handleErrors(returnStatus, conn) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 45.0 failed 1 times, most recent failure: Lost task 0.0 in stage 45.0 (TID 872, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$5: (string) => double)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
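If I read the trace right, the failing UDF ($anonfun$5: (string) => double) is the cast that the R formula applies to the input columns, which would mean they were still strings; as far as I know, loadDF reads every CSV column as a string unless a schema is given or inferSchema is set. A minimal way to check (a sketch against the DataFrame above):

# Inspect the column types that loadDF actually produced
printSchema(train_feature)
# I would expect output roughly like:
# root
#  |-- forward_count: string (nullable = true)
#  |-- max_forward: string (nullable = true)
#  |-- sum_forward: string (nullable = true)

# Alternative I considered: infer numeric types at load time
# train_feature <- loadDF("./resultFile/train_feature", "csv",
#                         header = "true", inferSchema = "true")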

Then I cast the data types:

train_feature$sum_forward   <- cast(train_feature$sum_forward,   'double')
train_feature$max_forward   <- cast(train_feature$max_forward,   'double')
train_feature$forward_count <- cast(train_feature$forward_count, 'double')
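(If the casts behave as I expect, printSchema(train_feature) should now report all three columns as double.)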

It still failed, with a different error:

Error in handleErrors(returnStatus, conn) :
  java.lang.IllegalArgumentException: Data type StringType is not supported.
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$transformSchema$1.apply(VectorAssembler.scala:121)
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$transformSchema$1.apply(VectorAssembler.scala:117)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:117)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:310)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:310)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
    at org.apache.s
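My guess is that this VectorAssembler error comes from predict_feature, which I loaded but never cast, so its feature columns are still StringType when predict applies the model's internal pipeline. A sketch of what I would try next, assuming predict_feature has the same column names:

# Cast the prediction-side columns as well (untested hypothesis)
predict_feature$sum_forward <- cast(predict_feature$sum_forward, 'double')
predict_feature$max_forward <- cast(predict_feature$max_forward, 'double')

predict_result <- predict(model, predict_feature)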

But when I change the formula to forward_count ~ X1 + X2, without casting any data types, it works fine (presumably because those columns, which were all computed by Spark MLlib functions, already come back as doubles).

Here is part of the training data: train_feature

0 Answers:

No answers yet.