Here is my R code:
# Load the pre-computed feature tables (CSV with a header row).
train_feature <- loadDF("./resultFile/train_feature",
                        "csv",
                        header = TRUE)
predict_feature <- loadDF("./resultFile/predict_feature",
                          "csv",
                          header = TRUE)

# Fit a random-forest regression model.
model <- spark.randomForest(
  train_feature,
  forward_count ~ max_forward + sum_forward,
  type = "regression",
  maxDepth = 16,
  maxBins = 32,
  maxMemoryInMB = 512
)

# Predict, keep only the prediction column, and cast it to integer.
predict_result <- predict(model, predict_feature)
prediction <- select(predict_result, "prediction")
prediction$prediction <- cast(prediction$prediction, "integer")
head(prediction, 200)
The error is:
Error in handleErrors(returnStatus, conn):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 45.0 failed 1 times, most recent failure: Lost task 0.0 in stage 45.0 (TID 872, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$5: (string) => double)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
I then cast the data types:
train_feature$sum_forward <- cast(train_feature$sum_forward, 'double')
train_feature$max_forward <- cast(train_feature$max_forward, 'double')
train_feature$forward_count <- cast(train_feature$forward_count, 'double')
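(For completeness, the corresponding casts on `predict_feature`, which `predict()` also pushes through the fitted formula, would look like the sketch below; the column names are assumed to mirror `train_feature`, and I have not verified this step.)

```r
# Sketch: cast the prediction-side feature columns as well, since
# predict() re-applies the fitted RFormula/VectorAssembler to them.
predict_feature$sum_forward <- cast(predict_feature$sum_forward, 'double')
predict_feature$max_forward <- cast(predict_feature$max_forward, 'double')
```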
It still fails, now with a different error:
Error in handleErrors(returnStatus, conn):
java.lang.IllegalArgumentException: Data type StringType is not supported.
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$transformSchema$1.apply(VectorAssembler.scala:121)
    at org.apache.spark.ml.feature.VectorAssembler$$anonfun$transformSchema$1.apply(VectorAssembler.scala:117)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:117)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:310)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:310)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
    at org.apache.s
However, when I change the formula to `forward_count ~ X1 + X2`, without casting any data types at all, it works fine. All of the features were computed with Spark MLlib functions.
Here is part of the training data, train_feature:
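(To make the column types explicit, the loaded schema can be printed as below; this is a sketch assuming the session from the code above. Note that reading CSV without an explicit schema, and without `inferSchema`, leaves every column typed as string.)

```r
# Sketch: inspect the schema of the loaded DataFrames.
# Without an explicit schema or inferSchema, Spark's CSV reader
# leaves every column as string.
printSchema(train_feature)
printSchema(predict_feature)

# An untested variant: load with type inference enabled.
# train_feature <- loadDF("./resultFile/train_feature", "csv",
#                         header = TRUE, inferSchema = "true")
```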