SparklyR stage failure when using the sdf_register function

Date: 2018-04-12 20:23:51

Tags: r sparklyr

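I have a 75x239 Spark DataFrame in which every column has a numeric data type.

The data frame exports from R fine over a local connection. However, when I try to partition the Spark DataFrame or register it (create its metadata), I get an error. I don't know why I'm getting this error. The original data frame contains no NA or NULL values.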
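A local check along the following lines can confirm the column types and the absence of missing values before the data is copied to Spark. This is only a minimal sketch in base R: `ml_medications_demo` is a hypothetical stand-in for `ML_MEDICATIONS`, whose actual contents are not shown in this question.

```r
# Hypothetical stand-in for ML_MEDICATIONS: 75 rows x 239 numeric columns.
set.seed(1)
ml_medications_demo <- as.data.frame(
  matrix(rnorm(75 * 239), nrow = 75, ncol = 239)
)

# TRUE only if every column is numeric (integer or double).
all_numeric <- all(vapply(ml_medications_demo, is.numeric, logical(1)))

# Total number of missing values; is.na() also flags NaN in numeric columns.
na_count <- sum(is.na(ml_medications_demo))

# Dimensions of the local data frame (rows, columns).
dims <- dim(ml_medications_demo)

print(all_numeric)
print(na_count)
print(dims)
```

Running the same three checks against the real data frame would show whether the "all numeric, no NA" description holds before `copy_to()` is called.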

    > import_medications <- copy_to(sc, ML_MEDICATIONS, "medications",
                                    overwrite = TRUE)
    > partition_medications <- sdf_partition(import_medications,
                                             training = 0.5, testing = 0.5)
    > sdf_register(partition_medications)
     [[1]]
    Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 1 times, most recent failure: Lost task 0.0 in stage 20.0 (TID 20, localhost, executor driver): org.spark_project.guava.util.concurrent.ExecutionError: java.lang.StackOverflowError
    at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2261)
    at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
    at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
    at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:155)
    at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:43)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:871)
    at org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:363)
    at org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:63)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
    Caused by: java.lang.StackOverflowError
    at java.lang.String.substring(Unknown Source)
    at org.codehaus.janino.CodeContext.determineArgumentsSize(CodeContext.java:785)
    at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:474)
    at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
    [previous frame repeated 70 more times]

Printing the partition_medications Spark DataFrame returns a similar error:

    > partition_medications
    $training
    Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 21.0 failed 1 times, most recent failure: Lost task 0.0 in stage 21.0 (TID 21, localhost, executor driver): org.spark_project.guava.util.concurrent.ExecutionError: java.lang.StackOverflowError
    at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2261)
    at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
    at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
    at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:155)
    at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:43)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:871)
    at org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:363)
    at org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:63)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
    Caused by: java.lang.StackOverflowError
    at org.codehaus.janino.CodeContext.extract16BitValue(CodeContext.java:679)
    at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:475)
    at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
    [previous frame repeated 70 more times]
    at org.codehaus.janino.CodeContext.flowAnalysis(CodeC

0 answers:

No answers