SparkR org.apache.spark.SparkException: R worker exited unexpectedly

Date: 2018-05-02 11:53:50

Tags: apache-spark sparkr databricks

I am trying to run a SparkR gapply. When my input file is limited to roughly 300k rows it works, but when I scale up to roughly 1.2M rows I get the following recurring exception in the stderr of many of the executor tasks; roughly 70% of the tasks complete while the rest fail or are killed. The failed tasks all report the same error output:

org.apache.spark.SparkException: R worker exited unexpectedly (cranshed)
    at org.apache.spark.api.r.RRunner.org$apache$spark$api$r$RRunner$$read(RRunner.scala:240)
    at org.apache.spark.api.r.RRunner$$anon$1.next(RRunner.scala:91)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:346)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.api.r.RRunner.org$apache$spark$api$r$RRunner$$read(RRunner.scala:212)
    ... 16 more

Aside from allocating more memory, what tuning parameters should be considered? I believe SparkR is not as widely used as PySpark or Scala, and its tuning parameters can sometimes differ, so any help here is much appreciated.

This is running on a Databricks/AWS cluster: 20 worker nodes, each with 30.5 GB of memory and 4 cores.
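For reference, a minimal sketch of how session-level settings can be passed from SparkR; the keys shown are standard Spark settings, but the values are illustrative placeholders, not tuned recommendations, and on Databricks these are typically set in the cluster's Spark config rather than per session:

    library(SparkR)

    # Illustrative only: placeholder values for common memory/parallelism knobs.
    sparkR.session(sparkConfig = list(
      spark.executor.memory = "20g",          # JVM executor heap
      spark.executor.memoryOverhead = "6g",   # off-heap headroom, which the forked R worker processes draw on
      spark.sql.shuffle.partitions = "800"    # more, smaller partitions per gapply task
    ))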

In our use case, the gapply function runs on data frames of at most 10 rows and up to 20 columns, which are split into 4 R data frames and then fed into a linear optimization solver using the R packages NlcOptim and quadprog.
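For context, a rough sketch of the shape of such a gapply call in SparkR; the grouping column, output schema, and per-group body below are hypothetical placeholders, not the actual code from this job:

    # Hypothetical shape of the gapply described above; "group_id" and the
    # output schema are placeholders, and the real per-group function would
    # call NlcOptim / quadprog instead of returning a dummy result.
    result <- gapply(
      df,
      "group_id",
      function(key, pdf) {
        # pdf is a plain R data.frame for one group (at most ~10 rows here)
        data.frame(group_id = key[[1]], objective = as.numeric(nrow(pdf)))
      },
      structType(structField("group_id", "string"),
                 structField("objective", "double"))
    )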

1 Answer:

Answer 0 (score: 0)

Use .cache() and try again to work around this issue.
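If caching does help, the SparkR equivalent would look roughly like this (a sketch; df stands in for the input SparkDataFrame):

    # Materialize the input before the gapply so retried tasks do not
    # recompute the upstream lineage; "df" is a placeholder name.
    df <- cache(df)        # or persist(df, "MEMORY_AND_DISK")
    count(df)              # force evaluation so the cache is populated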