Spark error: Total size of serialized results of 268 tasks (1442.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

Date: 2018-08-13 06:47:46

Tags: apache-spark

I am running a Spark job and hitting the error below. I have already tried a number of optimizations.

The code now fails with an out-of-memory error, even after I increased the number of partitions. I am running the Spark program from Eclipse; the two tables contain 4 million records each, and I am joining them and exporting the result to a single JSON file. About 20 minutes into the run I get the OutOfMemory error.
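The join and export boil down to something like the sketch below, where the table sources, join key, and output path are simplified placeholders:

      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder()
        .appName("SQLtoJSON")
        .master("local[*]")
        .getOrCreate()

      // Placeholder sources standing in for the two 4-million-row tables
      val left  = spark.read.parquet("table_a")
      val right = spark.read.parquet("table_b")

      // Join on a placeholder key and write the result as a single JSON file;
      // coalesce(1) is what funnels everything into one output file
      left.join(right, Seq("id"))
        .coalesce(1)
        .write
        .json("output/joined")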

      import org.apache.spark.SparkConf

      // Configuration used for the local run from Eclipse
      val conf = new SparkConf().setAppName("SQLtoJSON").setMaster("local[*]")
        .set("spark.executor.memory", "7g")
        .set("spark.driver.memory", "8g")
        .set("spark.executor.cores", "4")
        // .set("spark.driver.cores", "4")
        .set("spark.testing.memory", "2147480000")
        .set("spark.sql.shuffle.partitions", "2000")

Error:

18/08/13 12:07:07 INFO DAGScheduler: ResultStage 94 (show at SparkApplication.scala:132) failed in 54.206 s due to Job aborted due to stage failure: Total size of serialized results of 268 tasks (1442.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
18/08/13 12:07:07 INFO DAGScheduler: Job 12 failed: show at SparkApplication.scala:132, took 754.916325 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 268 tasks (1442.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

0 Answers:

No answers yet