I am running a Spark job and hitting the error below. I have already tried a number of optimizations, but the code still throws an out-of-memory error. I have increased the number of partitions. I am running the Spark program from Eclipse; each of the two tables contains about 4 million records, and I am trying to join them and export the result to a single JSON file. When I run the program, I get an OutOfMemory error after about 20 minutes.
import org.apache.spark.SparkConf

// Spark configuration I am using (local mode, all available cores)
val conf = new SparkConf().setAppName("SQLtoJSON").setMaster("local[*]")
  .set("spark.executor.memory", "7g")
  .set("spark.driver.memory", "8g")
  .set("spark.executor.cores", "4")
  // .set("spark.driver.cores", "4")
  .set("spark.testing.memory", "2147480000")
  .set("spark.sql.shuffle.partitions", "2000")
Error:
18/08/13 12:07:07 INFO DAGScheduler: ResultStage 94 (show at SparkApplication.scala:132) failed in 54.206 s due to Job aborted due to stage failure: Total size of serialized results of 268 tasks (1442.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
18/08/13 12:07:07 INFO DAGScheduler: Job 12 failed: show at SparkApplication.scala:132, took 754.916325 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 268 tasks (1442.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)