I am running a Spark job and hitting the error below. I have already tried a number of optimizations, but the code still throws an out-of-memory error. I have increased the number of partitions. I am running the Spark program from Eclipse; each of the two tables contains about 4 million records, and I am trying to join them and export the result to a single JSON file. When I run the program, I get an OutOfMemory error after about 20 minutes.
import org.apache.spark.SparkConf

// Spark configuration I am using (local mode, all available cores)
val conf = new SparkConf().setAppName("SQLtoJSON").setMaster("local[*]")
  .set("spark.executor.memory", "7g")
  .set("spark.driver.memory", "8g")
  .set("spark.executor.cores", "4")
  // .set("spark.driver.cores", "4")
  .set("spark.testing.memory", "2147480000")
  .set("spark.sql.shuffle.partitions", "2000")
Error:
18/08/13 12:07:07 INFO DAGScheduler: ResultStage 94 (show at SparkApplication.scala:132) failed in 54.206 s due to Job aborted due to stage failure: Total size of serialized results of 268 tasks (1442.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
18/08/13 12:07:07 INFO DAGScheduler: Job 12 failed: show at SparkApplication.scala:132, took 754.916325 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 268 tasks (1442.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)