Question

I'm running a spark-submit application on an AWS EMR cluster (EMR 5.0.0, Spark 2.0.0, 30 r3.4xlarge). To launch the script, I SSH into the master node and run the following command:

time spark-submit --conf spark.sql.shuffle.partitions=5000 \
--conf spark.memory.storageFraction=0.3 --conf spark.memory.fraction=0.95 \
--executor-memory 8G --driver-memory 10G dataframe_script.py

The application uses the default AWS Spark configuration, which has spark.master = yarn and deploy-mode = client.

The application loads ~220GB of data, does SQL-like aggregations, then writes to S3. The written data looks like it was processed correctly. While the code is running, I see an error message, but the code continues to run:

ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.

After the application finishes writing, it does not return to the command line for more than 10 minutes, issuing the warning:

WARN ExecutorAllocationManager: No stages are running, but numRunningTasks != 0

followed by thousands of lines of the error message:

16/10/12 00:40:03 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(176,WrappedArray())

The progress bar also keeps moving in between the error messages, e.g.:

[Stage 17:=================================================>   (465 + 35) / 500]

My code for the write and the end of the main step:

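There's a previous StackOverflow question, which refers to this JIRA. It looks like the issue was fixed for an older version of Spark, but I don't quite understand what the problem is.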

How do I avoid these error messages?

Answer 1

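I think I found the problem. In my Spark script, I was starting the SparkContext outside the main() function but stopping it inside main(). When the script exits and tries to shut down the SparkContext a second time, this causes problems. After moving the SparkContext initialization into main(), most of the errors went away.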
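A minimal sketch of the restructured script, assuming a layout like the one described above. The names `build_context` and `run_job` and the toy job are hypothetical placeholders, not the original code; the point is only that the SparkContext is created and stopped in the same place, exactly once.

```python
def build_context(app_name="dataframe_script"):
    # Imported lazily so that merely importing this module never starts Spark.
    from pyspark import SparkContext
    return SparkContext(appName=app_name)


def run_job(sc):
    # Hypothetical stand-in for the real load / aggregate / write-to-S3 logic.
    return sc.parallelize(range(10)).count()


def main(context_factory=build_context):
    # The context is created inside main(), not at module level.
    sc = context_factory()
    try:
        return run_job(sc)
    finally:
        # Stopped once, by the same function that created it.
        sc.stop()
```

In the real script, main() would be called from an `if __name__ == "__main__":` guard. Passing the factory as a parameter is just a convenience here so the structure can be exercised without a cluster.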

Preventing SparkListenerBus errors
