Spark error: executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

Date: 2017-12-20 13:51:01

Tags: scala apache-spark

I am using the following Spark configuration:

    maxCores = 5
    driverMemory = 2g
    executorMemory = 17g
    executorInstances = 100

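Problem: out of 100 executors, my job ends up with only 10 active executors, even though enough memory is still available. Even when I set the number of executors to 250, only 10 remain active. All I am doing is loading several partitioned Hive tables and running df.count on them.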
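As a rough check on how much memory this configuration asks YARN for, the sketch below estimates the per-container and total request. It assumes Spark's documented default overhead of 10% of executor memory with a 384 MiB minimum; the function names are illustrative, and scheduler rounding and the driver are ignored.

```python
MIN_OVERHEAD_MIB = 384  # Spark's documented minimum memory overhead


def container_mib(executor_mib):
    """Executor heap plus the default overhead (10% of heap, at least 384 MiB)."""
    overhead = max(MIN_OVERHEAD_MIB, executor_mib // 10)
    return executor_mib + overhead


def total_request_gib(executor_mib, instances):
    """Memory requested for all executors, in GiB (driver not included)."""
    return container_mib(executor_mib) * instances / 1024


if __name__ == "__main__":
    executor_mib = 17 * 1024  # executorMemory = 17g
    print(container_mib(executor_mib))                  # MiB per container
    print(round(total_request_gib(executor_mib, 100)))  # GiB for 100 executors
```

Under these assumptions each container is about 18.7 GiB and 100 executors come to roughly 1,870 GiB, so it may be worth comparing that figure with what the queue can actually grant.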

Please help me understand what is causing the executors to be killed:
17/12/20 11:08:21 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/20 11:08:21 INFO storage.DiskBlockManager: Shutdown hook called
17/12/20 11:08:21 INFO util.ShutdownHookManager: Shutdown hook called

I am not sure why YARN is killing my executors.

3 Answers:

Answer 0: (score: 2)

I faced a similar problem, and investigating the NodeManager logs led me to the root cause. You can access them through the web interface:

    nodeManagerAddress:PORT/logs

PORT is specified in yarn-site.xml under yarn.nodemanager.webapp.address (default: 8042).

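My investigation workflow:

  1. Collect the logs (yarn logs ... command)
  2. Identify the node and container emitting the error (in these logs)
  3. Search the NodeManager logs by the timestamp of the error to find the root cause

By the way: you can access the aggregated collection (XML) of all configurations affecting a node on the same port:

    nodeManagerAddress:PORT/conf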
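The timestamp search in the workflow can be sketched as below. This is only an illustration: the executor timestamp format is taken from the log lines in the question, while the NodeManager format (`yyyy-MM-dd HH:mm:ss`, optionally followed by `,millis`) and the 30-second window are assumptions, and all function names are hypothetical.

```python
from datetime import datetime, timedelta

EXECUTOR_TS_FORMAT = "%y/%m/%d %H:%M:%S"     # as in "17/12/20 11:08:21 ERROR ..."
NODEMANAGER_TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed NodeManager log layout


def parse_stamp(line, fmt):
    """Parse the first two whitespace-separated tokens as a timestamp, or return None."""
    parts = line.split()
    if len(parts) < 2:
        return None
    text = f"{parts[0]} {parts[1].split(',')[0]}"  # drop ",millis" if present
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


def find_term_timestamps(executor_log_lines, fmt=EXECUTOR_TS_FORMAT):
    """Timestamps of every 'RECEIVED SIGNAL TERM' line in an executor log."""
    stamps = []
    for line in executor_log_lines:
        if "RECEIVED SIGNAL TERM" in line:
            stamp = parse_stamp(line, fmt)
            if stamp is not None:
                stamps.append(stamp)
    return stamps


def lines_near(nodemanager_log_lines, when, window_seconds=30, fmt=NODEMANAGER_TS_FORMAT):
    """NodeManager log lines whose timestamp lies within the window around `when`."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in nodemanager_log_lines:
        stamp = parse_stamp(line, fmt)
        if stamp is not None and abs(stamp - when) <= window:
            hits.append(line)
    return hits
```

Feed `find_term_timestamps` the lines collected in step 1, then pass each timestamp together with the NodeManager log of the affected node to `lines_near`.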
    

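Answer 1: (score: 1)

I believe this problem has more to do with memory and the dynamic allocation timeout at the executor/container level. Make sure you are able to change configuration parameters at the executor/container level.

One way to resolve it is to change this configuration value for your spark-shell session or Spark job:

    spark.dynamicAllocation.executorIdleTimeout

This thread contains more detailed information on how to resolve the problem, and it worked for me: https://jira.apache.org/jira/browse/SPARK-21733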
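A minimal sketch of how that setting could be passed on the command line is shown below. The helper names and the `600s` value are hypothetical; the documented default for spark.dynamicAllocation.executorIdleTimeout is 60s, and the settings only take effect when spark.dynamicAllocation.enabled is true.

```python
def dynamic_allocation_conf(idle_timeout, min_executors=None):
    """Collect dynamic-allocation overrides as a plain dict of Spark properties."""
    conf = {"spark.dynamicAllocation.executorIdleTimeout": idle_timeout}
    if min_executors is not None:
        conf["spark.dynamicAllocation.minExecutors"] = str(min_executors)
    return conf


def to_submit_args(conf):
    """Turn a property dict into '--conf key=value' arguments for spark-submit or spark-shell."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example values only; tune them to the job.
    print(" ".join(to_submit_args(dynamic_allocation_conf("600s", min_executors=100))))
```

Raising the idle timeout keeps executors that have no running tasks around for longer, and a minimum executor count stops dynamic allocation from scaling below that number.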

Answer 2: (score: -1)

@maffe, I am having the same issue and cannot figure out the cause. I am attaching the log; can you help me?

2019-01-07 05:36:51 INFO  MapOutputTrackerWorker:54 - Don't have map outputs for shuffle 3, fetching them
2019-01-07 05:36:51 INFO  MapOutputTrackerWorker:54 - Don't have map outputs for shuffle 3, fetching them
2019-01-07 05:36:51 INFO  MapOutputTrackerWorker:54 - Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@nj11mhf0068:36051)
2019-01-07 05:36:51 INFO  MapOutputTrackerWorker:54 - Got the output locations
2019-01-07 05:36:51 INFO  ShuffleBlockFetcherIterator:54 - Getting 94 non-empty blocks out of 94 blocks
2019-01-07 05:36:51 INFO  ShuffleBlockFetcherIterator:54 - Getting 94 non-empty blocks out of 94 blocks
2019-01-07 05:36:51 INFO  ShuffleBlockFetcherIterator:54 - Getting 94 non-empty blocks out of 94 blocks
2019-01-07 05:36:51 INFO  TransportClientFactory:267 - Successfully created connection to nj12mhf0206.mhf.mhc/10.180.19.157:7337 after 1 ms (0 ms spent in bootstraps)
2019-01-07 05:36:51 INFO  TransportClientFactory:267 - Successfully created connection to nj11mhf0062/10.191.18.59:7337 after 1 ms (0 ms spent in bootstraps)
2019-01-07 05:36:51 INFO  TransportClientFactory:267 - Successfully created connection to nj11mhf0059/10.191.18.56:7337 after 0 ms (0 ms spent in bootstraps)
2019-01-07 05:36:51 INFO  TransportClientFactory:267 - Successfully created connection to nj12mhf0205.mhf.mhc/10.180.21.236:7337 after 2 ms (0 ms spent in bootstraps)
2019-01-07 05:36:51 INFO  ShuffleBlockFetcherIterator:54 - Started 7 remote fetches in 22 ms
2019-01-07 05:36:51 INFO  ShuffleBlockFetcherIterator:54 - Started 6 remote fetches in 23 ms
2019-01-07 05:36:51 INFO  TransportClientFactory:267 - Successfully created connection to nj11mhf0064/10.191.18.61:7337 after 9 ms (0 ms spent in bootstraps)
2019-01-07 05:36:51 INFO  ShuffleBlockFetcherIterator:54 - Started 6 remote fetches in 33 ms
2019-01-07 05:36:51 INFO  CodeGenerator:54 - Code generated in 383.607808 ms
2019-01-07 05:36:51 INFO  CodeGenerator:54 - Code generated in 18.218092 ms
2019-01-07 05:36:52 INFO  UnsafeExternalSorter:209 - Thread 87 spilling sort data of 482.0 MB to disk (0  time so far)
2019-01-07 05:36:52 INFO  UnsafeExternalSorter:209 - Thread 85 spilling sort data of 482.0 MB to disk (0  time so far)
2019-01-07 05:36:53 INFO  UnsafeExternalSorter:209 - Thread 86 spilling sort data of 482.0 MB to disk (0  time so far)
2019-01-07 05:36:54 INFO  UnsafeExternalSorter:209 - Thread 87 spilling sort data of 482.0 MB to disk (1  time so far)
2019-01-07 05:36:54 INFO  UnsafeExternalSorter:209 - Thread 85 spilling sort data of 482.0 MB to disk (1  time so far)
2019-01-07 05:36:54 INFO  UnsafeExternalSorter:209 - Thread 86 spilling sort data of 482.0 MB to disk (1  time so far)
2019-01-07 05:36:55 INFO  UnsafeExternalSorter:209 - Thread 87 spilling sort data of 482.0 MB to disk (2  times so far)
2019-01-07 05:36:55 INFO  UnsafeExternalSorter:209 - Thread 85 spilling sort data of 482.0 MB to disk (2  times so far)
2019-01-07 05:36:56 INFO  UnsafeExternalSorter:209 - Thread 86 spilling sort data of 482.0 MB to disk (2  times so far)
2019-01-07 05:36:56 INFO  UnsafeExternalSorter:209 - Thread 87 spilling sort data of 482.0 MB to disk (3  times so far)
2019-01-07 05:36:57 INFO  UnsafeExternalSorter:209 - Thread 85 spilling sort data of 482.0 MB to disk (3  times so far)
2019-01-07 05:36:57 INFO  UnsafeExternalSorter:209 - Thread 86 spilling sort data of 482.0 MB to disk (3  times so far)
2019-01-07 05:36:58 INFO  UnsafeExternalSorter:209 - Thread 87 spilling sort data of 482.0 MB to disk (4  times so far)
2019-01-07 05:36:58 INFO  UnsafeExternalSorter:209 - Thread 85 spilling sort data of 482.0 MB to disk (4  times so far)
2019-01-07 05:36:58 INFO  UnsafeExternalSorter:209 - Thread 86 spilling sort data of 482.0 MB to disk (4  times so far)
2019-01-07 05:36:59 INFO  UnsafeExternalSorter:209 - Thread 87 spilling sort data of 482.0 MB to disk (5  times so far)
2019-01-07 05:36:59 INFO  UnsafeExternalSorter:209 - Thread 85 spilling sort data of 482.0 MB to disk (5  times so far)
2019-01-07 05:36:59 INFO  UnsafeExternalSorter:209 - Thread 86 spilling sort data of 482.0 MB to disk (5  times so far)
2019-01-07 05:37:00 INFO  CodeGenerator:54 - Code generated in 451.535709 ms
2019-01-07 05:37:01 INFO  CodeGenerator:54 - Code generated in 381.816503 ms
2019-01-07 05:37:02 INFO  CodeGenerator:54 - Code generated in 13.543245 ms
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
#   Executing /bin/sh -c "kill 38184"...
2019-01-07 05:37:02 ERROR CoarseGrainedExecutorBackend:43 - RECEIVED SIGNAL TERM
2019-01-07 05:37:02 INFO  DiskBlockManager:54 - Shutdown hook called
2019-01-07 05:37:02 INFO  ShutdownHookManager:54 - Shutdown hook called
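
In this log, the lines just before RECEIVED SIGNAL TERM show java.lang.OutOfMemoryError: Java heap space and the -XX:OnOutOfMemoryError="kill %p" hook running, which suggests the executor JVM killed itself after running out of heap rather than being killed from outside. A small sketch of that check follows; the function name, return labels, and lookback of 5 lines are hypothetical choices.

```python
def term_cause(log_lines, lookback=5):
    """Classify the first SIGTERM in an executor log by what precedes it.

    Returns "jvm-oom-hook" if an OutOfMemoryError appears within `lookback`
    lines before the signal, "external" if not, and None if no SIGTERM is logged.
    """
    for index, line in enumerate(log_lines):
        if "RECEIVED SIGNAL TERM" in line:
            preceding = log_lines[max(0, index - lookback):index]
            if any("java.lang.OutOfMemoryError" in p for p in preceding):
                return "jvm-oom-hook"
            return "external"
    return None


if __name__ == "__main__":
    tail = [
        "#",
        "# java.lang.OutOfMemoryError: Java heap space",
        '# -XX:OnOutOfMemoryError="kill %p"',
        '#   Executing /bin/sh -c "kill 38184"...',
        "2019-01-07 05:37:02 ERROR CoarseGrainedExecutorBackend:43 - RECEIVED SIGNAL TERM",
    ]
    print(term_cause(tail))
```

An "external" result points back to the NodeManager logs; an OOM result points to executor heap size or the amount of data each task handles.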