How to understand YARN appattempt logs and diagnostics?

Time: 2017-03-04 23:02:57

Tags: apache-spark yarn

I am running a Spark application on a YARN cluster (on AWS EMR). The application appears to have been killed, and I want to find out why. I tried to make sense of the YARN information shown in the following screen. [screenshot: YARN application attempt page] The diagnostics line on that screen seems to say that YARN killed the application for exceeding a memory limit:

Diagnostics: Container [pid=1540,containerID=container_1488651686158_0012_02_000001] is running beyond physical memory limits. Current usage: 1.6 GB of 1.4 GB physical memory used; 3.6 GB of 6.9 GB virtual memory used. Killing container

However, the appattempt log shows completely different exceptions, all IO/network related. My questions: should I trust the diagnostics on the screen or the appattempt log? Did an IO exception cause the kill, or did the out-of-memory kill cause the IO exceptions in the appattempt log? Is there another log or diagnostic I should be looking at? Thanks.

17/03/04 21:59:02 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:190)
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
        at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
        at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Exception in thread "task-result-getter-0" java.lang.Error: java.lang.InterruptedException
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:190)
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
        at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
        at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        ... 2 more
17/03/04 21:59:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/04 21:59:02 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-31-9-207.ec2.internal/172.31.9.207:38437 is closed
17/03/04 21:59:02 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
17/03/04 21:59:02 ERROR DiskBlockManager: Exception while deleting local spark dir: /mnt/yarn/usercache/hadoop/appcache/application_1488651686158_0012/blockmgr-941a13d8-1b31-4347-bdec-180125b6f4ca
java.io.IOException: Failed to delete: /mnt/yarn/usercache/hadoop/appcache/application_1488651686158_0012/blockmgr-941a13d8-1b31-4347-bdec-180125b6f4ca
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
        at org.apache.spark.storage.DiskBlockManager$$anonfun$org$apache$spark$storage$DiskBlockManager$$doStop$1.apply(DiskBlockManager.scala:169)
        at org.apache.spark.storage.DiskBlockManager$$anonfun$org$apache$spark$storage$DiskBlockManager$$doStop$1.apply(DiskBlockManager.scala:165)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:165)
        at org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:160)
        at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1361)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:89)
        at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
        at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
17/03/04 21:59:02 INFO MemoryStore: MemoryStore cleared
17/03/04 21:59:02 INFO BlockManager: BlockManager stopped
17/03/04 21:59:02 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/04 21:59:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/04 21:59:02 ERROR Utils: Uncaught exception in thread Thread-3
java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
        at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
        at java.nio.file.Paths.get(Paths.java:138)
        at org.apache.spark.util.Utils$.isSymlink(Utils.scala:1021)
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:991)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:102)
        at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
        at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
17/03/04 21:59:02 WARN ShutdownHookManager: ShutdownHook '$anon$2' failed, java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
        at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
        at java.nio.file.Paths.get(Paths.java:138)
        at org.apache.spark.util.Utils$.isSymlink(Utils.scala:1021)
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:991)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:102)
        at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
        at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

2 Answers:

Answer 0 (score: 0)

The container was killed because it exceeded its physical memory limit, so any subsequent attempt to reach that container fails. In other words, the IO/network exceptions in the appattempt log are a consequence of the kill, not its cause. One way to confirm this ordering is sketched below.
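A minimal sketch of how to check this, assuming log aggregation is enabled on the cluster (the NodeManager log path below is EMR's default and may differ on other distributions), using the application and container IDs from the diagnostics above:

    # Fetch all aggregated container logs for the application (run on the master node)
    yarn logs -applicationId application_1488651686158_0012 > app_logs.txt

    # The NodeManager that hosted the killed container logs the physical-memory
    # check itself; grep its log for the container ID from the diagnostics
    grep container_1488651686158_0012_02_000001 /var/log/hadoop-yarn/yarn-yarn-nodemanager-*.log

The NodeManager entry usually repeats the "running beyond physical memory limits" message together with a process-tree dump, which pins down the kill time; exceptions in the appattempt log after that timestamp are fallout from the kill.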

YARN only has a coarse, process-level view of your job; to analyze it properly you should prefer the Spark History Server (and check the Spark history for stages with unbalanced memory use, as in the screenshot below).

[screenshot: a well balanced memory stage]
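For the History Server to have anything to show, the application must write event logs. A minimal sketch of the relevant spark-defaults.conf entries, assuming an HDFS event-log directory (the path is illustrative, not taken from the question; on EMR these are typically preconfigured and the History Server UI is served on port 18080 of the master node):

    # spark-defaults.conf: enable event logging so the History Server can replay the job
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///var/log/spark/apps
    spark.history.fs.logDirectory    hdfs:///var/log/spark/apps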

Answer 1 (score: 0)

The information in your screenshot is the most relevant. Your ApplicationMaster container ran out of memory. You need to increase yarn.app.mapreduce.am.resource.mb, which is set in mapred-site.xml. I recommend a value of 2000, since that generally leaves enough headroom to run Spark and MapReduce applications at scale.
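A minimal sketch of the corresponding mapred-site.xml entry, using the value recommended above (on EMR this can also be applied via a configuration classification rather than editing the file by hand):

    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>2000</value>
    </property>

Note that this property sizes the MapReduce ApplicationMaster specifically; for a job submitted as a pure Spark application, the equivalent knobs are spark.yarn.am.memory (client mode) or spark.driver.memory (cluster mode, where the AM hosts the driver).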