On YARN

Time: 2018-03-20 11:49:27

Tags: cloudera cloudera-manager

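We received multiple Bad Health status emails for YARN. The alerts continued for nearly 48 hours, arriving every 3 minutes, and we could not find any specific cause behind them.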

The only thing we noticed was a FATAL error logged when the alert fired on March 19:

  

2018-03-19 02:30:18,642 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1521236868036_1919 State change from NEW_SAVING to SUBMITTED
2018-03-19 02:30:18,704 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ADDED to the scheduler
com.google.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: unable to create new native thread
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
    at org.apache.hadoop.security.Groups.getGroups(Groups.java:215)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule$SecondaryGroupExistingQueue.getQueueForApp(QueuePlacementRule.java:173)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementRule.assignAppToQueue(QueuePlacementRule.java:74)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueuePlacementPolicy.assignAppToQueue(QueuePlacementPolicy.java:167)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:732)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:638)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1250)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:700)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:584)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:129)
    at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:72)
    at org.apache.hadoop.security.Groups$GroupCacheLoader.fetchGroupList(Groups.java:356)
    at org.apache.hadoop.security.Groups$GroupCacheLoader.load(Groups.java:299)
    at org.apache.hadoop.security.Groups$GroupCacheLoader.load(Groups.java:257)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
    ... 13 more

0 answers:

No answers