Some of my ADF jobs fail at random, and the output points to the data in the /PackageJobs/~job/Status/stderr file below.
Note that this does not happen every time; it occurs randomly in some jobs while others complete normally.
What could be causing this issue?
The stderr data is as follows:
log4j:ERROR Could not instantiate class [com.microsoft.log4jappender.FilterLogAppender].
java.lang.ClassNotFoundException: com.microsoft.log4jappender.FilterLogAppender
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.log4j.helpers.Loader.loadClass(Loader.java:198)
at org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:327)
at org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:124)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:785)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.log4j.Logger.getLogger(Logger.java:104)
at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)
at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:108)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1025)
at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:844)
at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:541)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
at org.apache.hadoop.util.ShutdownHookManager.<clinit>(ShutdownHookManager.java:44)
at org.apache.hadoop.util.RunJar.run(RunJar.java:200)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
log4j:ERROR Could not instantiate appender named "RMSUMFilterLog".
16/03/04 10:56:02 INFO impl.TimelineClientImpl: Timeline service address: http://headnodehost:8188/ws/v1/timeline/
16/03/04 10:56:02 INFO client.RMProxy: Connecting to ResourceManager at headnodehost/100.74.24.3:9010
16/03/04 10:56:02 INFO client.AHSProxy: Connecting to Application History server at headnodehost/100.74.24.3:10200
16/03/04 10:56:03 INFO impl.TimelineClientImpl: Timeline service address: http://headnodehost:8188/ws/v1/timeline/
16/03/04 10:56:03 INFO client.RMProxy: Connecting to ResourceManager at headnodehost/100.74.24.3:9010
16/03/04 10:56:03 INFO client.AHSProxy: Connecting to Application History server at headnodehost/100.74.24.3:10200
16/03/04 10:56:06 INFO mapred.FileInputFormat: Total input paths to process : 1
16/03/04 10:56:06 INFO mapreduce.JobSubmitter: number of splits:1
16/03/04 10:56:06 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/03/04 10:56:06 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/03/04 10:56:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457068773628_0022
16/03/04 10:56:07 INFO mapreduce.JobSubmitter: Kind: mapreduce.job, Service: job_1457068773628_0019, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@655019bc)
16/03/04 10:56:08 INFO impl.YarnClientImpl: Submitted application application_1457068773628_0022
16/03/04 10:56:08 INFO mapreduce.Job: The url to track the job: http://headnodehost:9014/proxy/application_1457068773628_0022/
16/03/04 10:56:08 INFO mapreduce.Job: Running job: job_1457068773628_0022
16/03/04 10:56:18 INFO mapreduce.Job: Job job_1457068773628_0022 running in uber mode : false
16/03/04 10:56:18 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 10:56:31 INFO mapreduce.Job: map 100% reduce 0%
16/03/04 23:48:59 INFO mapreduce.Job: Task Id : attempt_1457068773628_0022_m_000000_0, Status : FAILED
AttemptID:attempt_1457068773628_0022_m_000000_0 Timed out after 600 secs
16/03/04 23:49:00 INFO mapreduce.Job: map 0% reduce 0%
16/03/04 23:49:16 INFO mapreduce.Job: map 100% reduce 0%
16/03/05 00:01:00 INFO mapreduce.Job: Task Id : attempt_1457068773628_0022_m_000000_1, Status : FAILED
AttemptID:attempt_1457068773628_0022_m_000000_1 Timed out after 600 secs
16/03/05 00:01:01 INFO mapreduce.Job: map 0% reduce 0%
16/03/05 00:01:21 INFO mapreduce.Job: map 100% reduce 0%
16/03/05 00:13:00 INFO mapreduce.Job: Task Id : attempt_1457068773628_0022_m_000000_2, Status : FAILED
AttemptID:attempt_1457068773628_0022_m_000000_2 Timed out after 600 secs
16/03/05 00:13:01 INFO mapreduce.Job: map 0% reduce 0%
16/03/05 00:13:18 INFO mapreduce.Job: map 100% reduce 0%
16/03/05 00:25:03 INFO mapreduce.Job: Job job_1457068773628_0022 failed with state FAILED due to: Task failed task_1457068773628_0022_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/03/05 00:25:03 INFO mapreduce.Job: Counters: 9
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=48514665
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=48514665
Total vcore-seconds taken by all map tasks=48514665
Total megabyte-seconds taken by all map tasks=74518525440
16/03/05 00:25:03 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Answer 0 (score: 0)
This looks like the known task-timeout behavior in Hadoop/HDInsight: if a task writes no output or progress for 10 minutes, it is killed. Your log confirms this with "Timed out after 600 secs", which is the default mapreduce.task.timeout of 600 seconds. You could modify your code to write a keep-alive ping to the console every 9 minutes or so and see whether the jobs then complete.
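Since the log shows a Hadoop Streaming job (StreamJob), one way to implement that ping is to emit `reporter:status:` lines to stderr, which Hadoop Streaming treats as progress and which reset the task timeout clock. Below is a minimal sketch of a Python streaming mapper with a background keep-alive thread; the interval, message text, and mapper body are illustrative, not taken from the failing job:

```python
import sys
import threading
import time


def start_keepalive(interval_secs=540):
    """Periodically emit a Hadoop Streaming status line to stderr so the
    task is not killed by mapreduce.task.timeout (600 s by default).
    540 s (9 minutes) stays safely under that limit."""
    def ping():
        while True:
            # Lines of the form "reporter:status:<msg>" on stderr are
            # interpreted by Hadoop Streaming as a task status update,
            # which counts as progress.
            sys.stderr.write("reporter:status:alive\n")
            sys.stderr.flush()
            time.sleep(interval_secs)

    # Daemon thread: it must not keep the mapper process alive after
    # the input is exhausted.
    t = threading.Thread(target=ping, daemon=True)
    t.start()
    return t


if __name__ == "__main__":
    start_keepalive()
    for line in sys.stdin:
        # ... long-running per-record work goes here ...
        sys.stdout.write(line)
```

If the slow work happens in a single long call rather than per record, the background thread still fires on schedule, which is why a thread is used here instead of pinging inline from the record loop.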