Debugging the Hadoop Pipes tutorial project

Time: 2013-06-09 18:28:32

Tags: hadoop mapreduce

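I am working through a tutorial and have reached the last part (with a few minor changes). Now I am getting an error message that I cannot make sense of.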

damian@damian-ThinkPad-T61:~/hadoop-1.1.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input dft1 -output dft1-out -program bin/word_count

13/06/09 20:17:01 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/09 20:17:01 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/06/09 20:17:01 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/09 20:17:01 INFO mapred.FileInputFormat: Total input paths to process : 1
13/06/09 20:17:02 INFO filecache.TrackerDistributedCacheManager: Creating word_count in /tmp/hadoop-damian/mapred/local/archive/7642618178782392982_1522484642_696507214/filebin-work-1867423021697266227 with rwxr-xr-x
13/06/09 20:17:02 INFO filecache.TrackerDistributedCacheManager: Cached bin/word_count as /tmp/hadoop-damian/mapred/local/archive/7642618178782392982_1522484642_696507214/filebin/word_count
13/06/09 20:17:02 INFO filecache.TrackerDistributedCacheManager: Cached bin/word_count as /tmp/hadoop-damian/mapred/local/archive/7642618178782392982_1522484642_696507214/filebin/word_count
13/06/09 20:17:02 INFO mapred.JobClient: Running job: job_local_0001
13/06/09 20:17:02 INFO util.ProcessTree: setsid exited with exit code 0
13/06/09 20:17:02 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4200d3
13/06/09 20:17:02 INFO mapred.MapTask: numReduceTasks: 1
13/06/09 20:17:02 INFO mapred.MapTask: io.sort.mb = 100
13/06/09 20:17:02 INFO mapred.MapTask: data buffer = 79691776/99614720
13/06/09 20:17:02 INFO mapred.MapTask: record buffer = 262144/327680
13/06/09 20:17:02 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
    at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:103)
    at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
13/06/09 20:17:03 INFO mapred.JobClient:  map 0% reduce 0%
13/06/09 20:17:03 INFO mapred.JobClient: Job complete: job_local_0001
13/06/09 20:17:03 INFO mapred.JobClient: Counters: 0
13/06/09 20:17:03 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1327)
    at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248)
    at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479)
    at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494)

Can anyone see where the error is hiding? And what is a simple way to debug Hadoop Pipes programs?

Thanks!

2 answers:

Answer 0: (score: 1)

The exception:

at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:103)

is caused by the following lines in the source:

//Add token to the environment if security is enabled
Token<JobTokenIdentifier> jobToken = TokenCache.getJobToken(conf
    .getCredentials());
// This password is used as shared secret key between this application and
// child pipes process
byte[]  password = jobToken.getPassword();

The actual NPE is thrown on the last line, because jobToken is null.
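The failure pattern can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop's actual code: `Credentials`, `JobToken`, the `"JobToken"` alias, and both `password...` helpers are hypothetical stand-ins. They only mimic a lookup that returns null when no job token was ever stored, followed by an unconditional dereference, as in the snippet quoted above.

```java
import java.util.HashMap;
import java.util.Map;

public class JobTokenNpeSketch {

    // Hypothetical stand-in for a job token that carries a shared secret.
    static final class JobToken {
        private final byte[] password;
        JobToken(byte[] password) { this.password = password; }
        byte[] getPassword() { return password; }
    }

    // Hypothetical stand-in for a credentials store keyed by alias.
    static final class Credentials {
        private final Map<String, JobToken> tokens = new HashMap<>();
        void addToken(String alias, JobToken token) { tokens.put(alias, token); }
        // Returns null when nothing was stored under the alias.
        JobToken getToken(String alias) { return tokens.get(alias); }
    }

    // Same shape as the quoted snippet: look up, then dereference.
    static byte[] passwordUnchecked(Credentials creds) {
        JobToken jobToken = creds.getToken("JobToken");
        return jobToken.getPassword(); // NullPointerException if no token was stored
    }

    // Same lookup, but failing with a message that names the missing piece.
    static byte[] passwordChecked(Credentials creds) {
        JobToken jobToken = creds.getToken("JobToken");
        if (jobToken == null) {
            throw new IllegalStateException("no job token in credentials");
        }
        return jobToken.getPassword();
    }

    public static void main(String[] args) {
        Credentials empty = new Credentials();
        try {
            passwordUnchecked(empty);
        } catch (NullPointerException e) {
            System.out.println("unchecked lookup: NullPointerException");
        }
        try {
            passwordChecked(empty);
        } catch (IllegalStateException e) {
            System.out.println("checked lookup: " + e.getMessage());
        }
    }
}
```

The unchecked variant fails with a bare NullPointerException and no message, which matches how the trace in the question looks; the checked variant shows how an explicit null test would have named the missing token instead.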

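Since you are running in local mode (local job tracker and local file system), I'm not sure security should be "enabled" at all. Have you configured either of the following properties in core-site.xml or hdfs-site.xml (and if so, to what values)?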

  • hadoop.security.authentication
  • hadoop.security.authorization
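One way to answer that question is to read the site file directly. The sketch below uses only the JDK's XML parser, not Hadoop's own `Configuration` class; the class name `SiteXmlProperties`, its helpers, and the inline sample document are made up for illustration. Pass the path to your own core-site.xml or hdfs-site.xml as the first argument to inspect a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlProperties {

    // The two properties named in the answer.
    static final String[] KEYS = {
        "hadoop.security.authentication",
        "hadoop.security.authorization"
    };

    // Made-up sample in the <configuration>/<property> layout of Hadoop site files.
    static final String SAMPLE =
        "<configuration>"
        + "<property><name>hadoop.security.authentication</name>"
        + "<value>simple</value></property>"
        + "</configuration>";

    // Parses a Hadoop-style configuration document into name -> value pairs.
    static Map<String, String> parse(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or null if absent.
    static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    static Map<String, String> parseSample() {
        return parse(new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props;
        if (args.length > 0) {
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                props = parse(in);
            }
        } else {
            props = parseSample();
        }
        for (String key : KEYS) {
            String value = props.get(key);
            System.out.println(key + " = " + (value == null ? "(not set)" : value));
        }
    }
}
```

A property reported as "(not set)" is only absent from that one file; Hadoop then falls back to its built-in default or to a value from another configuration file.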

Answer 1: (score: 1)

This is probably because your cluster is running in local mode. Does your mapred-site.xml file contain the following property?

 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
   <description>
    Let the MapReduce jobs run with the yarn framework.
   </description>
 </property>

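Without this property, your cluster runs in local mode by default. I once ran into exactly the same problem in local mode. After I added the property, the cluster ran in distributed mode and the problem went away.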

HTH,

Shumin