Hadoop WordCount Example - NullPointerException

Date: 2016-09-02 10:53:23

Tags: java eclipse hadoop

I am a beginner with Hadoop. My setup: RHEL 7, hadoop-2.7.3

I am trying to run the WordCount v2.0 example. I simply copied the source code into a new Eclipse project and exported it to a ws.jar file.

I have configured Hadoop for Pseudo-Distributed Operation as described in the linked guide. Then I started with the following.

Creating the input files in the input directory:

echo "Hello World, Bye World!" > input/file01
echo "Hello Hadoop, Goodbye to hadoop." > input/file02

Starting the environment and running the job:

sbin/start-dfs.sh
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
bin/hdfs dfs -put input input
bin/hadoop jar ws.jar WordCount2 input output

This is what I get:

16/09/02 13:15:01 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/09/02 13:15:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/09/02 13:15:01 INFO input.FileInputFormat: Total input paths to process : 2
16/09/02 13:15:01 INFO mapreduce.JobSubmitter: number of splits:2
16/09/02 13:15:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local455553963_0001
16/09/02 13:15:01 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/09/02 13:15:01 INFO mapreduce.Job: Running job: job_local455553963_0001
16/09/02 13:15:01 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/09/02 13:15:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/09/02 13:15:01 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Waiting for map tasks
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Starting task: attempt_local455553963_0001_m_000000_0
16/09/02 13:15:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/09/02 13:15:02 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/09/02 13:15:02 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/aii/input/file02:0+33
16/09/02 13:15:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/09/02 13:15:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/09/02 13:15:02 INFO mapred.MapTask: soft limit at 83886080
16/09/02 13:15:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/09/02 13:15:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/09/02 13:15:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/09/02 13:15:02 INFO mapred.MapTask: Starting flush of map output
16/09/02 13:15:02 INFO mapred.LocalJobRunner: Starting task: attempt_local455553963_0001_m_000001_0
16/09/02 13:15:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/09/02 13:15:02 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/09/02 13:15:02 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/aii/input/file01:0+24
16/09/02 13:15:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/09/02 13:15:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/09/02 13:15:02 INFO mapred.MapTask: soft limit at 83886080
16/09/02 13:15:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/09/02 13:15:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/09/02 13:15:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/09/02 13:15:02 INFO mapred.MapTask: Starting flush of map output
16/09/02 13:15:02 INFO mapred.LocalJobRunner: map task executor complete.
16/09/02 13:15:02 WARN mapred.LocalJobRunner: job_local455553963_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NullPointerException
    at WordCount2$TokenizerMapper.setup(WordCount2.java:47)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/09/02 13:15:02 INFO mapreduce.Job: Job job_local455553963_0001 running in uber mode : false
16/09/02 13:15:02 INFO mapreduce.Job:  map 0% reduce 0%
16/09/02 13:15:02 INFO mapreduce.Job: Job job_local455553963_0001 failed with state FAILED due to: NA
16/09/02 13:15:02 INFO mapreduce.Job: Counters: 0

No result (output) is produced. Why am I getting that exception?

Thanks

EDIT

Thanks to the suggested solutions, I made a second attempt (with the WordCount example). I created a patterns file:

echo "\." > patterns.txt
echo "\," >> patterns.txt
echo "\!" >> patterns.txt
echo "to" >> patterns.txt

Then ran:

bin/hadoop jar ws.jar WordCount2 -Dwordcount.case.sensitive=true input output -skip patterns.txt

Everything works great!

2 Answers:

Answer 0 (score: 2):

The problem lies in the mapper's setup() method. This wordcount example is more advanced than the usual one: it lets you specify a file of patterns that the mapper will filter out of the input. That file is added to the distributed cache in the main() method so that it is available for the mapper to open on every node.

You can see the file being added to the cache in main():

for (int i = 0; i < remainingArgs.length; ++i) {
    if ("-skip".equals(remainingArgs[i])) {
        // Register the patterns file in the distributed cache and record that skipping is enabled
        job.addCacheFile(new Path(remainingArgs[++i]).toUri());
        job.getConfiguration().setBoolean("wordcount.skip.patterns", true);
    } else {
        otherArgs.add(remainingArgs[i]);
    }
}

You didn't specify the -skip option, so nothing is ever added to the cache. And as you can see, wordcount.skip.patterns is only set to true when a file is added.

In the mapper's setup() you have the following code:

@Override
public void setup(Context context) throws IOException, InterruptedException {
    conf = context.getConfiguration();
    caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
    if (conf.getBoolean("wordcount.skip.patterns", true)) {
        URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
        for (URI patternsURI : patternsURIs) {
            Path patternsPath = new Path(patternsURI.getPath());
            String patternsFileName = patternsPath.getName().toString();
            parseSkipFile(patternsFileName);
        }
    }
}

The problem is that the check conf.getBoolean("wordcount.skip.patterns", true) defaults to true when the property has not been set, which is the case here. Therefore patternsURIs (or something around it; I don't have the line numbers) is null.
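To see the default in action, here is a standalone illustration (a hypothetical demo class, not part of the example) of how Configuration.getBoolean falls back to its second argument when the key was never set:

import org.apache.hadoop.conf.Configuration;

public class GetBooleanDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // "wordcount.skip.patterns" was never set (no -skip option),
        // so getBoolean returns whatever default is passed in:
        System.out.println(conf.getBoolean("wordcount.skip.patterns", true));  // prints true
        System.out.println(conf.getBoolean("wordcount.skip.patterns", false)); // prints false
    }
}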

So you can fix it by changing the default of wordcount.skip.patterns to false, by setting it to false in the driver (the main method), or by supplying a skip file.
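For concreteness, here is a minimal sketch of setup() with the default flipped to false, so the cache is only consulted when -skip actually set the flag (this mirrors the code shown above; only the default value changes):

@Override
public void setup(Context context) throws IOException, InterruptedException {
    conf = context.getConfiguration();
    caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
    // Default to false: only read cache files when -skip explicitly enabled skipping
    if (conf.getBoolean("wordcount.skip.patterns", false)) {
        URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
        for (URI patternsURI : patternsURIs) {
            Path patternsPath = new Path(patternsURI.getPath());
            String patternsFileName = patternsPath.getName().toString();
            parseSkipFile(patternsFileName);
        }
    }
}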

Answer 1 (score: 1):

The problem is probably in this part of the code:

caseSensitive = conf.getBoolean("wordcount.case.sensitive", true);
if (conf.getBoolean("wordcount.skip.patterns", true)) {
    URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
    for (URI patternsURI : patternsURIs) {
        Path patternsPath = new Path(patternsURI.getPath());
        String patternsFileName = patternsPath.getName().toString();
        parseSkipFile(patternsFileName);
    }
}

Here getCacheFiles() is returning null for whatever reason. That's why you get the exception when you try to iterate over patternsURIs, which is nothing but null.

To fix this, check that patternsURIs is not null before starting the loop:

if (patternsURIs != null) {
    for (URI patternsURI : patternsURIs) {
        Path patternsPath = new Path(patternsURI.getPath());
        String patternsFileName = patternsPath.getName().toString();
        parseSkipFile(patternsFileName);
    }
}

If a null value is not expected, you should also investigate why you are getting null in the first place.
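For example, a hedged sketch (reusing the names from the snippet above) that fails fast with a descriptive message when the flag and the cache disagree, rather than silently skipping the loop:

URI[] patternsURIs = Job.getInstance(conf).getCacheFiles();
if (patternsURIs == null) {
    // Make the misconfiguration visible instead of failing later with an NPE
    throw new IllegalStateException("wordcount.skip.patterns is set, but no skip "
            + "file was added to the distributed cache (missing -skip option?)");
}
for (URI patternsURI : patternsURIs) {
    Path patternsPath = new Path(patternsURI.getPath());
    String patternsFileName = patternsPath.getName().toString();
    parseSkipFile(patternsFileName);
}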