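I am new to MapReduce programming and started learning with the simple word count example. However, I am trying a different approach. There are two input files in my HDFS input folder. I am trying to generate output like this:
anyword1 --> filename1 2
anyword2 --> filename2 3
I wrote a mapper class that concatenates the word and the file name in the key, but when I set the key value on the Text object, it throws a NullPointerException. Can someone help and tell me what I am doing wrong?
My mapper class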
public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = null;
    private String fileText = null;

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
        String modifiedWord = "";
        fileName = "-->" + fileName;
        System.out.println("filename before word-->" + fileName);
        while (itr.hasMoreTokens()) {
            modifiedWord = itr.nextToken().toString(); //+fileName;
            modifiedWord = modifiedWord + fileName;
            System.out.println("modified word-->" + modifiedWord);
            word.set(modifiedWord);
            context.write(word, one);
            System.out.println("Mapper context-->" + word);
        }
    }
}
------Execution----
[root@LinuxCentos7 hadoop]# hadoop jar /usr/local/mapreduceexample/WordCountEx3.jar /user/Siddharth/Input /user/Siddharth/output
17/06/09 23:32:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/09 23:32:32 INFO input.FileInputFormat: Total input paths to process : 2
17/06/09 23:32:32 INFO mapreduce.JobSubmitter: number of splits:2
17/06/09 23:32:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1497025644387_0011
17/06/09 23:32:33 INFO impl.YarnClientImpl: Submitted application application_1497025644387_0011
17/06/09 23:32:33 INFO mapreduce.Job: The url to track the job: http://LinuxCentos7:8088/proxy/application_1497025644387_0011/
17/06/09 23:32:33 INFO mapreduce.Job: Running job: job_1497025644387_0011
17/06/09 23:32:52 INFO mapreduce.Job: Job job_1497025644387_0011 running in uber mode : false
17/06/09 23:32:52 INFO mapreduce.Job: map 0% reduce 0%
17/06/09 23:33:16 INFO mapreduce.Job: map 100% reduce 0%
17/06/09 23:33:16 INFO mapreduce.Job: Task Id : attempt_1497025644387_0011_m_000000_0, Status : FAILED
Error: java.lang.NullPointerException
at com.hadoop.WordCountEx3$TokenizerMapper.map(WordCountEx3.java:56)
at com.hadoop.WordCountEx3$TokenizerMapper.map(WordCountEx3.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Answer 0 (score: 2)
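Initialize the word variable with a Text instance. In your mapper it is declared as null, so the call word.set(modifiedWord) throws the NullPointerException: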
private final static IntWritable one = new IntWritable(1);
private Text word = new Text(); // was null; must be instantiated before word.set(...)
private String fileText = null;
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
...
}
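To check the expected "word --> filename" counts without submitting a job, the same logic can be simulated in plain Java. This is only a rough sketch and not Hadoop code: the class WordPerFileCount, the MutableKey holder (a stand-in for Text), and the sample file contents are all hypothetical names made up for illustration. It shows a key object that is instantiated once and reused for every token, which is the pattern the fix above relies on.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordPerFileCount {

    // Hypothetical stand-in for Hadoop's mutable Text: like Text, it must be
    // instantiated before set() can be called on it.
    static final class MutableKey {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Allocated once and reused for every token, mirroring `new Text()`.
    // Leaving this field null would make key.set(...) throw a NullPointerException.
    private static final MutableKey key = new MutableKey();

    // Simulates the map step (emit "word --> fileName" with a count of 1)
    // and the reduce step (sum the counts per key) in a single pass.
    // `files` maps a file name to that file's text content.
    static Map<String, Integer> countWordsPerFile(Map<String, String> files) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            String suffix = " --> " + file.getKey();
            StringTokenizer itr = new StringTokenizer(file.getValue());
            while (itr.hasMoreTokens()) {
                key.set(itr.nextToken() + suffix);
                counts.merge(key.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the two HDFS input files.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("filename1", "apple banana apple");
        files.put("filename2", "banana banana banana");
        countWordsPerFile(files).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

In the real job the summing would happen in the reducer rather than in a local map, and the file name would come from the input split as in the question's mapper.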