I am new to MapReduce applications. I simply want to find the lengths of the words in my dataset, classify them by length as tiny, little, med and huge, and finally see how many words in my dataset are tiny, little, med or huge in total, using Java. I am having trouble implementing the reducer. When I execute the jar file on the Hadoop cluster, it does not return any result. I would appreciate it if someone could lend me a hand. Here is the reducer code I am trying to run, but I suspect it has plenty of errors.
public class WordSizeReducer extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
    private IntVariable result = new IntVariable();
    IntWritable tin, smal, mediu, bi;
    int t, s, m, b;
    int count;
    Text tiny, small, medium, big;

    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        for (IntWritable val : values) {
            if (val.get() == 1) {
                tin.set(t);
                t++;
            }
            else if (2 <= val.get() && val.get() <= 4) {
                smal.set(s);
                s++;
            }
            else if (5 <= val.get() && val.get() <= 9) {
                mediu.set(m);
                m++;
            }
            else if (10 <= val.get()) {
                bi.set(b);
                b++;
            }
        }
        context.write(tiny, tin);
        context.write(small, smal);
        context.write(medium, mediu);
        context.write(big, bi);
    }
}
public class WordSizeMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private IntWritable wordLength = new IntWritable();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            wordLength.set(tokenizer.nextToken().length());
            context.write(wordLength, one);
        }
    }
}
Answer 0 (score: 0)
tiny, small, medium and big are never initialised, so they will all be null.
That means every one of your context.write() calls is using a null key. Clearly this is no good, because you cannot distinguish the counts for the different word sizes.
Worse still, tin, smal, mediu and bi are never initialised either, which will cause a NullPointerException the moment you try to call set() on them (you correctly initialise result, but never use it).
(Also, you do not need to keep setting the IntWritable values inside the loop; just update t, s, m, b and set each IntWritable once, right before the final context.write() calls.)
Now updating for the added mapper code:
For each word in the input you are writing the key-value pair (length, 1). The reducer gathers all the values that share the same key, so it will be called with, for example:
(2, [1,1,1,1,1,1,1,1])
(3, [1,1,1])
So your reducer only ever sees the value '1', which it wrongly treats as a word length; in fact it is the key that carries the word length.
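Putting those points together, here is a minimal sketch of a reducer along those lines (illustrative only, not the poster's code; it treats the key as the word length, keeps plain int totals, and writes the four categories once from cleanup() after all keys have been processed):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordSizeReducer extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
    // Running totals for each size category, accumulated across all keys
    private int t, s, m, b;

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this particular word length
        int count = 0;
        for (IntWritable val : values) {
            count += val.get();
        }
        int length = key.get();   // the key is the word length
        if (length == 1) {
            t += count;
        } else if (length <= 4) {
            s += count;
        } else if (length <= 9) {
            m += count;
        } else {
            b += count;
        }
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        // Emit each category exactly once, after every key has been processed
        context.write(new Text("tiny"), new IntWritable(t));
        context.write(new Text("small"), new IntWritable(s));
        context.write(new Text("medium"), new IntWritable(m));
        context.write(new Text("big"), new IntWritable(b));
    }
}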
Now updating for the added stack trace:
The error message explains what is wrong: Hadoop cannot find your job classes, so they are not being executed at all. The error says:
java.lang.ClassNotFoundException: WordSize.WordsizeMapper
but your class is called WordSizeMapper (or perhaps WordSize.WordSizeMapper if you have an outer class). Note the different capitalisation of "size"/"Size"! You need to check how you are invoking Hadoop.
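For instance, assuming the classes live in a package called WordSize and are packaged into a jar named wordsize.jar (both names are assumptions here), the job would normally be launched with something like:

hadoop jar wordsize.jar WordSize.WordSizeTest <input path> <output path>

The class names recorded in the job configuration have to match the compiled classes exactly, including case.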
Answer 1 (score: 0)
No luck; I checked my code as well and made some fixes, but the result is the same: in the Hadoop terminal window I still cannot get any output. The latest version of the code is below:
public class WordSizeTest {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Word Size <in> <out>");
            System.exit(2);
        }
        Job job = new Job();
        job.setJarByClass(WordSizeTest.class);
        job.setJobName("Word Size");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordSizeMapper.class);
        job.setReducerClass(WordSizeReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
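Since the mapper emits IntWritable keys and IntWritable values while the job's declared output types are Text/IntWritable, the map output types would normally also need to be set explicitly in this driver (by default they fall back to the job output types); a minimal sketch of the extra lines:

// Assumed additions to the driver above: declare the intermediate (map output) types,
// because they differ from the job's final Text/IntWritable output types.
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);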
public class WordSizeMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    final static IntWritable one = new IntWritable(1);
    IntWritable wordLength = new IntWritable();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            wordLength.set(tokenizer.nextToken().length());
            context.write(wordLength, one);
        }
    }
}
public class WordSizeReducer extends Reducer<IntWritable, IntWritable, Text, IntWritable> {
    IntWritable tin = new IntWritable();
    IntWritable smal = new IntWritable();
    IntWritable mediu = new IntWritable();
    IntWritable bi = new IntWritable();
    int t, s, m, b;
    Text tiny = new Text("tiny");
    Text small = new Text("small");
    Text medium = new Text("medium");
    Text big = new Text("big");

    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        for (IntWritable val : values) {
            if (key.get() == 1) {
                t += val.get();
            }
            else if (2 <= key.get() && key.get() <= 4) {
                s += val.get();
            }
            else if (5 <= key.get() && key.get() <= 9) {
                m += val.get();
            }
            else if (10 <= key.get()) {
                b += val.get();
            }
        }
        tin.set(t);
        smal.set(s);
        mediu.set(m);
        bi.set(b);
        context.write(tiny, tin);
        context.write(small, smal);
        context.write(medium, mediu);
        context.write(big, bi);
    }
}
The error in the terminal is as follows:
15/02/01 12:09:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/02/01 12:09:25 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/02/01 12:09:25 INFO input.FileInputFormat: Total input paths to process : 925
15/02/01 12:09:25 WARN snappy.LoadSnappy: Snappy native library is available
15/02/01 12:09:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/02/01 12:09:25 INFO snappy.LoadSnappy: Snappy native library loaded
15/02/01 12:09:29 INFO mapred.JobClient: Running job: job_201501191143_0177
15/02/01 12:09:30 INFO mapred.JobClient: map 0% reduce 0%
15/02/01 12:09:47 INFO mapred.JobClient: Task Id : attempt_201501191143_0177_m_000001_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: WordSize.WordSizeMapper
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:859)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(AccessController.java:310)
at javax.security.auth.Subject.doAs(Subject.java:573)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: WordSize.WordsizeMapper
at java.lang.Class.forName(Class.java:174)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:812)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
... 8 more
15/02/01 12:09:49 INFO mapred.JobClient: Task Id : attempt_201501191143_0177_m_000000_0, Status : FAILED