I have followed up on both of my questions at the end of this post.
I am trying to get a simple WordCount program running so I can play around with it and see what it does.
I currently have an implementation that seems to run all the way to the end; then, after the last line in Main() (which is just a println saying so), I get output that looks like a summary of the Hadoop job, along with one exception.
In my Mapper and Reducer functions I also have a line that simply prints arbitrary text to the screen, so I would know when those lines are reached, but during the run I never see either of them hit. I believe this is what leads to the IOException mentioned above.
I have two questions:

1. Why are the classes set with setMapperClass(), setCombinerClass(), and setReducerClass() not being executed?
2. What is causing the IOException at the end of the run?

I have saved the output from running the job to a file:
Enter the Code to run the particular program.
Wordcount = 000:
Assignment 1 = 001:
Assignment 2 = 002:
000
/usr/dan/wordcount/
/usr/dan/wordcount/result.txt
May 04, 2014 2:22:28 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
May 04, 2014 2:22:29 PM org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
INFO: Total input paths to process : 2
May 04, 2014 2:22:29 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Running job: job_local_0001
May 04, 2014 2:22:29 PM org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
INFO: Total input paths to process : 2
May 04, 2014 2:22:29 PM org.apache.hadoop.mapred.MapTask <init>
INFO: io.sort.mb = 100
May 04, 2014 2:22:29 PM org.apache.hadoop.mapred.MapTask <init>
INFO: data buffer = 79691776/99614720
May 04, 2014 2:22:29 PM org.apache.hadoop.mapred.MapTask <init>
INFO: record buffer = 262144/327680
May 04, 2014 2:22:29 PM org.apache.hadoop.mapred.LocalJobRunner run
WARNING: job_local_0001
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:845)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
May 04, 2014 2:22:30 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: map 0% reduce 0%
May 04, 2014 2:22:30 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Job complete: job_local_0001
May 04, 2014 2:22:30 PM org.apache.hadoop.mapred.JobClient log
INFO: Counters: 0
Not Fail!
I hit the end of wordcount!
I hit the end of Main()
The way I have my application set up is that it directs the flow to the appropriate class based on which main class the user enters. In case it helps, I am only posting the class I am currently working on. If you need to see more, just ask.
package hadoop;

import java.io.IOException;
import java.util.Arrays;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 *
 * @author Dans Laptop
 */
public class Wordcount {

    public static class TokenizerMapper
            extends org.apache.hadoop.mapreduce.Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, org.apache.hadoop.mapreduce.Reducer.Context context
                ) throws IOException, InterruptedException {
            System.out.println("mapper!");
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                org.apache.hadoop.mapreduce.Reducer.Context context
                ) throws IOException, InterruptedException {
            System.out.println("Reducer!");
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public void wordcount(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        System.out.println(args[0]); // Prints arg 1
        System.out.println(args[1]); // Prints arg 2
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(Wordcount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        try {
            job.waitForCompletion(true);
            System.out.println("Not Fail!");
        } catch (Exception e) {
            System.out.println(e.getLocalizedMessage());
            System.out.println(e.getMessage());
            System.out.println(Arrays.toString(e.getStackTrace()));
            System.out.println(e.toString());
            System.out.println("Failed!");
        }
        System.out.println("I hit the end of wordcount!"); // Proves I hit the end of wordcount.
    }
}
The command used to run the jar is (from the /usr/dan location):
hadoop -jar ./hadoop.jar /usr/dan/wordcount/ /usr/dan/wordcount/result.txt
Note: I want the program to look at all the files in /usr/dan/wordcount and then create a file /usr/dan/wordcount/result.txt that lists each word and the number of times it occurs. I am not getting that behavior yet, but I want to figure out these two questions first so I can troubleshoot the rest on my own.
In response to @Alexey:
I did not realize that you cannot print directly to the console during the execution of a MapReduce job in Hadoop; I had just assumed those lines were not being executed. Now I know where to look for any output during a job. However, following the instructions in the question you linked to did not show me any jobs to look at, maybe because none of my jobs have completed.
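In the meantime, one idea I picked up while reading around (untested on my end, so treat it as a guess): counters are supposed to show up in the end-of-job summary even when println output is not visible, so a line like this inside map() might at least confirm the method is being invoked:

    // Hypothetical debugging aid: increments a counter named "map-calls" in a
    // "debug" group; counter totals are printed in the job summary at the end.
    context.getCounter("debug", "map-calls").increment(1);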
I have switched from job.submit(); to job.waitForCompletion(true); but I am still getting the output after it. I do not know whether that indicates something else is wrong, but I thought I would mention it.
I have added the lines you suggested (do these set the output types only for the Map class?):
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
and removed the lines (do these set the output types for both the Map and Reduce classes?):
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
I am still getting the same exception. From reading about it online, the error seems to mean that the output types of the Map class do not match the input types of the Reduce class, yet in my code those two appear to match quite explicitly. The only thing that confuses me is where it is getting LongWritable from; I do not have that anywhere in my code. After looking at this question Hadoop type mismatch in key from map expected value Text received value LongWritable, it is a similar situation, but the solution there was to specify the Mapper and Reducer classes, which I am already doing. One other thing I noticed in my code is that the input key of my Mapper class is of type Object. Could that be significant? The error says the mismatch is in the key FROM map.
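One more thing I am going to try (just a hunch at this point, not something I have confirmed): putting @Override on my map method so the compiler checks the signature. From what I have read, if the Context parameter resolves to Reducer.Context instead of Mapper.Context, the method does not actually override Mapper.map, so Hadoop runs the default identity mapper, which passes through the LongWritable byte-offset keys produced by the default input format; that would explain where the LongWritable comes from. A version written against the inherited Context type would look roughly like this:

    public static class TokenizerMapper
            extends org.apache.hadoop.mapreduce.Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // With @Override, the compiler rejects any signature that does not
        // actually override Mapper.map; "Context" here resolves to the
        // Mapper.Context type inherited from the Mapper superclass.
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }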
I have also gone ahead and updated my code/results above.
Thanks for your help so far; I have gotten a lot out of your responses.
Answer (score: 2):
Your first question is similar to this one: How to print on console during MapReduce job execution in hadoop.
The line job.submit(); tells Hadoop to run the job but does not wait for the job to complete. I think you may want to replace that line with job.waitForCompletion(true); so that there is no output after "I hit the end of wordcount!".
To get rid of the exception, you should specify the mapper's output key and value classes:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
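Taken together, the relevant part of the driver would then look something like this (a sketch assembled from the code in your post, not a complete program):

    Job job = new Job(conf, "wordcount");
    job.setJarByClass(Wordcount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setMapOutputKeyClass(Text.class);          // key type emitted by the mapper
    job.setMapOutputValueClass(IntWritable.class); // value type emitted by the mapper
    job.setOutputKeyClass(Text.class);             // key type emitted by the reducer
    job.setOutputValueClass(IntWritable.class);    // value type emitted by the reducer
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    job.waitForCompletion(true); // blocks until the job finishes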