I'm new to Hadoop, and I'm having trouble running the code from this tutorial.
The MapReduce job stops at this point:
    [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [main] INFO org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
    [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
    [main] WARN org.apache.hadoop.mapreduce.JobResourceUploader - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    [main] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
    [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 4
    [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:4
    [main] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local61587531_0001
    [main] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
    [Thread-19] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
    [Thread-19] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
The code of the MapReduce application is:
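    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class VoteCountApplication extends Configured implements Tool {
        public static void main(String[] args) throws Exception {
            int res = ToolRunner.run(new Configuration(), new VoteCountApplication(), args);
            System.exit(res);
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.out.println("usage: [input] [output]");
                System.exit(-1);
            }
            Job job = Job.getInstance(new Configuration());
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(VoteCountMapper.class);
            job.setReducerClass(VoteCountReducer.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.setJarByClass(VoteCountApplication.class);
            job.submit();
            return 0;
        }
    }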
However, if I run this project with the main method from the WordCount example:
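    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;

    public class VoteCountApplication extends Configured implements Tool {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "vote count");
            job.setJarByClass(VoteCountApplication.class);
            job.setMapperClass(VoteCountMapper.class);
            job.setCombinerClass(VoteCountReducer.class);
            job.setReducerClass(VoteCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
        // run(String[]) is not shown here; implementing Tool still requires it.
    }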
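it works perfectly! I don't know what the problem with the tutorial code is. Can anyone see the difference between the two versions? Thanks.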
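The most visible difference between the two drivers is how the job is started. The tutorial's run() calls job.submit(), which in Hadoop's Job API submits the job and returns immediately, while the WordCount main calls job.waitForCompletion(true), which submits the job and blocks until it finishes. Because the tutorial's main passes run()'s return value straight to System.exit, the JVM may exit while the local job thread is still working, which would fit a log that stops right after the job is set up. The plain-Java sketch below only illustrates that timing and is an analogy, not Hadoop code: an ExecutorService stands in for the job runner, and SubmitVsWaitDemo and jobFinishedBeforeReturn are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class SubmitVsWaitDemo {

    // Starts a stand-in "job" on a background thread and reports whether it had
    // finished by the time the caller moved on. The job blocks on `gate`, so it
    // cannot finish before the gate is opened; this keeps the demo deterministic.
    static boolean jobFinishedBeforeReturn(boolean waitForCompletion) throws Exception {
        AtomicBoolean finished = new AtomicBoolean(false);
        CountDownLatch gate = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> job = pool.submit(() -> {
                gate.await();          // stands in for the actual map/reduce work
                finished.set(true);
                return null;
            });
            if (waitForCompletion) {
                // Analogous to job.waitForCompletion(true): block until the job ends.
                gate.countDown();
                job.get();
            }
            // Analogous to job.submit() when not waiting: the caller continues at once.
            // In the tutorial's main(), the next step would be System.exit(res).
            return finished.get();
        } finally {
            gate.countDown();          // let the background thread finish cleanly
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("submit only:         " + jobFinishedBeforeReturn(false));
        System.out.println("wait for completion: " + jobFinishedBeforeReturn(true));
    }
}
```

Run as written, it prints false for the submit-only case and true for the waiting case. The latch is there only to make the race reproducible; in the real program the "work" is the LocalJobRunner thread.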
Here is the Map and Reduce code:
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class VoteCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);

        @Override
        public void map(Object key, Text value, Context output) throws IOException,
                InterruptedException {
            //If more than one word is present, split using white space.
            String[] words = value.toString().split(" ");
            //Only the first word is the candidate name
            output.write(new Text(words[0]), one);
        }
    }

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class VoteCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context output)
                throws IOException, InterruptedException {
            int voteCount = 0;
            for (IntWritable value : values) {
                voteCount += value.get();
            }
            output.write(key, new IntWritable(voteCount));
        }
    }
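To check what the mapper and reducer compute independently of Hadoop, here is a plain-Java sketch that applies the same logic to an in-memory list of lines: the map step emits (first word, 1) per line, and the reduce step sums the 1s per candidate. The class name LocalVoteCount and the sample votes are hypothetical; it reproduces only the counting logic, not Hadoop's input splitting, shuffle, or file I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalVoteCount {

    // Map step, mirroring VoteCountMapper: emit (first word, 1) for each line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] words = line.split(" ");
            pairs.add(Map.entry(words[0], 1));
        }
        return pairs;
    }

    // Group-and-reduce step, mirroring VoteCountReducer: sum the values per key.
    // A TreeMap keeps the candidates sorted by key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: one vote per line, candidate name first.
        List<String> lines = List.of("alice", "bob", "alice extra words", "carol", "bob");
        System.out.println(reduce(map(lines))); // prints {alice=2, bob=2, carol=1}
    }
}
```

Because the reducer only sums integers, it is also safe to reuse as a combiner, which is what the WordCount-style driver does with setCombinerClass.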