I can't find a single example of submitting a Hadoop job that does not use the deprecated JobConf class. JobClient, which hasn't been deprecated, still only supports methods that take a JobConf parameter.
Can someone please point me at an example of Java code submitting a Hadoop map/reduce job that uses only the Configuration class (not JobConf), and uses the mapreduce.lib.input package instead of mapred.input?
Answer 0 (score: 23)
Hope this helps:
import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapReduceExample extends Configured implements Tool {

    // Identity mapper that also increments a custom counter for every record.
    static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.getCounter("mygroup", "jeff").increment(1);
            context.write(key, value);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Job.getInstance(getConf()) picks up the Configuration that ToolRunner
        // has already populated; the deprecated new Job() constructor would not.
        Job job = Job.getInstance(getConf());
        job.setJarByClass(MapReduceExample.class);
        job.setMapperClass(MyMapper.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Demo setup: clear the local output directory and hard-code the paths.
        FileUtils.deleteDirectory(new File("data/output"));
        args = new String[] { "data/input", "data/output" };
        ToolRunner.run(new MapReduceExample(), args);
    }
}
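As a side note, the "mygroup"/"jeff" counter that MyMapper increments can be read back through the same Job object once the job has finished; Job.getCounters() and Counters.findCounter() are part of the new org.apache.hadoop.mapreduce API. A sketch of run() extended to print it:

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf());
        job.setJarByClass(MapReduceExample.class);
        job.setMapperClass(MyMapper.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean success = job.waitForCompletion(true);

        // Read back the counter incremented in MyMapper.
        long count = job.getCounters().findCounter("mygroup", "jeff").getValue();
        System.out.println("mygroup/jeff = " + count);

        return success ? 0 : 1;
    }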
Answer 1 (score: 9)
I believe this tutorial illustrates removing the deprecated JobConf class, using Hadoop 0.20.1.
Answer 2 (score: 2)
This is a good example with downloadable code: http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html That said, it is more than two years old and there is still no official documentation discussing the new API. Sad.
Answer 3 (score: 1)
In the old API there were three ways of submitting a job, and one of them was to submit the job, get a reference to the RunningJob, and read the job's id from that handle.
submitJob(JobConf): only submits the job, then poll the returned handle to the RunningJob to query status and make scheduling decisions.
How can I do the same with the new API, i.e. get a reference to the running job and its id, given that no method returns a reference to a RunningJob?
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html
Thanks
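For what it's worth, in the new API the Job object itself takes over the role of the old RunningJob handle: Job.submit() returns immediately (unlike waitForCompletion), and the same Job can then be polled via getJobID(), isComplete(), mapProgress() and reduceProgress(). A minimal sketch, with a placeholder class name and paths taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AsyncSubmitExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "async example");
        job.setJarByClass(AsyncSubmitExample.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.submit();                 // returns immediately, unlike waitForCompletion
        JobID id = job.getJobID();    // the id the old RunningJob handle used to provide
        System.out.println("Submitted job " + id);

        // Poll for status, as one would have done with RunningJob.
        while (!job.isComplete()) {
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println("Succeeded: " + job.isSuccessful());
    }
}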
Answer 4 (score: 1)
Try using Configuration and Job. Here is an example (substitute your own Mapper, Combiner, and Reducer classes and any other configuration; a sketch of the TokenizerMapper and IntSumReducer it references follows the listing):
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Word Count");

        // set jar
        job.setJarByClass(WordCount.class);

        // set Mapper, Combiner, Reducer (your own classes)
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        /* Optional: set a custom Partitioner:
         * job.setPartitionerClass(MyPartitioner.class);
         */

        // set map and final output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // set input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // by default, Hadoop uses TextInputFormat and TextOutputFormat;
        // any custom input/output class must implement InputFormat/OutputFormat
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
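The answer leaves TokenizerMapper and IntSumReducer to the reader; a minimal word-count pair matching those placeholder names might look like this (essentially the classic WordCount mapper and reducer, written against the new API):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Splits each input line into tokens and emits (word, 1) for each token.
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// Sums the counts for each word; the same class also works as the Combiner.
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}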