I am writing a MapReduce job in NetBeans and building (also in NB) a jar file. When I try to run this job on Hadoop (version 1.2.1), I use this command:
$ hadoop jar job.jar org.job.mainClass /home/user/in.txt /home/user/outdir
The command shows no errors, but outdir and the output files are never created.
Here is my job code:
Mapper
public class Mapper extends MapReduceBase implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text company = new Text("");

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        company.set(value.toString());
        output.collect(value, one);
    }
}
Reducer
public class Reducer extends MapReduceBase implements org.apache.hadoop.mapred.Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum++;
            values.next();
        }
        output.collect(key, new IntWritable(sum));
    }
}
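Stripped of the Hadoop types, the reducer above simply counts how many values arrived for a key. A minimal plain-Java sketch of that counting loop (class and method names here are illustrative; no Hadoop dependency):

```java
import java.util.Arrays;
import java.util.Iterator;

public class CountDemo {

    // Mirrors the reducer's while-loop: advance the iterator, counting elements.
    static int count(Iterator<?> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum++;
            values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three occurrences of the same company name -> count of 3.
        System.out.println(count(Arrays.asList("acme", "acme", "acme").iterator())); // prints 3
    }
}
```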
Main
public static void main(String[] args) {
    JobConf configuration = new JobConf(CdrMR.class);
    configuration.setJobName("Dedupe companies");
    configuration.setOutputKeyClass(Text.class);
    configuration.setOutputValueClass(IntWritable.class);
    configuration.setMapperClass(Mapper.class);
    configuration.setReducerClass(Reducer.class);
    configuration.setInputFormat(TextInputFormat.class);
    configuration.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(configuration, new Path(args[0]));
    FileOutputFormat.setOutputPath(configuration, new Path(args[1]));
}
The input file has the following format:
name1
name2
name3
...
Also note that I am running Hadoop in a virtual machine (Ubuntu 12.04) without root privileges. Could Hadoop be executing the job and storing the output files in a different directory?
Answer 0 (score: 1)
According to this article, you need to submit the JobConf with:
JobClient.runJob(configuration);
Answer 1 (score: 0)
The correct hadoop command is
$ hadoop jar job.jar /home/user/in.txt /home/user/outdir
not
$ hadoop jar job.jar org.job.mainClass /home/user/in.txt /home/user/outdir
Hadoop treats org.job.mainClass as the input file and in.txt as the output file, so the run fails with File Already Exists: in.txt. This code works for the main method:
public static void main(String[] args) throws FileNotFoundException, IOException {
    JobConf configuration = new JobConf(CdrMR.class);
    configuration.setJobName("Dedupe companies");
    configuration.setOutputKeyClass(Text.class);
    configuration.setOutputValueClass(IntWritable.class);
    configuration.setMapperClass(NameMapper.class);
    configuration.setReducerClass(NameReducer.class);
    configuration.setInputFormat(TextInputFormat.class);
    configuration.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(configuration, new Path(args[0]));
    FileOutputFormat.setOutputPath(configuration, new Path(args[1]));
    System.out.println("Hello Hadoop");
    System.exit(JobClient.runJob(configuration).isSuccessful() ? 0 : 1);
}
Thanks to @AlexeyShestakov and @Y.Prithvi
Answer 2 (score: 0)
The correct hadoop command is
hadoop jar myjar packagename.DriverClass input output
Case 1
MapReduceProject
|
|__ src
|
|__ package1
- Driver
- Mapper
- Reducer
Then you can use
hadoop jar myjar input output
Case 2
MapReduceProject
|
|__ src
|
|__ package1
| - Driver1
| - Mapper1
| - Reducer1
|
|__ package2
- Driver2
- Mapper2
- Reducer2
In this case, you must specify the driver class in the hadoop command:
hadoop jar myjar packagename.DriverClass input output
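The Case 1 form works only when the jar's manifest records a Main-Class entry (NetBeans writes one when a main class is set in the project properties); otherwise the first argument after the jar is taken as the class name. A manifest with such an entry, using an illustrative package and class name, looks like:

```
Manifest-Version: 1.0
Main-Class: package1.Driver
```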