我想将mapreduce作业的输出存储在两个不同的目录中。 尽管我的代码旨在将相同的输出存储在不同的目录中。
我的驱动程序类代码
public class WordCountMain {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job myhadoopJob = new Job(conf);
myhadoopJob.setJarByClass(WordCountMain.class);
myhadoopJob.setJobName("WORD COUNT JOB");
FileInputFormat.addInputPath(myhadoopJob, new Path(args[0]));
myhadoopJob.setMapperClass(WordCountMapper.class);
myhadoopJob.setReducerClass(WordCountReducer.class);
myhadoopJob.setInputFormatClass(TextInputFormat.class);
myhadoopJob.setOutputFormatClass(TextOutputFormat.class);
myhadoopJob.setMapOutputKeyClass(Text.class);
myhadoopJob.setMapOutputValueClass(IntWritable.class);
myhadoopJob.setOutputKeyClass(Text.class);
myhadoopJob.setOutputValueClass(IntWritable.class);
MultipleOutputs.addNamedOutput(myhadoopJob, "output1", TextOutputFormat.class, Text.class, IntWritable.class);
MultipleOutputs.addNamedOutput(myhadoopJob, "output2", TextOutputFormat.class, Text.class, IntWritable.class);
FileOutputFormat.setOutputPath(myhadoopJob, new Path(args[1]));
System.exit(myhadoopJob.waitForCompletion(true) ? 0 : 1);
}
}
我的映射器代码
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
@Override
protected void map(LongWritable key, Text value, Context context)throws IOException, InterruptedException {
String line = value.toString();
String word =null;
StringTokenizer st = new StringTokenizer(line,",");
while(st.hasMoreTokens())
{
word= st.nextToken();
context.write(new Text(word), new IntWritable(1));
}
}
}
我的减速机代码低于
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
MultipleOutputs mout =null;
protected void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {
int count=0;
int num =0;
Iterator<IntWritable> ie =values.iterator();
while(ie.hasNext())
{
num = ie.next().get();//1
count= count+num;
}
mout.write("output1", key, new IntWritable(count));
mout.write("output2", key, new IntWritable(count));
@Override
protected void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException, InterruptedException {
// TODO Auto-generated method stub
super.setup(context);
mout = new MultipleOutputs<Text, IntWritable>(context);
}
}
@Override
protected void setup(org.apache.hadoop.mapreduce.Reducer.Context context)
throws IOException, InterruptedException {
super.setup(context);
mout = new MultipleOutputs<Text, IntWritable>(context);
}
}
我只是在reduce方法本身提供输出目录
但是当我使用下面的命令运行这个mapreduce作业时,它什么也没做。甚至Mapreduce都没有开始。只是一片空白并保持闲置。
hadoop jar WordCountMain.jar /user/cloudera/inputfiles/words.txt /user/cloudera/outputfiles/mapreduce/multipleoutputs
有人可以解释我出了什么问题吗?如何用我的代码来解决这个问题
实际上会发生两个具有不同名称的输出文件存储在/ user / cloudera / outputfiles / mapreduce / multipleoutputs中。
但我需要的是将输出文件存储在不同的目录中。
在猪中,我们可以通过给出不同的目录来使用两个STORE语句
如何在mapreduce中实现相同功能
答案 0 :(得分:0)
您可以尝试在Reducer的清理方法中关闭多个输出对象。