Question

I want to store the output of a MapReduce job in two different directories. My code is meant to write the same output to both directories.

My driver class code:

public class WordCountMain {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        Job myhadoopJob = new Job(conf);

        myhadoopJob.setJarByClass(WordCountMain.class);
        myhadoopJob.setJobName("WORD COUNT JOB");
        FileInputFormat.addInputPath(myhadoopJob, new Path(args[0]));

        myhadoopJob.setMapperClass(WordCountMapper.class);
        myhadoopJob.setReducerClass(WordCountReducer.class);
        myhadoopJob.setInputFormatClass(TextInputFormat.class);
        myhadoopJob.setOutputFormatClass(TextOutputFormat.class);

        myhadoopJob.setMapOutputKeyClass(Text.class);
        myhadoopJob.setMapOutputValueClass(IntWritable.class);

        myhadoopJob.setOutputKeyClass(Text.class);
        myhadoopJob.setOutputValueClass(IntWritable.class);

        MultipleOutputs.addNamedOutput(myhadoopJob, "output1", TextOutputFormat.class, Text.class, IntWritable.class);
        MultipleOutputs.addNamedOutput(myhadoopJob, "output2", TextOutputFormat.class, Text.class, IntWritable.class);
        FileOutputFormat.setOutputPath(myhadoopJob, new Path(args[1]));

        System.exit(myhadoopJob.waitForCompletion(true) ? 0 : 1);
    }
}

My mapper code:

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String line = value.toString();
        String word = null;

        StringTokenizer st = new StringTokenizer(line, ",");

        while (st.hasMoreTokens()) {
            word = st.nextToken();
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

My reducer code is below:

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    MultipleOutputs<Text, IntWritable> mout = null;

    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int count = 0;
        int num = 0;

        Iterator<IntWritable> ie = values.iterator();

        while (ie.hasNext()) {
            num = ie.next().get(); // 1
            count = count + num;
        }

        mout.write("output1", key, new IntWritable(count));
        mout.write("output2", key, new IntWritable(count));
    }

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {

        super.setup(context);

        mout = new MultipleOutputs<Text, IntWritable>(context);
    }
}

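I am only specifying the output directories in the reduce method itself.

But when I run this MapReduce job with the command below, it does nothing. The MapReduce job does not even start. The terminal just stays blank and idle.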

hadoop jar WordCountMain.jar /user/cloudera/inputfiles/words.txt /user/cloudera/outputfiles/mapreduce/multipleoutputs

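Can someone explain what I am doing wrong, and how to fix it in my code?

What actually happens is that two output files with different names are stored in /user/cloudera/outputfiles/mapreduce/multipleoutputs.

But what I need is for the output files to be stored in different directories.

In Pig, we can do this with two STORE statements that point to different directories.

How can I achieve the same thing in MapReduce?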

Answer 1

You can try closing the MultipleOutputs object in the Reducer's cleanup method.
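
A minimal sketch of what that could look like for the reducer in the question, assuming the newer org.apache.hadoop.mapreduce API that the question's code already uses. It closes the MultipleOutputs instance in cleanup. It also uses the four-argument write overload, whose last argument is a base output path relative to the job output directory, which is one way to get each named output into its own subdirectory. The subdirectory names dir1 and dir2 are made up for illustration, and this sketch does not by itself explain why the job never started.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs<Text, IntWritable> mout;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // One MultipleOutputs instance per reduce task.
        mout = new MultipleOutputs<Text, IntWritable>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        IntWritable total = new IntWritable(count);

        // Last argument is the base output path, relative to the job output directory.
        // "dir1" and "dir2" are hypothetical subdirectory names.
        mout.write("output1", key, total, "dir1/part");
        mout.write("output2", key, total, "dir2/part");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closes the underlying record writers so buffered records are flushed.
        mout.close();
    }
}
```

With the output path from the question, this would be expected to produce files such as dir1/part-r-00000 and dir2/part-r-00000 under /user/cloudera/outputfiles/mapreduce/multipleoutputs. The named outputs "output1" and "output2" still have to be registered in the driver with MultipleOutputs.addNamedOutput, as the question already does. The default part-r-* files are still created in the job output directory even though nothing is written to them; wrapping the output format with LazyOutputFormat.setOutputFormatClass in the driver is a common way to avoid those empty files.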
