Setting the number of reduce tasks from the command line

Date: 2014-11-29 23:20:42

Tags: hadoop

I am a beginner with Hadoop. When I try to set the number of reducers from the command line using the Generic Options Parser, the number of reducers does not change. There is no property for the number of reducers set in my "mapred-site.xml" configuration file, which I believe makes the job default to one reducer. I am using the Cloudera QuickStart VM with Hadoop version "Hadoop 2.5.0-cdh5.2.0". Pointers appreciated. I would also like to know the order of precedence among the ways of setting the number of reducers:

  1. Using the configuration file "mapred-site.xml":

    mapred.reduce.tasks

  2. By specifying it in the driver class:

    job.setNumReduceTasks(4)

  3. By specifying it on the command line using the Tool interface:

    -Dmapreduce.job.reduces=2

  4. Mapper:

    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
    {   
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            String line = value.toString();
    
            //Split the line into words
            for(String word: line.split("\\W+"))
            {
                //Make sure that the word is legitimate
                if(word.length() > 0)
                {
                    //Emit the word as you see it
                    context.write(new Text(word), new IntWritable(1));
                }
            }
        }
    }
    

    Reducer:

    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    
    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
    
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            //Initializing the word count to 0 for every key
            int count=0;
    
            for(IntWritable value: values)
            {
                //Adding the word count counter to count
                count += value.get();
            }
    
            //Finally write the word and its count
            context.write(key, new IntWritable(count));
        }
    }
    

    Driver:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    
    public class WordCount extends Configured implements Tool 
    {
        public int run(String[] args) throws Exception
        {
             //Instantiate the job object for configuring your job
            Job job = new Job();
    
            //Specify the class that hadoop needs to look in the JAR file
            //This Jar file is then sent to all the machines in the cluster
            job.setJarByClass(WordCount.class);
    
            //Set a meaningful name to the job
            job.setJobName("Word Count");
    
        //Add the path from where the file input is to be taken
            FileInputFormat.addInputPath(job, new Path(args[0]));
    
            //Set the path where the output must be stored
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
            //Set the Mapper and the Reducer class
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
    
            //Set the type of the key and value of Mapper and reducer
            /*
             * If the Mapper output type and Reducer output type are not the same then
         * also include setMapOutputKeyClass() and setMapOutputValueClass()
             */
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    
            //job.setNumReduceTasks(4);
    
            //Start the job and wait for it to finish. And exit the program based on
            //the success of the program
            System.exit(job.waitForCompletion(true)?0:1);
            return 0;
        }
    
        public static void main(String[] args) throws Exception 
        {
            // Let ToolRunner handle generic command-line options 
            int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    
            System.exit(res);
        }
    }
    

    I tried the following commands to run the job:

    hadoop jar /home/cloudera/Misc/wordCount.jar WordCount -Dmapreduce.job.reduces=2 hdfs:/Input/inputdata hdfs:/Output/wordcount_tool_D=2_take13

    hadoop jar /home/cloudera/Misc/wordCount.jar WordCount -D mapreduce.job.reduces=2 hdfs:/Input/inputdata hdfs:/Output/wordcount_tool_D=2_take14
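
    For reference, the configuration-file route in option 1 might look like the following sketch in `mapred-site.xml`. Note that `mapred.reduce.tasks` is the old Hadoop 1.x name; in Hadoop 2 the property is `mapreduce.job.reduces`, although the old name is still honored as a deprecated alias:

    ```xml
    <!-- mapred-site.xml (sketch): default number of reduce tasks for jobs -->
    <property>
      <name>mapreduce.job.reduces</name>
      <value>2</value>
    </property>
    ```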

1 Answer:

Answer 0 (score: 0)

To answer your question about the order: it is always 2 > 3 > 1.

Options specified in your driver class take precedence over options you pass as GenericOptionsParser arguments, which in turn take precedence over options in the site-specific configuration.

I would suggest debugging the configuration in your driver class by printing it before submitting the job. That way you can be sure what the configuration is right before the job is submitted to the cluster.

Configuration conf = getConf(); // This is available to you since you extended Configured
for (Map.Entry<String, String> entry : conf) {
    System.out.println(entry.getKey() + "=" + entry.getValue());
}
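
One common reason a `-D` setting is silently ignored is whitespace around the `=`: the shell splits `-Dmapreduce.job.reduces = 2` into three separate tokens, so no key=value pair ever reaches the parser. Below is a minimal sketch of that tokenization issue; `DOptionDemo` and `parseD` are illustrative stand-ins, not the real GenericOptionsParser, but the word-splitting behavior they demonstrate is the same.

```java
import java.util.HashMap;
import java.util.Map;

public class DOptionDemo {
    // Simplified stand-in for how -Dkey=value arguments are picked up.
    static Map<String, String> parseD(String[] args) {
        Map<String, String> props = new HashMap<>();
        for (String arg : args) {
            // Only a single token of the form -Dkey=value yields a property
            if (arg.startsWith("-D") && arg.contains("=")) {
                String kv = arg.substring(2);
                int eq = kv.indexOf('=');
                props.put(kv.substring(0, eq), kv.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // The shell splits "-Dmapreduce.job.reduces = 2" into three tokens,
        // so the value never reaches the parser:
        String[] spaced = {"-Dmapreduce.job.reduces", "=", "2"};
        // "-Dmapreduce.job.reduces=2" arrives as a single token:
        String[] joined = {"-Dmapreduce.job.reduces=2"};
        System.out.println(parseD(spaced).get("mapreduce.job.reduces")); // null
        System.out.println(parseD(joined).get("mapreduce.job.reduces")); // 2
    }
}
```

Separately, note that the driver in the question creates the job with `new Job()`, which builds its own fresh `Configuration` rather than using the one ToolRunner populated; for `-D` options parsed by ToolRunner to reach the job, the usual pattern is `Job job = Job.getInstance(getConf(), "Word Count");` inside `run()`.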