Setting the number of reduce tasks from the command line

Date: 2014-11-29 23:20:42

Tags: hadoop

I am a beginner with Hadoop. When I try to set the number of reducers from the command line using the Generic Options Parser, the number of reducers does not change. There is no property for the number of reducers set in my "mapred-site.xml" configuration file, which I believe makes the job default to one reducer. I am using the Cloudera QuickStart VM with Hadoop version "Hadoop 2.5.0-cdh5.2.0". Pointers appreciated. I would also like to know the order of precedence among the ways of setting the number of reducers:

  1. Using the configuration file "mapred-site.xml":

    mapred.reduce.tasks

  2. By specifying it in the driver class:

    job.setNumReduceTasks(4)

  3. By specifying it on the command line using the Tool interface:

    -Dmapreduce.job.reduces=2

  4. Mapper:

    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
    {   
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            String line = value.toString();
    
            //Split the line into words
            for(String word: line.split("\\W+"))
            {
                //Make sure that the word is legitimate
                if(word.length() > 0)
                {
                    //Emit the word as you see it
                    context.write(new Text(word), new IntWritable(1));
                }
            }
        }
    }
    

    Reducer:

    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    
    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
    
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            //Initializing the word count to 0 for every key
            int count=0;
    
            for(IntWritable value: values)
            {
                //Adding the word count counter to count
                count += value.get();
            }
    
            //Finally write the word and its count
            context.write(key, new IntWritable(count));
        }
    }
    

    Driver:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    
    public class WordCount extends Configured implements Tool 
    {
        public int run(String[] args) throws Exception
        {
             //Instantiate the job object for configuring your job
            Job job = new Job();
    
            //Specify the class that hadoop needs to look in the JAR file
            //This Jar file is then sent to all the machines in the cluster
            job.setJarByClass(WordCount.class);
    
            //Set a meaningful name to the job
            job.setJobName("Word Count");
    
        //Add the path from where the file input is to be taken
            FileInputFormat.addInputPath(job, new Path(args[0]));
    
            //Set the path where the output must be stored
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
            //Set the Mapper and the Reducer class
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
    
            //Set the type of the key and value of Mapper and reducer
            /*
             * If the Mapper output type and Reducer output type are not the same then
         * also include setMapOutputKeyClass() and setMapOutputValueClass()
             */
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    
            //job.setNumReduceTasks(4);
    
            //Start the job and wait for it to finish. And exit the program based on
            //the success of the program
            System.exit(job.waitForCompletion(true)?0:1);
            return 0;
        }
    
        public static void main(String[] args) throws Exception 
        {
            // Let ToolRunner handle generic command-line options 
            int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    
            System.exit(res);
        }
    }
    

    I tried the following commands to run the job:

    hadoop jar /home/cloudera/Misc/wordCount.jar WordCount -Dmapreduce.job.reduces=2 hdfs:/Input/inputdata hdfs:/Output/wordcount_tool_D=2_take13

    hadoop jar /home/cloudera/Misc/wordCount.jar WordCount -D mapreduce.job.reduces=2 hdfs:/Input/inputdata hdfs:/Output/wordcount_tool_D=2_take14
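
    For reference, the configuration-file route in option 1 might look like the following sketch in `mapred-site.xml`. Note that `mapred.reduce.tasks` is the old Hadoop 1.x name; in Hadoop 2 the property is `mapreduce.job.reduces`, although the old name is still honored as a deprecated alias:

    ```xml
    <!-- mapred-site.xml (sketch): default number of reduce tasks for jobs -->
    <property>
      <name>mapreduce.job.reduces</name>
      <value>2</value>
    </property>
    ```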

1 Answer:

Answer 0 (score: 0)

To answer your question about the order: it is always 2 > 3 > 1.

Options specified in your driver class take precedence over options you pass as GenericOptionsParser arguments, which in turn take precedence over options in the site-specific configuration.

I would suggest debugging the configuration in your driver class by printing it before submitting the job. That way you can be sure what the configuration is right before the job is submitted to the cluster.

Configuration conf = getConf(); // This is available to you since you extended Configured
for (Map.Entry<String, String> entry : conf) {
    System.out.println(entry.getKey() + "=" + entry.getValue());
}
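
One common reason a `-D` setting is silently ignored is whitespace around the `=`: the shell splits `-Dmapreduce.job.reduces = 2` into three separate tokens, so no key=value pair ever reaches the parser. Below is a minimal sketch of that tokenization issue; `DOptionDemo` and `parseD` are illustrative stand-ins, not the real GenericOptionsParser, but the word-splitting behavior they demonstrate is the same.

```java
import java.util.HashMap;
import java.util.Map;

public class DOptionDemo {
    // Simplified stand-in for how -Dkey=value arguments are picked up.
    static Map<String, String> parseD(String[] args) {
        Map<String, String> props = new HashMap<>();
        for (String arg : args) {
            // Only a single token of the form -Dkey=value yields a property
            if (arg.startsWith("-D") && arg.contains("=")) {
                String kv = arg.substring(2);
                int eq = kv.indexOf('=');
                props.put(kv.substring(0, eq), kv.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // The shell splits "-Dmapreduce.job.reduces = 2" into three tokens,
        // so the value never reaches the parser:
        String[] spaced = {"-Dmapreduce.job.reduces", "=", "2"};
        // "-Dmapreduce.job.reduces=2" arrives as a single token:
        String[] joined = {"-Dmapreduce.job.reduces=2"};
        System.out.println(parseD(spaced).get("mapreduce.job.reduces")); // null
        System.out.println(parseD(joined).get("mapreduce.job.reduces")); // 2
    }
}
```

Separately, note that the driver in the question creates the job with `new Job()`, which builds its own fresh `Configuration` rather than using the one ToolRunner populated; for `-D` options parsed by ToolRunner to reach the job, the usual pattern is `Job job = Job.getInstance(getConf(), "Word Count");` inside `run()`.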