I am a beginner with Hadoop. When I try to set the number of reducers from the command line using the GenericOptionsParser, the number of reducers does not change. There is no property for the number of reducers set in my configuration file "mapred-site.xml", so I assume the number of reducers defaults to 1. I am using the Cloudera QuickStart VM with Hadoop version "Hadoop 2.5.0-cdh5.2.0". Pointers appreciated. I would also like to know the preferred order of precedence among the ways of setting the number of reducers:
1. Using the configuration file "mapred-site.xml":
   mapred.reduce.tasks
2. By specifying it in the driver class:
   job.setNumReduceTasks(4)
3. By specifying it on the command line via the Tool interface:
   -D mapreduce.job.reduces=2
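For the first method, the entry in "mapred-site.xml" would look roughly like the sketch below. Note that on Hadoop 2.x the old property name mapred.reduce.tasks is deprecated in favor of mapreduce.job.reduces, though both names are still honored:

```xml
<configuration>
  <property>
    <name>mapreduce.job.reduces</name>
    <value>2</value>
  </property>
</configuration>
```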
Mapper:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        // Split the line into words
        for (String word : line.split("\\W+"))
        {
            // Make sure the word is not empty
            if (word.length() > 0)
            {
                // Emit the word with a count of 1
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
}
Reducer:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        // Initialize the count to 0 for every key
        int count = 0;
        for (IntWritable value : values)
        {
            // Add each partial count to the total
            count += value.get();
        }
        // Finally write the word and its count
        context.write(key, new IntWritable(count));
    }
}
Driver:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool
{
    public int run(String[] args) throws Exception
    {
        // Instantiate the job object for configuring the job
        Job job = new Job();
        // Specify the class that Hadoop needs to look for in the JAR file
        // This JAR file is then sent to all the machines in the cluster
        job.setJarByClass(WordCount.class);
        // Set a meaningful name for the job
        job.setJobName("Word Count");
        // Add the path from where the input is to be taken
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Set the path where the output must be stored
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Set the Mapper and the Reducer class
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Set the key and value types of the Mapper and Reducer output
        /*
         * If the Mapper output types and the Reducer output types are not the
         * same, also include setMapOutputKeyClass() and setMapOutputValueClass()
         */
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //job.setNumReduceTasks(4);
        // Start the job and wait for it to finish, then exit the program
        // based on its success
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        return 0;
    }

    public static void main(String[] args) throws Exception
    {
        // Let ToolRunner handle generic command-line options
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
I tried the following commands to run the job:
hadoop jar /home/cloudera/Misc/wordCount.jar WordCount -Dmapreduce.job.reduces=2 hdfs:/Input/inputdata hdfs:/Output/wordcount_tool_D=2_take13
and
hadoop jar /home/cloudera/Misc/wordCount.jar WordCount -D mapreduce.job.reduces=2 hdfs:/Input/inputdata hdfs:/Output/wordcount_tool_D=2_take14
Answer 0 (score: 0)
To answer your query on the order: it is always 2 > 3 > 1. An option specified in the driver class takes precedence over one you pass as a GenericOptionsParser argument or one you set in the site-specific configuration.
I suggest you debug this by printing the configuration from the driver class before submitting the job. That way you can determine exactly what the configuration is just before the job is submitted to the cluster.
Configuration conf = getConf(); // Available to you since WordCount extends Configured
for (Map.Entry<String, String> entry : conf)
{
    // Print each configuration entry as key=value
    System.out.println(entry.getKey() + "=" + entry.getValue());
}
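The 2 > 3 > 1 ordering follows from when each source writes the property: site files are read when the Configuration is created, GenericOptionsParser applies any -D overrides next, and setNumReduceTasks() in the driver runs last, so the last write wins. A toy illustration of that layering in plain Java (a LinkedHashMap standing in for Hadoop's actual Configuration class; only the property name is taken from Hadoop):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrecedenceDemo {
    public static void main(String[] args) {
        // Each later write overwrites the earlier one for the same key,
        // mirroring the order in which a job's configuration is assembled.
        Map<String, String> conf = new LinkedHashMap<>();

        // 1. Site configuration (mapred-site.xml) is loaded first.
        conf.put("mapreduce.job.reduces", "1");

        // 3. GenericOptionsParser applies -D mapreduce.job.reduces=2 next.
        conf.put("mapreduce.job.reduces", "2");

        // 2. job.setNumReduceTasks(4) runs in the driver, last of all.
        conf.put("mapreduce.job.reduces", "4");

        System.out.println(conf.get("mapreduce.job.reduces")); // prints 4
    }
}
```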