Question

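I have two files. In file1.txt I have written words in all upper-case letters, and in file2.txt I have written everything in lower-case letters. How can I split the input so that all the upper-case words from file1.txt go to one reducer and all the lower-case words from file2.txt go to a different reducer?

Can anyone help me?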

Answer 1

Create a custom partitioner.

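The main purpose of a partitioner is to partition the intermediate key-value pairs of the mapper output. The partitioner divides the data according to a user-defined condition and works much like a hash function. The total number of partitions equals the number of reducers for the job (job.setNumReduceTasks(n)). The partition phase takes place after the map phase and before the reduce phase of a MapReduce program. The default partition function is a hash partitioner, which hashes the key. However, it can be useful to partition the data by some other function of the key or value.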
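To make the default behaviour concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of the arithmetic a hash partitioner applies: clear the sign bit of the key's hash code, then take the remainder by the number of reducers. The class name `HashPartitionSketch` and the use of `String` keys are assumptions for illustration; in a real job the keys are Writable types such as `Text`, whose hash codes differ from `String`'s.

```java
import java.util.List;

public class HashPartitionSketch {
    // Mirrors the hash-partitioning formula: mask off the sign bit so the
    // result is non-negative, then map it into the range [0, numReduceTasks).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2;
        // Illustrative keys only; which reducer each lands on depends on its hash.
        for (String key : List.of("HELLO", "hello", "WORLD", "world")) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the target reducer depends only on the hash, upper-case and lower-case words end up mixed across reducers, which is why a custom rule is needed for this question.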

    // Set the number of reduce tasks in the driver program
    job.setNumReduceTasks(2);

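Then create a custom partitioner class and add logic that partitions the map output based on the letter case of the value. Register it in the driver with job.setPartitionerClass(customPartitioner.class); otherwise the default hash partitioner is still used.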

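import org.apache.commons.lang3.StringUtils;   // assumes commons-lang3 is on the classpath
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public static class customPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Text is not a CharSequence, so convert it before the case check
        if (StringUtils.isAllUpperCase(value.toString()))
            return 0;   // all upper-case values go to the first reducer
        else
            return 1;   // everything else goes to the second reducer
    }
}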
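To check the routing rule without a cluster, the same case test can be tried in plain Java. This is only a sketch: `CasePartitionSketch` and its `partitionFor` helper are hypothetical stand-ins for the partitioner's `getPartition` method, and the sample words are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class CasePartitionSketch {
    // Hypothetical stand-in for getPartition: 0 when every character is an
    // upper-case letter, 1 otherwise (including the empty string).
    static int partitionFor(String value) {
        boolean allUpper = !value.isEmpty()
                && value.chars().allMatch(Character::isUpperCase);
        return allUpper ? 0 : 1;
    }

    public static void main(String[] args) {
        // Two buckets play the role of the two reducers.
        List<List<String>> reducers = List.of(new ArrayList<>(), new ArrayList<>());
        for (String word : List.of("HADOOP", "hadoop", "MAP", "reduce")) {
            reducers.get(partitionFor(word)).add(word);
        }
        System.out.println("reducer 0: " + reducers.get(0));
        System.out.println("reducer 1: " + reducers.get(1));
    }
}
```

Note that mixed-case words such as "Hadoop" fall into the second bucket under this rule, so the rule only separates the two files cleanly if file1.txt really contains nothing but upper-case words.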

For an example of a custom partitioner, see http://www.hadooptpoint.org/hadoop-custom-partitioner-in-mapreduce-example/
