How do I get the record with the latest date per key using MapReduce?

Time: 2015-09-24 13:32:00

Tags: java hadoop mapreduce

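I recently started learning MapReduce programming, so I began with a practice scenario. I have sample data consisting of account number, balance, and transaction date, and I want to get the latest transaction for each account number.

Here is my input: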

+-------+-------+------------+
| accno | bal   | date       |
+-------+-------+------------+
| 13611 |  3360 | 2015-09-18 |
| 13611 |  1500 | 2015-09-19 |
| 13620 | 10000 | 2015-09-17 |
| 13620 |  6000 | 2015-09-18 |
| 13620 |  3000 | 2015-09-19 |
| 13631 |  5000 | 2015-09-16 |
| 13631 |  3500 | 2015-09-18 |
| 13621 |  3000 | 2015-09-10 |
| 13621 |  1800 | 2015-09-15 |
+-------+-------+------------+

Expected output:

    +-------+-------+------------+
    | accno | bal   | Date       |
    +-------+-------+------------+
    | 13611 |  1500 | 2015-09-19 |
    | 13620 |  3000 | 2015-09-19 |
    | 13631 |  3500 | 2015-09-18 |
    | 13621 |  1800 | 2015-09-15 |
    +-------+-------+------------+

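I am trying to write the code, but I am stuck on how to get the latest date for a particular key. My process is:

1) I read the input and emit acc_no as the key and the whole line of text as the value.

2) Then I partition the data on the key (i.e. acc_no).

3) In the reduce phase, I implement the logic to get the record with the latest date.

// Driver code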

public class EmployeeDriver extends Configured implements Tool{

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf);
        job.setJarByClass(getClass());

//      job.setInputFormatClass(KeyValueTextInputFormat.class);

        job.setMapperClass(EmployeeMapper.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setPartitionerClass(EmployessPartitioner.class);
        job.setReducerClass(EmployeeReducer.class);

        job.setNumReduceTasks(4);

        FileInputFormat.addInputPath(job, new Path("D:\\datatoload\\HotelCloutMap\\Emplyoee\\Input\\in.txt"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\datatoload\\HotelCloutMap\\Emplyoee\\output"));

        return job.waitForCompletion(true)? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new EmployeeDriver(), args));
    }
}

// Mapper

public class EmployeeMapper extends Mapper<LongWritable, Text, LongWritable, Text>{
    LongWritable l = null;
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{
        String[] line = value.toString().split(",");
        l = new LongWritable(Long.parseLong(line[0]));
        System.out.println(key + " "+ value);
        context.write(l, value);
    }
}

// Reducer

public class EmployeeReducer extends Reducer<LongWritable, Text, LongWritable, Text>{

    public void reduce(LongWritable key, Iterable<Text> value, Context context) throws IOException, InterruptedException{
        int cnt =0;
        String date ="",sal = "";
        for(Text val : value){
             String [] str = val.toString().split(",");
             sal = str[1];
             date = str[2];
        }
        context.write(key, new Text(sal+" "+date));
    }
}

// Partitioner

public class EmployessPartitioner extends Partitioner<LongWritable, Text>{
    @Override
    public int getPartition(LongWritable key, Text value, int numOfReduceTasks) {
        String[] line = value.toString().split(",");
        int acc_no = Integer.parseInt(line[0]);

         if(numOfReduceTasks == 0)
         {
            return 0;
         }
        if(acc_no == 13611)
            return 0;
        else if(acc_no == 13620)
            return 1;
        else if(acc_no == 13631)
            return 2;
        else
            return 3;
    }
}

My program gives me this output:

13611   3360 2015-09-18
13620   10000 2015-09-17
13631   5000 2015-09-16
13621   3000 2015-09-10

So how can I get the latest record for each account number? Thanks in advance.

1 Answer:

Answer 0: (score: 0)

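You can achieve this with secondary sorting. Hadoop sorts keys but does not sort the values that belong to a key, so the usual pattern is to make the date part of a composite key, sort on it, and write a custom grouping comparator so that all records for one account still reach the same reduce() call. This sorting happens just before the data is passed to the Reducer, so the Reducer receives the records for each account already ordered.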
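To make the idea concrete, here is a minimal plain-Java sketch that imitates what the shuffle does under secondary sorting, without any Hadoop classes. The names SecondarySortSketch, AccountDate and latestPerAccount are hypothetical and exist only for this illustration. It also assumes dates are always in yyyy-MM-dd form, so comparing them as strings gives chronological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SecondarySortSketch {
    // Hypothetical composite key: account number plus date, carrying the balance.
    public static class AccountDate {
        public final long accNo;
        public final String date;
        public final long balance;

        public AccountDate(long accNo, String date, long balance) {
            this.accNo = accNo;
            this.date = date;
            this.balance = balance;
        }
    }

    // Sort order: account ascending, then date descending (newest first).
    // Assumes yyyy-MM-dd dates, which sort chronologically as strings.
    static int compare(AccountDate a, AccountDate b) {
        int byAccount = Long.compare(a.accNo, b.accNo);
        if (byAccount != 0) {
            return byAccount;
        }
        return b.date.compareTo(a.date);
    }

    // Sorts the records, then keeps the first record of each account group,
    // which is the newest one because of the sort order above.
    public static List<AccountDate> latestPerAccount(List<AccountDate> records) {
        List<AccountDate> sorted = new ArrayList<>(records);
        sorted.sort(SecondarySortSketch::compare);
        List<AccountDate> result = new ArrayList<>();
        AccountDate previous = null;
        for (AccountDate r : sorted) {
            // "Grouping" step: a new group starts when the account changes.
            if (previous == null || previous.accNo != r.accNo) {
                result.add(r);
            }
            previous = r;
        }
        return result;
    }

    public static void main(String[] args) {
        List<AccountDate> input = Arrays.asList(
            new AccountDate(13611, "2015-09-18", 3360),
            new AccountDate(13611, "2015-09-19", 1500),
            new AccountDate(13620, "2015-09-17", 10000),
            new AccountDate(13620, "2015-09-19", 3000),
            new AccountDate(13620, "2015-09-18", 6000));
        for (AccountDate r : latestPerAccount(input)) {
            System.out.println(r.accNo + "\t" + r.balance + " " + r.date);
        }
    }
}
```

In a real job the compare logic would live in the composite key's compareTo (or a sort comparator), and the "account changed" check corresponds to a grouping comparator that looks only at the account number.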

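import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class BalanceDate implements Writable {
   LongWritable balance = new LongWritable(); // initialized so readFields() can fill it
   Text date = new Text();                    // transaction date, e.g. 2015-09-19
   public void readFields(DataInput dataInput) throws IOException {
      balance.readFields(dataInput);          // must read in the same order as write()
      date.readFields(dataInput);
   }
   public void write(DataOutput dataOutput) throws IOException {
      balance.write(dataOutput);
      date.write(dataOutput);
   }
}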

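Write a custom comparable class that implements WritableComparable<BalanceDate/Text> and set it in the job configuration. Sort on the date part of the value in descending order, then read only the first value from the list in the reducer.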

job.setGroupingComparatorClass(DateComparator.class);
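As a simpler alternative to secondary sorting (a different technique from the one described above), the reducer can scan the values once and remember the one with the greatest date. The sketch below shows only that selection logic in plain Java. LatestRecordSketch and latest are hypothetical names, and it assumes each value is a comma-separated line of the form accno,bal,date with a yyyy-MM-dd date, as the split(",") calls in the question suggest.

```java
import java.util.Arrays;
import java.util.List;

public class LatestRecordSketch {
    // Returns the "accno,bal,date" line with the greatest date,
    // or null when there are no lines at all.
    public static String latest(Iterable<String> lines) {
        String best = null;
        String bestDate = null;
        for (String line : lines) {
            String date = line.split(",")[2].trim();
            // yyyy-MM-dd strings compare chronologically as plain strings.
            if (bestDate == null || date.compareTo(bestDate) > 0) {
                best = line;
                bestDate = date;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
            "13620,10000,2015-09-17",
            "13620,3000,2015-09-19",
            "13620,6000,2015-09-18");
        System.out.println(latest(values)); // prints 13620,3000,2015-09-19
    }
}
```

Inside an actual reduce() method, the same loop would run over the Iterable<Text>. Since Hadoop reuses the Text object between iterations, copy each value with toString() before keeping a reference to it.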