Question

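This is a fairly common problem, and I don't understand which approach to choose.

I have these fields: id, creationDate, state, dateDiff

id is the natural key.

In my reducer I need to get:

KEY(id), VALUE(creationDate, state, dateDiff)

VALUE(creationDate, state, dateDiff) should arrive sorted by creationDate, then state.

Which key should I choose? I created a composite key (id, creationDate, state).

I implemented:

a partitioner on id
a grouping comparator on id
a sort comparator on id, creationDate, state

My reducer only sees unique ids. For example, given: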

1 123 true  6
1 456 false 6
1 789 true  7

I only get

1 123 true  6

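in my reducer. It looks like I don't fully understand how the sort comparator, partitioner, and grouping comparator work together :( I only understand part of it.

Here is my code: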

public class POIMapper extends Mapper<LongWritable, Text, XVLRKey, XVLRValue>{

    private static final Log LOG = LogFactory.getLog(POIMapper.class);

    @Override
    public void map(LongWritable key, Text csvLine, Context context) throws IOException, InterruptedException {
        Pair<XVLRKey, XVLRValue> xvlrPair = POIUtil.parseKeyAndValue(csvLine.toString(), POIUtil.CSV_DELIMITER);
        context.write(xvlrPair.getValue0(), xvlrPair.getValue1());
    }

}

public class POIReducer extends Reducer<XVLRKey, XVLRValue, LongWritable, Text>{

    private static final Log LOG = LogFactory.getLog(POIReducer.class);

    private final Text textForOutput = new Text();

    @Override()
    public void reduce(XVLRKey key, Iterable<XVLRValue> values, Context context)
                                                                            throws IOException, InterruptedException {
        XVLROutput out = null;
//Just check that values are correctly attached to keys. No logic here...
        LOG.info("\nPOIReducer: key:"+key);
        for(XVLRValue value : values){
            LOG.info("\n --- --- --- value:"+value+"\n");
            textForOutput.set(print(key, value));
            context.write(key.getMsisdn(), textForOutput);
        }
    }

    private String print(XVLRKey key, XVLRValue value){
        StringBuilder builder = new StringBuilder();
        builder.append(value.getLac())          .append("\t")
               .append(value.getCellId())       .append("\t")
               .append(key.getDateOccurrence()) .append("\t")
               .append(value.getTimeDelta());
        return builder.toString();
    }
}

Job code:

JobBuilder<POIJob> jobBuilder = createTestableJobInstance();

        jobBuilder.withOutputKey(XVLRKey.class);
        jobBuilder.withOutputValue(XVLRValue.class);

        jobBuilder.withMapper(POIMapper.class);
        jobBuilder.withReducer(POIReducer.class);

        jobBuilder.withInputFormat(TextInputFormat.class);
        jobBuilder.withOutputFormat(TextOutputFormat.class);

        jobBuilder.withPartitioner(XVLRKeyPartitioner.class);
        jobBuilder.withSortComparator(XVLRCompositeKeyComparator.class);
        jobBuilder.withGroupingComparator(XVLRKeyGroupingComparator.class);

        boolean result = buildSubmitAndWaitForCompletion(jobBuilder);
        MatcherAssert.assertThat(result, Matchers.is(true));




public class XVLRKeyPartitioner extends Partitioner<XVLRKey, XVLRValue> {

    @Override
    public int getPartition(XVLRKey key, XVLRValue value, int numPartitions) {
            return Math.abs(key.getMsisdn().hashCode() * 127) % numPartitions;
    }
}

public class XVLRCompositeKeyComparator extends WritableComparator {

    protected XVLRCompositeKeyComparator() {
        super(XVLRKey.class, true);
    }

    @Override
    public int compare(WritableComparable writable1, WritableComparable writable2) {
        XVLRKey key1 = (XVLRKey) writable1;
        XVLRKey key2 = (XVLRKey) writable2;
       return key1.compareTo(key2);
    }
}

public class XVLRKeyGroupingComparator extends WritableComparator {

    protected XVLRKeyGroupingComparator() {
        super(XVLRKey.class, true);
    }

    @Override
    public int compare(WritableComparable writable1, WritableComparable writable2) {

        XVLRKey key1 = (XVLRKey) writable1;
        XVLRKey key2 = (XVLRKey) writable2;

        return key1.getMsisdn().compareTo(key2.getMsisdn());

    }
}

public class XVLRKey implements WritableComparable<XVLRKey>{

    private  final LongWritable msisdn;
    private  final LongWritable dateOccurrence;
    private  final BooleanWritable state;
//getters-setters
}

public class XVLRValue implements WritableComparable<XVLRValue> {

    private final LongWritable lac;
    private final LongWritable cellId;
    private final LongWritable timeDelta;
    private final LongWritable dateOccurrence;
    private final BooleanWritable state;
//getters-setters
}

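Note that XVLRKey and XVLRValue have duplicated fields. I duplicated dateOccurrence in XVLRKey because I want the values to arrive sorted in the reducer. They should be sorted by dateOccurrence.

I can't find a way to solve this without the duplication.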

Answer 1

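In a secondary sort (as you describe), the contents of the key you are holding change each time you retrieve the next value from the iterator.

This is because the Hadoop framework reuses object instances to avoid object creation and garbage collection as much as possible.

So when you call next() on the iterator, the framework also changes the data in the key instance.

So if you move the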

    LOG.info("\nPOIReducer: key:"+key);

statement so that it is inside the for loop, you should see all the keys come by.
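The effect can be reproduced without Hadoop. The following is only a simulation under stated assumptions: `MutableKey` and `ReusingIterator` are hypothetical stand-ins for the composite key and the framework's value iterator, and the three records are the ones from the question. It shows one shared key object being refilled on every next() call, which is why a log statement before the loop sees only the first key.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java simulation (no Hadoop dependency) of key-object reuse in a reducer.
final class KeyReuseDemo {

    // Hypothetical stand-in for the composite key (id, creationDate).
    static final class MutableKey {
        long id;
        long creationDate;
        @Override public String toString() { return id + ":" + creationDate; }
    }

    // Hypothetical iterator that refills ONE shared key object on every next().
    static final class ReusingIterator implements Iterator<Long> {
        private final long[][] records; // each row: {id, creationDate, dateDiff}
        private final MutableKey sharedKey;
        private int pos = 0;

        ReusingIterator(long[][] records, MutableKey sharedKey) {
            this.records = records;
            this.sharedKey = sharedKey;
            // Assumption: the first key of the group is loaded before reduce() runs.
            if (records.length > 0) load(0);
        }

        private void load(int i) {
            sharedKey.id = records[i][0];
            sharedKey.creationDate = records[i][1];
        }

        @Override public boolean hasNext() { return pos < records.length; }

        @Override public Long next() {
            if (!hasNext()) throw new NoSuchElementException();
            load(pos);                 // overwrite the shared key in place
            return records[pos++][2];  // hand back the value (dateDiff)
        }
    }

    // One group for id 1, already sorted by creationDate.
    static final long[][] GROUP = { {1, 123, 6}, {1, 456, 6}, {1, 789, 7} };

    // What a log statement placed before the loop would see.
    static String keyLoggedBeforeLoop() {
        MutableKey key = new MutableKey();
        new ReusingIterator(GROUP, key);
        return key.toString();
    }

    // What a log statement placed inside the loop would see.
    static List<String> keysLoggedInsideLoop() {
        MutableKey key = new MutableKey();
        Iterator<Long> values = new ReusingIterator(GROUP, key);
        List<String> seen = new ArrayList<>();
        while (values.hasNext()) {
            long dateDiff = values.next();
            seen.add(key + " dateDiff=" + dateDiff);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(keyLoggedBeforeLoop());
        System.out.println(keysLoggedInsideLoop());
    }
}
```

In this simulation the key reads 1:123 before the loop and then 1:123, 1:456, 1:789 inside it, which matches the symptom in the question: all three records reach the reducer, but the key has to be read inside the loop to see them.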

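Because of this effect, I generally follow this "rule" to get the job done:

The key is used only by the framework, to direct the values to the right reducer.

This means that:

Everything I may need must be present in the value.
In the reducer I only look at the values; I always discard/ignore the key.
The attributes used to build the key can also be found in the value.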
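As a rough illustration of that rule, here is a plain-Java sketch (no Hadoop dependency) of what the sort comparator and grouping comparator achieve together. The `Value` class, `SORT_ORDER`, and `simulateReduce` are hypothetical names introduced for this example. The value carries every field, including the ones that make up the key, so the "reducer" part reads only values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulation of secondary sort: order by (id, creationDate, state), group by id only.
final class SecondarySortSketch {

    // Hypothetical value type that duplicates the key attributes on purpose.
    static final class Value {
        final long id;
        final long creationDate;
        final boolean state;
        final long dateDiff;

        Value(long id, long creationDate, boolean state, long dateDiff) {
            this.id = id;
            this.creationDate = creationDate;
            this.state = state;
            this.dateDiff = dateDiff;
        }

        @Override public String toString() {
            return creationDate + " " + state + " " + dateDiff;
        }
    }

    // Plays the role of the sort comparator: id, then creationDate, then state.
    static final Comparator<Value> SORT_ORDER =
            Comparator.comparingLong((Value v) -> v.id)
                      .thenComparingLong(v -> v.creationDate)
                      .thenComparing((a, b) -> Boolean.compare(a.state, b.state));

    // Sorts the records, then groups them by id alone (the grouping comparator's job).
    // Only fields of the value are read; no separate key object is consulted.
    static Map<Long, List<String>> simulateReduce(List<Value> input) {
        List<Value> sorted = new ArrayList<>(input);
        sorted.sort(SORT_ORDER);
        Map<Long, List<String>> groups = new LinkedHashMap<>();
        for (Value v : sorted) {
            groups.computeIfAbsent(v.id, id -> new ArrayList<>()).add(v.toString());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Value> input = new ArrayList<>();
        input.add(new Value(1, 789, true, 7));
        input.add(new Value(1, 123, true, 6));
        input.add(new Value(2, 50, false, 1));
        input.add(new Value(1, 456, false, 6));
        System.out.println(simulateReduce(input));
    }
}
```

The duplication of dateOccurrence and state between key and value that the question tries to avoid is exactly what this rule accepts: the key copy drives partitioning, sorting, and grouping, and the value copy is what the reducer reads.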

Hadoop secondary sort on values. Sort, grouper lose values
