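This is a fairly general question, and I don't understand what to choose.
I have these fields: id, creationDate, state, dateDiff
id is the natural key.
I need to get the following into my reducer:
KEY(id), VALUE(creationDate, state, dateDiff)
The VALUE(creationDate, state, dateDiff) entries should be sorted in this order: creationDate, state
Which key should I choose? I created a composite key (id, creationDate, state).
I implemented:
a partitioner by id
a grouper by id
a sorter by id, creationDate, state
My reducer only gets unique ids ... for example, for the records: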
1 123 true 6
1 456 false 6
1 789 true 7
I only get
1 123 true 6
in my reducer. It seems I don't really understand how the sorter, partitioner, and grouper work together :(
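For orientation, here is a minimal plain-Java sketch of what the three pieces described above are expected to do to the sample records. It uses no Hadoop classes; `SecondarySortSketch`, `Row`, `partition`, and `sortAndGroup` are hypothetical names made up for illustration, and the map-based grouping is only a model of a grouping comparator, not how Hadoop implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of secondary sort; none of these names are Hadoop APIs.
public class SecondarySortSketch {

    // Hypothetical stand-in for the composite key (id, creationDate, state) plus dateDiff.
    static final class Row {
        final long id;
        final long creationDate;
        final boolean state;
        final long dateDiff;

        Row(long id, long creationDate, boolean state, long dateDiff) {
            this.id = id;
            this.creationDate = creationDate;
            this.state = state;
            this.dateDiff = dateDiff;
        }

        @Override
        public String toString() {
            return id + " " + creationDate + " " + state + " " + dateDiff;
        }
    }

    // "Partitioner": uses id only, so every row of one id reaches the same reducer.
    static int partition(Row row, int numPartitions) {
        return (Long.hashCode(row.id) & Integer.MAX_VALUE) % numPartitions;
    }

    // "Sort comparator": full composite order id, creationDate, state.
    static final Comparator<Row> SORT = Comparator
            .comparingLong((Row r) -> r.id)
            .thenComparingLong(r -> r.creationDate)
            .thenComparing((a, b) -> Boolean.compare(a.state, b.state));

    // "Grouping comparator": sorted rows that share an id form one reduce() call.
    static Map<Long, List<Row>> sortAndGroup(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(SORT);
        Map<Long, List<Row>> groups = new LinkedHashMap<>();
        for (Row r : sorted) {
            groups.computeIfAbsent(r.id, k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, 789, true, 7),
                new Row(1, 123, true, 6),
                new Row(1, 456, false, 6));
        // One group per id; the values inside each group arrive sorted by creationDate, state.
        sortAndGroup(rows).forEach((id, group) -> System.out.println(id + " -> " + group));
    }
}
```

In this model a single reduce call for id 1 would see all three records, ordered by creationDate, which is the behaviour the question is aiming for.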
Here is my code:
public class POIMapper extends Mapper<LongWritable, Text, XVLRKey, XVLRValue> {
    private static final Log LOG = LogFactory.getLog(POIMapper.class);

    @Override
    public void map(LongWritable key, Text csvLine, Context context) throws IOException, InterruptedException {
        Pair<XVLRKey, XVLRValue> xvlrPair = POIUtil.parseKeyAndValue(csvLine.toString(), POIUtil.CSV_DELIMITER);
        context.write(xvlrPair.getValue0(), xvlrPair.getValue1());
    }
}
public class POIReducer extends Reducer<XVLRKey, XVLRValue, LongWritable, Text> {
    private static final Log LOG = LogFactory.getLog(POIReducer.class);
    private final Text textForOutput = new Text();

    @Override
    public void reduce(XVLRKey key, Iterable<XVLRValue> values, Context context)
            throws IOException, InterruptedException {
        XVLROutput out = null;
        // Just check that values are correctly attached to keys. No logic here...
        LOG.info("\nPOIReducer: key:" + key);
        for (XVLRValue value : values) {
            LOG.info("\n --- --- --- value:" + value + "\n");
            textForOutput.set(print(key, value));
            context.write(key.getMsisdn(), textForOutput);
        }
    }

    private String print(XVLRKey key, XVLRValue value) {
        StringBuilder builder = new StringBuilder();
        builder.append(value.getLac()).append("\t")
               .append(value.getCellId()).append("\t")
               .append(key.getDateOccurrence()).append("\t")
               .append(value.getTimeDelta());
        return builder.toString();
    }
}
Job code:
JobBuilder<POIJob> jobBuilder = createTestableJobInstance();
jobBuilder.withOutputKey(XVLRKey.class);
jobBuilder.withOutputValue(XVLRValue.class);
jobBuilder.withMapper(POIMapper.class);
jobBuilder.withReducer(POIReducer.class);
jobBuilder.withInputFormat(TextInputFormat.class);
jobBuilder.withOutputFormat(TextOutputFormat.class);
jobBuilder.withPartitioner(XVLRKeyPartitioner.class);
jobBuilder.withSortComparator(XVLRCompositeKeyComparator.class);
jobBuilder.withGroupingComparator(XVLRKeyGroupingComparator.class);
boolean result = buildSubmitAndWaitForCompletion(jobBuilder);
MatcherAssert.assertThat(result, Matchers.is(true));
public class XVLRKeyPartitioner extends Partitioner<XVLRKey, XVLRValue> {
    @Override
    public int getPartition(XVLRKey key, XVLRValue value, int numPartitions) {
        // Mask the sign bit instead of calling Math.abs, which stays negative for Integer.MIN_VALUE.
        return ((key.getMsisdn().hashCode() * 127) & Integer.MAX_VALUE) % numPartitions;
    }
}
public class XVLRCompositeKeyComparator extends WritableComparator {
    protected XVLRCompositeKeyComparator() {
        super(XVLRKey.class, true);
    }

    // Sort order: delegates to the full composite ordering defined by XVLRKey.compareTo.
    @Override
    public int compare(WritableComparable writable1, WritableComparable writable2) {
        XVLRKey key1 = (XVLRKey) writable1;
        XVLRKey key2 = (XVLRKey) writable2;
        return key1.compareTo(key2);
    }
}
public class XVLRKeyGroupingComparator extends WritableComparator {
    protected XVLRKeyGroupingComparator() {
        super(XVLRKey.class, true);
    }

    // Grouping: only the natural key (msisdn) decides which keys share one reduce() call.
    @Override
    public int compare(WritableComparable writable1, WritableComparable writable2) {
        XVLRKey key1 = (XVLRKey) writable1;
        XVLRKey key2 = (XVLRKey) writable2;
        return key1.getMsisdn().compareTo(key2.getMsisdn());
    }
}
public class XVLRKey implements WritableComparable<XVLRKey> {
    private final LongWritable msisdn;
    private final LongWritable dateOccurrence;
    private final BooleanWritable state;
    // getters-setters
}
public class XVLRValue implements WritableComparable<XVLRValue> {
    private final LongWritable lac;
    private final LongWritable cellId;
    private final LongWritable timeDelta;
    private final LongWritable dateOccurrence;
    private final BooleanWritable state;
    // getters-setters
}
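Note that XVLRKey and XVLRValue contain duplicated fields. I duplicated dateOccurrence in XVLRKey because I want to receive sorted values in the reducer; they should be sorted by dateOccurrence.
I couldn't find a way to solve this without the duplication.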
Answer 0 (score: 0)
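In a secondary sort scenario (like the one you describe), the contents of the key you hold change each time you retrieve the next value from the iterator.
This is because the Hadoop framework reuses object instances to avoid object creation and garbage collection as much as possible.
So when you call "next()", the framework also changes the data in the key instance.
So if you move the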
LOG.info("\nPOIReducer: key:"+key);
statement so that it is inside the for loop, you should see all the keys come through.
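The effect can be reproduced without Hadoop. The following is a minimal plain-Java sketch, assuming a mutable key object that the iterator overwrites on every `next()` call, which models the reuse described above. `KeyReuseSketch`, `MutableKey`, and `groupValues` are hypothetical names for illustration only, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java model of key reuse during value iteration; these are not Hadoop classes.
public class KeyReuseSketch {

    // Hypothetical mutable composite key, similar in spirit to a Writable that gets refilled.
    static final class MutableKey {
        long id;
        long dateOccurrence;

        void set(long id, long dateOccurrence) {
            this.id = id;
            this.dateOccurrence = dateOccurrence;
        }

        @Override
        public String toString() {
            return id + "@" + dateOccurrence;
        }
    }

    // Iterates the values of one group and overwrites the single shared key
    // instance each time next() is called.
    static Iterable<String> groupValues(MutableKey sharedKey, long[][] sortedKeys, String[] values) {
        return () -> new Iterator<String>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < values.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                sharedKey.set(sortedKeys[i][0], sortedKeys[i][1]);
                return values[i++];
            }
        };
    }

    // Reads the key inside the loop, so every composite key of the group is observed.
    static List<String> keysSeenInsideLoop() {
        long[][] keys = {{1, 123}, {1, 456}, {1, 789}};
        String[] values = {"true 6", "false 6", "true 7"};
        MutableKey key = new MutableKey();
        key.set(keys[0][0], keys[0][1]); // what the key holds when the reduce call starts
        List<String> seen = new ArrayList<>();
        for (String value : groupValues(key, keys, values)) {
            seen.add(key + " -> " + value);
        }
        return seen;
    }

    public static void main(String[] args) {
        keysSeenInsideLoop().forEach(System.out::println);
    }
}
```

Logging the key once before the loop would only ever show the first composite key of the group, which matches the single key line reported in the question.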
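Because of this effect, I basically use the following "rule" to get the job done:
The key is to be used ONLY by the framework, to direct the values to the reducer.
This means that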