Some background: my MapReduce job has the following input (example):
Apache Hadoop
Apache Lucene
StackOverflow
....
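(Each line is actually a user query, but that is not important here.) I want my RecordReader class to read one line and then pass several key-value pairs to the mapper. For example, if the RecordReader reads Apache Hadoop, I want it to generate the following key-value pairs and pass them to the mapper: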
Apache Hadoop - 1
Apache Hadoop - 2
Apache Hadoop - 3
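(" - " is the separator here.) I found that RecordReader hands over the key and value in its next() method:
next(key, value);
Each call to RecordReader.next() takes only one key and one value as arguments. So how can I do this?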
Answer 0 (score: 2)
I believe you can simply use this:
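import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class MultiMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Number of pairs to emit per input line (3 is an assumed value).
    private final int n = 3;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (int i = 1; i <= n; i++) {
            context.write(value, new IntWritable(i));
        }
    }
}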
Here n is the number of values you want to emit for each line. For example, for the key-value pairs you specified:
Apache Hadoop - 1
Apache Hadoop - 2
Apache Hadoop - 3
n would be 3.
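To check what that loop emits without running a cluster, it can be modelled in plain Java. This is only a sketch: FanOutSketch and expand are hypothetical names, and Map.Entry stands in for the (Text, IntWritable) pairs that context.write would produce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FanOutSketch {
    // Models the loop in map(): one input line becomes n (line, i) pairs.
    public static List<Map.Entry<String, Integer>> expand(String line, int n) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            // Stands in for context.write(value, new IntWritable(i)).
            out.add(new SimpleEntry<>(line, i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : expand("Apache Hadoop", 3)) {
            System.out.println(pair.getKey() + " - " + pair.getValue());
        }
    }
}
```

Running main prints the three pairs from the question, one per line.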
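Answer 1 (score: 1)
I think that if you want to send the same key to the mapper several times, you have to implement your own RecordReader. For example, you can write a MultiRecordReader that extends LineRecordReader and changes the nextKeyValue method. This is the original code from LineRecordReader: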
public boolean nextKeyValue() throws IOException {
    if (key == null) {
        key = new LongWritable();
    }
    key.set(pos);
    if (value == null) {
        value = new Text();
    }
    int newSize = 0;
    // We always read one extra line, which lies outside the upper
    // split limit i.e. (end - 1)
    while (getFilePosition() <= end) {
        newSize = in.readLine(value, maxLineLength,
                Math.max(maxBytesToConsume(pos), maxLineLength));
        pos += newSize;
        if (newSize < maxLineLength) {
            break;
        }
        // line too long. try again
        LOG.info("Skipped line of size " + newSize + " at pos " +
                (pos - newSize));
    }
    if (newSize == 0) {
        key = null;
        value = null;
        return false;
    } else {
        return true;
    }
}
You can change it like this:
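// Sketch for a modified copy of LineRecordReader. In the Hadoop source these
// fields appear to be private, so a plain subclass may not be able to reach them.
// Assumed changes: key is declared as Text, and a field "private int n = 0;"
// counts how many pairs have been emitted for the current line.
public boolean nextKeyValue() throws IOException {
    if (key == null) {
        key = new Text();
    }
    if (value == null) {
        value = new Text();
    }
    // Replay the line already held in key until it has been emitted three times.
    if (n > 0 && n < 3) {
        n++;
        value.set(Integer.toString(n));
        return true;
    }
    int newSize = 0;
    while (getFilePosition() <= end) {
        // Read the line into key instead of value.
        newSize = in.readLine(key, maxLineLength,
                Math.max(maxBytesToConsume(pos), maxLineLength));
        pos += newSize;
        if (newSize < maxLineLength) {
            break;
        }
        // line too long. try again
        LOG.info("Skipped line of size " + newSize + " at pos " +
                (pos - newSize));
    }
    if (newSize == 0) {
        key = null;
        value = null;
        return false;
    } else {
        // First pair for the newly read line.
        n = 1;
        value.set("1");
        return true;
    }
}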
I think this will work for you.
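The idea of holding on to a line and handing it out several times can be tried without Hadoop. The sketch below is an assumption-laden model, not Hadoop code: RepeatingReaderSketch is a hypothetical class, a plain Iterator stands in for the underlying line reader, and only the method names mirror RecordReader (nextKeyValue, getCurrentKey, getCurrentValue).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RepeatingReaderSketch {
    private final Iterator<String> lines; // stands in for the underlying line reader
    private final int repeat;             // pairs to emit per line
    private String key;                   // current line
    private int value;                    // current counter, 1..repeat
    private int emitted;                  // pairs already emitted for the current line

    public RepeatingReaderSketch(List<String> input, int repeat) {
        if (repeat < 1) {
            throw new IllegalArgumentException("repeat must be at least 1");
        }
        this.lines = input.iterator();
        this.repeat = repeat;
        this.emitted = repeat; // forces a read on the first call
    }

    // Returns true if a pair is available, false once the input is exhausted.
    public boolean nextKeyValue() {
        if (emitted >= repeat) {   // current line used up: advance to the next one
            if (!lines.hasNext()) {
                key = null;
                return false;
            }
            key = lines.next();
            emitted = 0;
        }
        value = ++emitted;         // same key again, next counter
        return true;
    }

    public String getCurrentKey() { return key; }

    public int getCurrentValue() { return value; }

    public static void main(String[] args) {
        RepeatingReaderSketch reader = new RepeatingReaderSketch(
                Arrays.asList("Apache Hadoop", "Apache Lucene"), 3);
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + " - " + reader.getCurrentValue());
        }
    }
}
```

The point of the design is that the reader advances its input only when the counter for the current line is used up, so each call still returns exactly one key and one value.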
Answer 2 (score: 0)
Try writing the pairs without a key:
context.write(NullWritable.get(), new Text("Apache Hadoop - 1"));
context.write(NullWritable.get(), new Text("Apache Hadoop - 2"));
context.write(NullWritable.get(), new Text("Apache Hadoop - 3"));
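When the query and the number are packed into one value like this, whatever reads the output has to split them apart again. A minimal sketch of that step in plain Java follows; PackedValueSketch and its methods are hypothetical names, and splitting at the last " - " is an assumption made so that queries which themselves contain the separator still parse.

```java
public class PackedValueSketch {
    static final String SEPARATOR = " - ";

    // Builds a value such as "Apache Hadoop - 1".
    public static String pack(String query, int index) {
        return query + SEPARATOR + index;
    }

    // Everything before the last separator is the query.
    public static String queryPart(String packed) {
        return packed.substring(0, packed.lastIndexOf(SEPARATOR));
    }

    // Everything after the last separator is the number.
    public static int indexPart(String packed) {
        int start = packed.lastIndexOf(SEPARATOR) + SEPARATOR.length();
        return Integer.parseInt(packed.substring(start));
    }

    public static void main(String[] args) {
        String packed = pack("Apache Hadoop", 2);
        System.out.println(queryPart(packed) + " / " + indexPart(packed));
    }
}
```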