Question

I would like to be able to set a different separator for the key/value pairs that I receive in the map function of my MR job.

For example, my text file might contain:

John-23
Mary-45
Scott-13

In my map function, I would like the key to be John and the value to be 23 for the first line, and likewise for every other line.

Then, if I set the output separator using

conf.set("mapreduce.textoutputformat.separator", "-");

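will the reducer pick up everything before the first '-' as the key and everything after it as the value? Or do I need to make changes to the reducer as well?

Thanks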

Answer 1

Reading

If you use org.apache.hadoop.mapreduce.lib.input.TextInputFormat, you can simply use String#split in the Mapper.

 @Override
 public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

     String[] keyValue = value.toString().split("-");
     // would emit John -> 23 as a text
     context.write(new Text(keyValue[0]), new Text(keyValue[1]));
 }
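One caveat about the snippet above: split("-") splits on every hyphen, so a line such as Mary-Ann-45 would yield Ann as the value, and a line without any hyphen would make keyValue[1] throw an ArrayIndexOutOfBoundsException. Since the question asks about the first '-', here is a minimal sketch in plain Java of splitting at the first separator only. The class FirstSeparatorSplit and the method splitAtFirst are hypothetical names for illustration, not part of Hadoop, and returning an empty value for a line with no separator is an assumption you may want to change.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Hadoop): splits a line at the FIRST
 * occurrence of a separator and leaves later occurrences in the value.
 */
final class FirstSeparatorSplit {

    /** Returns {key, value}; the value is empty when the separator is absent (an assumption). */
    static String[] splitAtFirst(String line, String separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            // No separator found: treat the whole line as the key.
            return new String[] { line, "" };
        }
        String key = line.substring(0, idx);
        String value = line.substring(idx + separator.length());
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        // Sample lines: one from the question, one with two hyphens, one with none.
        for (String line : new String[] { "John-23", "Mary-Ann-45", "Scott" }) {
            System.out.println(Arrays.toString(splitAtFirst(line, "-")));
        }
    }
}
```

Inside the map method, the two array elements could then be wrapped in Text objects exactly as in the answer's code. Calling value.toString().split("-", 2) gives the same key/value pair whenever a hyphen is present, but it still returns a one-element array when there is none.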

Writing

If you emit your output like this:

Text key = new Text("John");
LongWritable value = new LongWritable(23);
// of course key and value can come from the reduce method itself,
// I just want to illustrate the types
context.write(key, value);

then yes, TextOutputFormat takes care of writing it in the format you want:

John-23

The only pitfall I have run into on Hadoop 2.x (YARN), which has already been answered elsewhere, is that the property has been renamed to mapreduce.output.textoutputformat.separator.
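As a rough sketch of where that setting goes in a Hadoop 2.x driver: the class name SeparatorDriverSketch and the job name are placeholders, and the mapper, reducer, and input/output paths are left out on purpose. The property is set before the Job is created because the Job takes its own copy of the Configuration. As far as I know, the separator only affects the text that TextOutputFormat writes as the job's final output. The reducer receives the key and value objects emitted by the mapper rather than separator-delimited text, so it should not need any change for this.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver skeleton; only the separator-related setup is shown.
public class SeparatorDriverSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hadoop 2.x property name, as given in the answer.
        // The question used "mapreduce.textoutputformat.separator" instead.
        conf.set("mapreduce.output.textoutputformat.separator", "-");

        // Create the Job after setting the property: the Job copies the Configuration.
        Job job = Job.getInstance(conf, "separator-sketch");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Mapper, reducer, input path, output path, and job submission
        // are omitted here and must be added for a real job.
    }
}
```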
