Question

I am using the MapReduce framework.

Suppose this is the input list:

<"Key 1" : A> <"Key 2" : B> <"Key 3" : C> <"Key 1" : D> <"Key 2" : E> <"Key 3" : F> <"Key 1" : G> <"Key 2" : H> <"Key 3" : I> <"Key 1" : J> <"Key 2" : K> <"Key 3" : L> <"Key 1" : M> <"Key 2" : N> <"Key 3" : O> <"Key 1" : P> <"Key 2" : Q> <"Key 3" : R> <"Key 1" : S> <"Key 2" : T> <"Key 3" : U> <"Key 1" : V> <"Key 2" : W> <"Key 3" : X> <"Key 1" : Y> <"Key 2" : Z>

My Mapper produces the following output (values grouped by key):

<"Key 1" : A, D, G, J, M, P, S, V, Y>
<"Key 2" : B, E, H, K, N, Q, T, W, Z>
<"Key 3" : C, F, I, L, O, R, U, X>

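Normally, the Reducer would just write each key followed by all of these values.

But what I want to do is this: split each key's values into chunks of 3 and then produce the final Reducer output. So I want my Reducer output to look like this:

<"Key 1" : [A, D, G], [J, M, P], [S, V, Y]>
<"Key 2" : [B, E, H], [K, N, Q], [T, W, Z]>
<"Key 3" : [C, F, I], [L, O, R], [U, X]>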

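Any help would be highly appreciated, as I have been stuck on this problem for two days. I can't figure out the last part, i.e. how to group the output into chunks of 3.

P.S. If a chunk has fewer than 3 values (as for the last key in the example), that is fine, but a chunk should never have more than 3.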
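For reference, here is a minimal plain-Java sketch of the grouping being asked for, independent of Hadoop. The class ChunkDemo and its chunk method are hypothetical names used only for illustration, and List.of assumes Java 9 or later. Applied to the "Key 3" values it prints [[C, F, I], [L, O, R], [U, X]], i.e. full chunks of 3 with only the last one allowed to be shorter.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {

    // Splits values into consecutive chunks of at most `size` elements.
    // Only the last chunk may be shorter, as the P.S. allows.
    public static <T> List<List<T>> chunk(List<T> values, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < values.size(); start += size) {
            int end = Math.min(start + size, values.size());
            // Copy the sublist so each chunk is independent of the input list.
            chunks.add(new ArrayList<>(values.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> key3 = List.of("C", "F", "I", "L", "O", "R", "U", "X");
        System.out.println(chunk(key3, 3));
    }
}
```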

Answer 1

I think this is quite simple:

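In your reducer, take 3 values at a time inside the for loop.
Concatenate those three values with a delimiter of your choice and write the result to the context:

context.write(key, value)

Note that you can write to the context as many times as you need: for each chunk of 3 values, write it to the context, then move on to the next set of 3 values.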
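The loop described above can be sketched in plain Java as follows. This is a sketch under assumptions, not Hadoop code: writeChunks, the comma delimiter, and the BiConsumer standing in for context.write are hypothetical. In a real reducer the values would be Text objects and each line would be written with something like context.write(key, new Text(line)). The values are walked only once, since a reducer's values can normally be traversed a single time, and at most one chunk is held in memory.

```java
import java.util.List;
import java.util.function.BiConsumer;

public class ChunkedWriter {

    // Emits one (key, line) record per chunk of up to `size` values,
    // joined by `delimiter`. `emit` stands in for context.write(key, value).
    public static void writeChunks(String key, Iterable<String> values, int size,
                                   String delimiter, BiConsumer<String, String> emit) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        StringBuilder line = new StringBuilder();
        int count = 0;
        for (String value : values) {
            if (count > 0) {
                line.append(delimiter);
            }
            line.append(value);
            count++;
            if (count == size) {          // chunk is full: write it, start a new one
                emit.accept(key, line.toString());
                line.setLength(0);
                count = 0;
            }
        }
        if (count > 0) {                  // last, shorter chunk (allowed by the P.S.)
            emit.accept(key, line.toString());
        }
    }

    public static void main(String[] args) {
        List<String> key3 = List.of("C", "F", "I", "L", "O", "R", "U", "X");
        writeChunks("Key 3", key3, 3, ",", (k, v) -> System.out.println(k + "\t" + v));
    }
}
```

With the default TextOutputFormat, each call produces its own output line, so "Key 3" appears once per chunk rather than once with all chunks on one line.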

Let me know if you run into any difficulty.

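A more elaborate solution would be to use MultipleOutputs. With it you can even write to different files.

A good example is here, using Hadoop 1.0.2.

Here is an example based on the javadocs: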

Usage in Reducer:

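 import java.io.IOException;

 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

 public class MOReduce extends Reducer<Text, Text, Text, Text> {

   private MultipleOutputs<Text, Text> mos;

   // Builds a base output path (file name prefix) from a key and a value.
   private static <K, V> String generateFileName(K k, V v) {
     return k.toString() + "_" + v.toString();
   }

   @Override
   protected void setup(Context context) {
     // ...
     mos = new MultipleOutputs<Text, Text>(context);
   }

   @Override
   protected void reduce(Text key, Iterable<Text> values, Context context)
       throws IOException, InterruptedException {
     // ...
     // "text" and "seq" are named outputs; they must be registered in the
     // driver with MultipleOutputs.addNamedOutput(...).
     mos.write("text", key, new Text("Hello"));
     mos.write("seq", new LongWritable(1), new Text("Bye"), "seq_a");
     mos.write("seq", new LongWritable(2), new Text("Chau"), "seq_b");
     // Writes to the job's default output under a custom base path.
     mos.write(key, new Text("value"), generateFileName(key, new Text("value")));
     // ...
   }

   @Override
   protected void cleanup(Context context) throws IOException, InterruptedException {
     mos.close(); // closes every output opened through mos
     // ...
   }
 }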

Answer 2

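Yes, in your case you can use ArrayWritable as the reducer's output value class to write the values as fixed-size chunks.

What you can do is:

Keep an ArrayList instance variable with a fixed capacity of 3 in the reducer class.
In reduce(), iterate over the list of values for the given key and add each one to the ArrayList.
When the ArrayList reaches size 3, convert it to an ArrayWritable instance, pass it to write() together with the key, and then reset the ArrayList.
Declare ArrayWritable as the output value class in the job configuration.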
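The steps above can be sketched in plain Java, with String[] standing in for ArrayWritable and a BiConsumer standing in for context.write; BufferedChunkReducer and its methods are hypothetical names, not Hadoop API. The sketch adds two details the steps leave implicit. First, the leftover values of a key (fewer than 3) are flushed at the end of reduce(); otherwise, because the list is an instance variable, they would stay in the buffer and leak into the next key's first chunk, or be lost after the last key. Second, Hadoop reuses the same Text object while iterating over values, so a real reducer should buffer copies (new Text(value)) and wrap each flushed chunk as new ArrayWritable(Text.class, chunk).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class BufferedChunkReducer {

    private static final int CHUNK_SIZE = 3;

    // Instance buffer, as in the steps above. It is cleared after every
    // flush, so nothing carries over from one key to the next.
    private final List<String> buffer = new ArrayList<>(CHUNK_SIZE);

    // `emit` stands in for context.write(key, new ArrayWritable(...)).
    public void reduce(String key, Iterable<String> values, BiConsumer<String, String[]> emit) {
        for (String value : values) {
            // In Hadoop, add a copy (new Text(value)): the framework reuses
            // the same Text instance for every value it hands out.
            buffer.add(value);
            if (buffer.size() == CHUNK_SIZE) {
                flush(key, emit);
            }
        }
        if (!buffer.isEmpty()) {          // final, shorter chunk for this key
            flush(key, emit);
        }
    }

    private void flush(String key, BiConsumer<String, String[]> emit) {
        // toArray returns a new array, so clearing the buffer afterwards is safe.
        emit.accept(key, buffer.toArray(new String[0]));
        buffer.clear();
    }

    public static void main(String[] args) {
        BufferedChunkReducer reducer = new BufferedChunkReducer();
        BiConsumer<String, String[]> print =
                (k, chunk) -> System.out.println(k + " : " + Arrays.toString(chunk));
        reducer.reduce("Key 2", List.of("B", "E", "H", "K", "N", "Q", "T", "W", "Z"), print);
        reducer.reduce("Key 3", List.of("C", "F", "I", "L", "O", "R", "U", "X"), print);
    }
}
```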

MapReduce: Grouping the results of a Reducer into fixed-size chunks
