Question

I'll start with an example. Suppose the input data looks like this:

User1,product1,time1
User1,product2,time2
User1,product3,time3
User2,product2,time2
User2,product4,time6

The expected outcome is that the data ends up in a database (Aerospike, a key-value store, in my case), formatted as:

User1, [ [product1,time1],[product2,time2],[product3,time3] ]
User2, [ [product2,time2],[product4,time6] ]

So in the mapper I emit the following:

UserID, [productid,timestamp]

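Please don't read [x,y] as meaning that I emit a list. I may send the data from the mapper in any form, for example by writing it into a custom object.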

So on the receiving side, I have data in this format:

User1, [ [product1,time1],[product2,time2],[product3,time3] ]
User2, [ [product2,time2],[product4,time6] ]
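
For reference, the grouping shown above can be reproduced outside Hadoop. The following is a minimal plain-Java sketch of it, assuming comma-separated input lines as in the example; the class UserGrouping and the method groupByUser are names made up for illustration and are not part of Hadoop or Aerospike.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop or Aerospike.
final class UserGrouping {
    // Groups "user,product,time" lines into user -> [[product, time], ...],
    // keeping users and records in the order in which they first appear.
    static Map<String, List<List<String>>> groupByUser(List<String> lines) {
        Map<String, List<List<String>>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 3) {
                continue; // this sketch simply skips malformed records
            }
            grouped.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                   .add(Arrays.asList(parts[1].trim(), parts[2].trim()));
        }
        return grouped;
    }
}
```

In a real job the framework's shuffle does this grouping, so the reducer already receives one key together with all of its values.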

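Now I can do one of two things:

a) Write the logic that pushes this data to the database inside the reducer itself (I don't want to do this).

b) What I would like instead is for the data to be written to the database when Context.write() is called.

Please help me with how to do this, and if possible attach a code snippet or pseudocode.

PS: What does Context.write() do? Where does it write to? What steps and stages does it go through?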

Answer 1

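As far as I understand, calling context.write involves a number of steps.

In the driver we have to specify the output format. Let's first look at what happens when we want to write to a file.

To write a text file, we specify something like: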

job.setOutputFormatClass(TextOutputFormat.class);

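If we look at the implementation of TextOutputFormat, it extends FileOutputFormat (an abstract class), which in turn derives from OutputFormat. In the older org.apache.hadoop.mapred API, OutputFormat is an interface that provides two methods; in the newer org.apache.hadoop.mapreduce API it is an abstract class that additionally declares getOutputCommitter: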

1) getRecordWriter
2) checkOutputSpecs

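What this means is that the OutputFormat class only says what kind of records are written and which record writer is handed out. The record writer itself just receives an Object key and an Object value, where the value may be a single item or a list, and its implementation holds the actual logic for how each record is written.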
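
The chain described above can be modelled without Hadoop. The sketch below uses simplified stand-in types (MiniRecordWriter, MiniOutputFormat, MiniContext and InMemoryOutputFormat are invented for illustration and are not Hadoop classes) to show the idea: in a reduce task or a map-only job, a write on the context is forwarded to whatever record writer the configured output format returned.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the Hadoop classes.
interface MiniRecordWriter<K, V> {
    void write(K key, V value);
}

interface MiniOutputFormat<K, V> {
    MiniRecordWriter<K, V> getRecordWriter();
}

// Plays the role of the task context: it asks the output format for a writer
// once, then forwards every write(key, value) call to that writer.
final class MiniContext<K, V> {
    private final MiniRecordWriter<K, V> writer;

    MiniContext(MiniOutputFormat<K, V> format) {
        this.writer = format.getRecordWriter();
    }

    void write(K key, V value) {
        writer.write(key, value);
    }
}

// An output format whose writer appends to an in-memory list instead of a file,
// standing in for "write to a database".
final class InMemoryOutputFormat implements MiniOutputFormat<String, String> {
    final List<String> sink = new ArrayList<>();

    @Override
    public MiniRecordWriter<String, String> getRecordWriter() {
        return (key, value) -> sink.add(key + "=" + value);
    }
}
```

Swapping the output format is therefore enough to change where records land; the code that calls write on the context stays the same.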

Coming back to the original question: how do we write records to a database, Aerospike in my case?

I created a custom OutputFormat, say:

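import java.io.IOException;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.util.Progressable;

public class AerospikeOutputFormat<K, V> extends OutputFormat<K, V> {
    // Return a new record writer; AerospikeRecordWriter is assumed to take (Configuration, Progressable)
    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext context) throws IOException, InterruptedException {
        return new AerospikeRecordWriter<K, V>(context.getConfiguration(), new Progressable() {
            @Override
            public void progress() { }
        });
    }

    // No output directory to validate when writing to a database
    @Override
    public void checkOutputSpecs(JobContext context) {
    }

    // Nothing to commit on the file system, so reuse NullOutputFormat's no-op committer
    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException, InterruptedException {
        return new NullOutputFormat<K, V>().getOutputCommitter(context);
    }
}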

Next we have to define a custom record writer that takes a key and a value and writes the data to the database:

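import java.io.IOException;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class RSRVRecordWriter<KK, VV> extends RecordWriter<KK, VV> {
    @Override
    public void write(KK key, VV value) throws IOException {
        // Obtain an AerospikeClient here (for example from a singleton class) and call client.put()
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException {
        // Release the client connection here once the task has written all its records
    }
}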

The code above is only a snippet; a proper design strategy still has to be worked out.
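
One possible way to get the "client from a singleton class" mentioned in the record writer is the initialization-on-demand holder idiom. The sketch below is only an illustration under assumptions: FakeClient is a made-up stand-in for a real database client (a real job would construct an AerospikeClient there instead), and ClientHolder is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a database client; it only stores values in memory.
final class FakeClient {
    final Map<String, Object> store = new ConcurrentHashMap<>();

    void put(String key, Object value) {
        store.put(key, value);
    }
}

// Initialization-on-demand holder: the client is created once per JVM,
// lazily on first use, and class loading makes the creation thread-safe.
final class ClientHolder {
    private ClientHolder() {
    }

    private static final class Lazy {
        static final FakeClient INSTANCE = new FakeClient();
    }

    static FakeClient get() {
        return Lazy.INSTANCE;
    }
}
```

With this in place, write(key, value) could call ClientHolder.get().put(...) so that all records written by a task share a single connection instead of opening one per record.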

PS: Aerospike already provides a record writer that can be extended to suit your needs; it can be found at this link
