Question

I'm using Cascading on Hadoop to process data on HDFS, as follows:

Tap inTap_file = new Hfs(new TextDelimited(true, "|~|"), data_hadoop_inPath + "order_summary/*.txt");
Tap outTap_file = new Hfs(new TextDelimited(true, "|~|"), data_hadoop_workingPath + "order_summary");

Pipe copyFilePipe = new Pipe("copy");
Pipe filePipe = null;
try {
    filePipe = PipeFactory.getPipe("order_summary_Pipe", order_summary_fields);
} catch (Exception e) {
    LOGGER.error("Failed to get order summary pipe!", e);
}

FlowDef flowDef_fileType = FlowDef.flowDef()
        .addSource(copyFilePipe, inTap_file)
        .addTailSink(filePipe, outTap_file);
flowDef_fileType.setName("OrderSumDailyFlow");

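The problem is that there are multiple files in the inbox directory, and I use *.txt to match all of them. After the first file is processed, the destination data_hadoop_workingPath + "order_summary" is created. When the second file is processed, I get the error "data_hadoop_workingPath + order_summary already exists". I noticed SinkMode in Cascading, but SinkMode.UPDATE does not work on Hadoop. How can I use a different sink path for each file? What is the best practice here? Thanks!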
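If each input file really does need its own output directory, one possible approach (an assumption, not something confirmed in this thread) is to expand the glob yourself and run one flow per file, deriving each sink path from the file name. The sketch below covers only the path derivation and uses just the standard library. sinkPathFor, sinkPaths and the sample paths are hypothetical, and it assumes data_hadoop_workingPath ends with "/". In a real job the input names could come from Hadoop's FileSystem.globStatus(...), and each derived path would be passed to its own Hfs sink.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SinkPaths {
    // Hypothetical helper: takes the last path segment of an input file, drops a
    // trailing ".txt", and places it under <workingPath>order_summary/ so that each
    // input file gets its own output directory. Assumes workingPath ends with "/".
    static String sinkPathFor(String workingPath, String inputFile) {
        String name = inputFile.substring(inputFile.lastIndexOf('/') + 1);
        if (name.endsWith(".txt")) {
            name = name.substring(0, name.length() - ".txt".length());
        }
        return workingPath + "order_summary/" + name;
    }

    // Maps every input file to its own sink path, preserving input order.
    static Map<String, String> sinkPaths(String workingPath, List<String> inputFiles) {
        Map<String, String> paths = new LinkedHashMap<>();
        for (String file : inputFiles) {
            paths.put(file, sinkPathFor(workingPath, file));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical names standing in for the expansion of order_summary/*.txt
        List<String> inputs = Arrays.asList(
                "/data/in/order_summary/orders_20150101.txt",
                "/data/in/order_summary/orders_20150102.txt");
        sinkPaths("/data/working/", inputs)
                .forEach((in, out) -> System.out.println(in + " -> " + out));
    }
}
```

With one sink path per file, no two flows write to the same directory. If overwriting the previous output is acceptable instead, Hfs also has a constructor that takes a SinkMode as its third argument, and SinkMode.REPLACE deletes an existing sink path before writing.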

Answer 1

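One option is to leave the output file name out of the outTap_file path; the job will then generate part files inside that directory. Instead of: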

Tap outTap_file = new Hfs(new TextDelimited(true, "|~|"),
   data_hadoop_workingPath + "order_summary");

specify only the directory you want to write to, as shown below:

Tap outTap_file = new Hfs(new TextDelimited(true, "|~|"),
   data_hadoop_workingPath);
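The part files mentioned above are the per-task output files Hadoop writes into the sink directory. With Hadoop's classic mapred API, which Cascading 2.x runs on, they are typically named part-00000, part-00001, and so on. The standard-library sketch below only reproduces that naming so you know roughly what to expect under the sink directory. partFileNames and the sample directory are hypothetical, and the exact names can differ by Hadoop version and output format.

```java
import java.util.ArrayList;
import java.util.List;

class PartFileNames {
    // Builds names in the classic mapred style: "part-" followed by the
    // task number zero-padded to five digits. Hypothetical helper.
    static List<String> partFileNames(int numTasks) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            names.add(String.format("part-%05d", i));
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sink directory standing in for data_hadoop_workingPath
        String sinkDir = "/data/working/";
        // e.g. a job whose output was written by three tasks
        for (String name : partFileNames(3)) {
            System.out.println(sinkDir + name);
        }
    }
}
```

A downstream Hfs source can usually point at the directory itself, since Hadoop's file-based input formats read all (non-hidden) files in an input directory.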

How do I define Cascading taps on Hadoop that write to different paths based on the file name?
