Generating multiple output files with MultiSinkTap

Date: 2016-03-25 06:32:15

Tags: java hadoop cascading

I have the following dataset as input:

id,name,gender
asinha161,Aniruddha,Male
vic,Victor,Male
day1,Daisy,Female
jazz030,Jasmine,Female
Mic002,Michael,Male

My goal is to split the males and the females into two separate output files, as shown below.

Male dataset:

id,name,gender
asinha161,Aniruddha,Male
vic,Victor,Male
Mic002,Michael,Male

Female dataset:

id,name,gender
day1,Daisy,Female
jazz030,Jasmine,Female

I have written the following Cascading code, which is supposed to perform the task above:

import java.util.ArrayList;
import java.util.List;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.flow.local.LocalFlowConnector;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.local.TextDelimited;
import cascading.tap.MultiSinkTap;
import cascading.tap.Tap;
import cascading.tap.local.FileTap;

public class Main {

    public static void main(String[] args) {
        Tap sourceTap = new FileTap(new TextDelimited(true, ","), "inputFile.txt");
        Tap sink_one = new FileTap(new TextDelimited(true, ","), "maleFile.txt");
        Tap sink_two = new FileTap(new TextDelimited(true, ","), "FemaleFile.txt");

        Pipe assembly = new Pipe("inputPipe");

        // ...split into two pipes
        Pipe malePipe = new Pipe("for_male", assembly);
        malePipe = new Each(malePipe, new CustomFilterByGender("male"));
        Pipe femalePipe = new Pipe("for_female", assembly);
        femalePipe = new Each(femalePipe, new CustomFilterByGender("female"));

        // create the flow
        List<Pipe> pipes = new ArrayList<Pipe>(2);
        pipes.add(malePipe);
        pipes.add(femalePipe);

        Tap outputTap = new MultiSinkTap<>(sink_one, sink_two);

        FlowConnector flowConnector = new LocalFlowConnector();
        // This is the call that fails to compile
        Flow flow = flowConnector.connect(sourceTap, outputTap, pipes);
        flow.complete();
    }
}

Here CustomFilterByGender(String gender) is a custom operation that returns only the tuples whose gender matches the value passed as its argument.
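The question does not show CustomFilterByGender, so the following is only a hedged, plain-Java sketch (hypothetical class GenderPredicateSketch, no Cascading dependency) of the per-record decision such a filter presumably makes. Two details matter for this data: the filter is constructed with lowercase "male" while the records hold "Male", so the comparison should ignore case; and it should be an exact match, because a substring test such as contains("male") would also accept "Female". The method name isRemove mirrors the contract of a Cascading Filter, where returning true drops the tuple, but the signature here is simplified and is not the real Cascading one.

```java
import java.util.Locale;

// Hypothetical stand-in for the asker's CustomFilterByGender. It only models the
// filtering decision on a parsed "id,name,gender" record; it is NOT Cascading code.
public class GenderPredicateSketch {

    private final String gender;

    public GenderPredicateSketch(String gender) {
        // Normalise once so that "male" matches "Male" in the data
        this.gender = gender.toLowerCase(Locale.ROOT);
    }

    // Same meaning as a Cascading filter's isRemove: true means DROP the record.
    // Exact comparison, so "female" is never accepted by the "male" filter.
    public boolean isRemove(String[] record) {
        String value = record[2].trim().toLowerCase(Locale.ROOT);
        return !value.equals(gender);
    }

    public static void main(String[] args) {
        GenderPredicateSketch male = new GenderPredicateSketch("male");
        System.out.println(male.isRemove("vic,Victor,Male".split(",")));   // false: kept
        System.out.println(male.isRemove("day1,Daisy,Female".split(","))); // true: dropped
    }
}
```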

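Note that, for efficiency, I have not used a custom Buffer.
With MultiSinkTap I cannot get the desired output: the connect() method of LocalFlowConnector does not accept the MultiSinkTap object, which results in a compile-time error.
Please suggest changes to the code above that would make it work, or show how MultiSinkTap is meant to be used.
Thanks for your patience in answering this question :)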
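On the MultiSinkTap approach itself: as far as I can tell from the Cascading API docs, FlowConnector has no connect(Tap, Tap, List<Pipe>) overload, which explains the compile error, and MultiSinkTap is described as a sink that writes to all of its child taps simultaneously, not one that chooses a child per tuple. The plain-Java model below (hypothetical class FanOutSketch, no Cascading classes) illustrates that fan-out behavior under this assumption: both output files would receive every record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Cascading code: a MultiSinkTap-like sink hands
// every tuple it receives to each of its child sinks.
public class FanOutSketch {

    static Map<String, List<String>> fanOut(List<String> tuples, String... childSinks) {
        Map<String, List<String>> written = new LinkedHashMap<>();
        for (String child : childSinks) {
            // Each child gets a copy of the full stream, not a subset of it
            written.put(child, new ArrayList<>(tuples));
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> tuples = Arrays.asList(
                "asinha161,Aniruddha,Male",
                "day1,Daisy,Female");
        // Both "files" end up holding both records
        System.out.println(fanOut(tuples, "maleFile.txt", "FemaleFile.txt"));
    }
}
```

Sending different records to different files therefore comes down to binding each tail pipe to its own sink rather than wrapping the sinks in one composite tap.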

1 Answer:

Answer 0 (score: 4)

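I think you want to write the output of different pipes to different output files. I have made a few changes to your code that should do exactly that: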

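import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.flow.local.LocalFlowConnector;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.local.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.local.FileTap;

public class Main {
    public static void main(String[] args) {
        Tap sourceTap = new FileTap(new TextDelimited(true, ","), "inputFile.txt");
        Tap sink_one = new FileTap(new TextDelimited(true, ","), "maleFile.txt");
        Tap sink_two = new FileTap(new TextDelimited(true, ","), "FemaleFile.txt");

        // Both branches share this head pipe and therefore the single source tap
        Pipe assembly = new Pipe("inputPipe");

        // Each keeps the name of the pipe it extends, so the tails stay
        // "for_male" and "for_female" (CustomFilterByGender is the asker's class)
        Pipe malePipe = new Pipe("for_male", assembly);
        malePipe = new Each(malePipe, new CustomFilterByGender("male"));
        Pipe femalePipe = new Pipe("for_female", assembly);
        femalePipe = new Each(femalePipe, new CustomFilterByGender("female"));

        List<Pipe> pipes = new ArrayList<Pipe>(2);
        pipes.add(malePipe);
        pipes.add(femalePipe);

        // Sink map keys must match the tail pipe names
        Map<String, Tap> sinks = new HashMap<String, Tap>();
        sinks.put("for_male", sink_one);
        sinks.put("for_female", sink_two);

        FlowConnector flowConnector = new LocalFlowConnector();
        Flow flow = flowConnector.connect(sourceTap, sinks, pipes);
        flow.complete();
    }
}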

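You can pass connect() a Map<String, Tap> of the sinks you want to attach to the output pipes malePipe and femalePipe. Each key must be the name of the corresponding tail pipe; since Each keeps the name of the pipe it wraps, the tails are still named "for_male" and "for_female".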
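To make the name matching concrete, here is a plain-Java model (hypothetical class NamedSinkRoutingSketch, no Cascading classes) of what connect(sourceTap, sinks, pipes) sets up: every tail reads the same input, keeps the records its filter accepts, and writes them to the sink registered under the tail's own name. The header line stands in for TextDelimited(true, ",") presumably writing field names on the sink side, and the case-insensitive genderIs predicate is an assumption about what CustomFilterByGender does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model, not Cascading code: named tails filter a shared input and
// each tail's output goes to the sink stored under the same name.
public class NamedSinkRoutingSketch {

    static final String HEADER = "id,name,gender";

    static Map<String, List<String>> route(List<String> input,
                                           Map<String, Predicate<String>> tails,
                                           Map<String, String> sinks) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<String>> tail : tails.entrySet()) {
            String sinkFile = sinks.get(tail.getKey());
            if (sinkFile == null) {
                // In this model an unbound tail is treated as an error
                throw new IllegalStateException("no sink bound to tail: " + tail.getKey());
            }
            List<String> lines = new ArrayList<>();
            lines.add(HEADER); // a header-writing sink emits the field names first
            for (String record : input) {
                if (tail.getValue().test(record)) {
                    lines.add(record);
                }
            }
            files.put(sinkFile, lines);
        }
        return files;
    }

    // Hypothetical helper: case-insensitive exact match on the third column
    static Predicate<String> genderIs(String gender) {
        return line -> line.split(",")[2].trim().equalsIgnoreCase(gender);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "asinha161,Aniruddha,Male",
                "vic,Victor,Male",
                "day1,Daisy,Female",
                "jazz030,Jasmine,Female",
                "Mic002,Michael,Male");

        Map<String, Predicate<String>> tails = new LinkedHashMap<>();
        tails.put("for_male", genderIs("male"));
        tails.put("for_female", genderIs("female"));

        Map<String, String> sinks = new LinkedHashMap<>();
        sinks.put("for_male", "maleFile.txt");
        sinks.put("for_female", "FemaleFile.txt");

        route(input, tails, sinks).forEach((file, lines) -> {
            System.out.println(file);
            lines.forEach(System.out::println);
        });
    }
}
```

Run on the five sample records, the two lists match the male and female datasets shown in the question. Renaming a key in the sinks map so it no longer matches a tail name is the kind of mistake this binding-by-name scheme makes easy.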