BeamSql conversion problem

Time: 2018-03-08 10:57:51

Tags: apache-beam

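I have the code below, in which I read a CSV file and define its schema, then convert the records into BeamRecords. After that I apply BeamSql to implement the PTransforms.

Code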

class Clo {
    public  String Outlet;
    public  String CatLib;
    public  String ProdKey;
    public  Date Week;
    public  String SalesComponent;
    public  String DuetoValue;
    public  String PrimaryCausalKey;
    public  Float CausalValue;
    public  Integer ModelIteration;
    public  Integer Published;
}

public static void main(String[] args) {

    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);
    PCollection<java.lang.String> lines = p.apply(TextIO.read().from("gs://gcpbucket/input/WeeklyDueto.csv"));
    PCollection<Clo> pojos = lines.apply(ParDo.of(new ExtractObjectsFn()));

    List<java.lang.String> fieldNames = Arrays.asList("Outlet", "CatLib", "ProdKey", "Week", "SalesComponent", "DuetoValue", "PrimaryCausalKey", "CausalValue", "ModelIteration", "Published");
    List<java.lang.Integer> fieldTypes = Arrays.asList(Types.VARCHAR, Types.VARCHAR, Types.VARCHAR, Types.DATE, Types.VARCHAR,Types.VARCHAR,Types.VARCHAR, Types.FLOAT, Types.INTEGER, Types.INTEGER);
    BeamRecordSqlType appType = BeamRecordSqlType.create(fieldNames, fieldTypes);

    PCollection<BeamRecord> apps = pojos.apply(
        ParDo.of(new DoFn<Clo, BeamRecord>() {

            @ProcessElement
            public void processElement(ProcessContext c) {
                BeamRecord br = new BeamRecord(
                    appType, 
                    c.element().Outlet, 
                    c.element().CatLib, 
                    c.element().ProdKey,
                    c.element().Week, 
                    c.element().SalesComponent,
                    c.element().DuetoValue,
                    c.element().PrimaryCausalKey,
                    c.element().CausalValue,
                    c.element().ModelIteration,
                    c.element().Published
                );
                c.output(br);     
            }
        })).setCoder(appType.getRecordCoder());

    PCollection<BeamRecord> out = apps.apply(BeamSql.query("select Outlet from PCOLLECTION"));
    out.apply("WriteMyFile", TextIO.write().to("gs://gcpbucket/output/sbc.txt"));
    p.run().waitUntilFinish();
}

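My questions are:

  1. What should I implement in ExtractObjectsFn() to convert the records into BeamRecords?
  2. How do I write the final output to a CSV file?
  3. I have implemented ExtractObjectsFn() as follows: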

    public void processElement(ProcessContext c) {
    
        ArrayList<Clo> clx = new ArrayList<Clo>();
        java.lang.String[] strArr = c.element().split("\n");
    
        for(int i = 0; i < strArr.length; i++) {
            Clo clo = new Clo();
            java.lang.String[] temp = strArr[i].split(",");
            clo.setCatLib(temp[1]);
            clo.setCausalValue(temp[7]);
            clo.setDuetoValue(temp[5]);
            clo.setModelIteration(temp[8]);
            clo.setOutlet(temp[0]);
            clo.setPrimaryCausalKey(temp[6]);
            clo.setProdKey(temp[2]);
            clo.setPublished(temp[9]);
            clo.setSalesComponent(temp[4]);
            clo.setWeek(temp[3]);
            c.output(clo);
            clx.add(clo);
        }   
    }
    

    Please let me know whether this is done correctly, because when I run the code I get the error: No Coder has been manually specified; you may do so using .setCoder().

1 answer:

Answer 0 (score: 2)

  

1. What should I implement in ExtractObjectsFn() to convert the records into BeamRecords?

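In the processElement() method of ExtractObjectsFn, you only need to convert the CSV line from the input (a String) into the Clo type. Split the string on the comma delimiter (,), which returns an array. Iterate over the array to retrieve the CSV values and construct the Clo object.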
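A minimal sketch of that parsing step, written as plain Java so it can be tested outside a pipeline. It assumes the column order of the question's fieldNames list, no header row, no quoted commas inside values, and a yyyy-MM-dd layout for Week (an assumption; the question does not show the CSV). CloParser.parseLine is a hypothetical helper name; inside ExtractObjectsFn.processElement() you would call c.output(CloParser.parseLine(c.element())). Because TextIO.read() already emits one element per line, the split("\n") loop in the question is unnecessary. The sketch also makes Clo implement Serializable: the "No Coder has been manually specified" error means Beam cannot find a coder for PCollection<Clo>, and a Serializable class should let Beam fall back to SerializableCoder (setting it explicitly with .setCoder(SerializableCoder.of(Clo.class)) avoids relying on that inference).

```java
import java.io.Serializable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Same fields as the question's Clo class. Implementing Serializable gives
// Beam a way to encode PCollection<Clo> (via SerializableCoder).
class Clo implements Serializable {
    private static final long serialVersionUID = 1L;

    public String Outlet;
    public String CatLib;
    public String ProdKey;
    public Date Week;
    public String SalesComponent;
    public String DuetoValue;
    public String PrimaryCausalKey;
    public Float CausalValue;
    public Integer ModelIteration;
    public Integer Published;
}

// Hypothetical helper holding the body of ExtractObjectsFn.processElement().
class CloParser {
    // Assumed layout of the Week column; adjust it to the real CSV.
    private static final String WEEK_PATTERN = "yyyy-MM-dd";
    private static final int COLUMN_COUNT = 10;

    // Parses one CSV line in the column order of the question's fieldNames list.
    static Clo parseLine(String line) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        String[] temp = line.split(",", -1);
        if (temp.length != COLUMN_COUNT) {
            throw new IllegalArgumentException(
                "Expected " + COLUMN_COUNT + " columns, got " + temp.length + ": " + line);
        }
        Clo clo = new Clo();
        clo.Outlet = temp[0];
        clo.CatLib = temp[1];
        clo.ProdKey = temp[2];
        try {
            // SimpleDateFormat is not thread-safe, so a new instance is created per call.
            clo.Week = new SimpleDateFormat(WEEK_PATTERN).parse(temp[3].trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad Week value: " + temp[3], e);
        }
        clo.SalesComponent = temp[4];
        clo.DuetoValue = temp[5];
        clo.PrimaryCausalKey = temp[6];
        // Convert the numeric columns to the types declared in Clo.
        clo.CausalValue = Float.valueOf(temp[7].trim());
        clo.ModelIteration = Integer.valueOf(temp[8].trim());
        clo.Published = Integer.valueOf(temp[9].trim());
        return clo;
    }
}
```

Malformed lines raise IllegalArgumentException here; in a real pipeline you might prefer to route them to a separate output instead of failing the job.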

  

2. How do I write the final output to a CSV file?

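The process is similar to the one above. You just apply a new transform that converts each BeamRecord into a CSV line (a String): the members of the BeamRecord can be concatenated into a single string (the CSV line). Once this transform has been applied, you can apply the TextIO.write() transform to write the CSV lines to a file.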
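A sketch of the concatenation step, again as plain Java so it runs outside a pipeline. CsvLines.toCsvLine is a hypothetical helper; beyond plain concatenation it adds RFC 4180-style quoting, which matters as soon as a value contains a comma. The Beam wiring shown in the comment is an assumption: it relies on the Beam 2.2-era BeamRecord.getDataValues() accessor and an illustrative output prefix, so check both against the SDK version and bucket you actually use.

```java
import java.util.List;

// Hypothetical helper for the BeamRecord -> CSV line step. In the pipeline it
// could be wired roughly like this (assumes BeamRecord.getDataValues() exists
// in your Beam version; the output prefix is illustrative):
//   out.apply(MapElements.into(TypeDescriptors.strings())
//           .via((BeamRecord r) -> CsvLines.toCsvLine(r.getDataValues())))
//      .apply(TextIO.write().to("gs://gcpbucket/output/sbc").withSuffix(".csv"));
class CsvLines {
    // Joins field values with commas. null becomes an empty field; a value that
    // contains a comma, double quote, or line break is wrapped in double quotes
    // with any inner quotes doubled.
    static String toCsvLine(List<?> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            Object v = values.get(i);
            if (v == null) {
                continue;
            }
            String s = v.toString();
            boolean needsQuotes = s.contains(",") || s.contains("\"")
                    || s.contains("\n") || s.contains("\r");
            if (needsQuotes) {
                sb.append('"').append(s.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(s);
            }
        }
        return sb.toString();
    }
}
```

For the query in the question (select Outlet from PCOLLECTION) each line is just the Outlet value. Date values would be rendered with Date.toString(), so format Week yourself if you need a specific layout. TextIO.write() produces sharded output files unless .withoutSharding() is added.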