Run BeamSql Without a Coder, or Make the Coder Dynamic

Date: 2017-12-14 05:23:15

Tags: google-cloud-platform google-cloud-dataflow apache-beam

I am reading data from a file and converting it into BeamRecord, but when I run a query I get the following error:

Exception in thread "main" java.lang.ClassCastException: org.apache.beam.sdk.coders.SerializableCoder cannot be cast to org.apache.beam.sdk.coders.BeamRecordCoder
    at org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.registerTables(BeamSql.java:173)
    at org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:153)
    at org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:116)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
    at org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:160)
    at TestingClass.main(TestingClass.java:75)

But when I provide the coder, it runs fine.

I am confused: I am reading data from a file, and because I am using a template the file's schema changes on every run, so can I run the query with a default coder, or with no coder at all?

The reference code is below; please check.

    PCollection<String> ReadFile1 = PBegin.in(p).apply(TextIO.read().from("gs://Bucket_Name/FileName.csv"));
    PCollection<BeamRecord> File1_BeamRecord = ReadFile1.apply(new StringToBeamRecord()).setCoder(new Temp().test().getRecordCoder());

    PCollection<String> ReadFile2 = p.apply(TextIO.read().from("gs://Bucket_Name/FileName.csv"));
    PCollection<BeamRecord> File2_BeamRecord = ReadFile2.apply(new StringToBeamRecord()).setCoder(new Temp().test1().getRecordCoder());

new Temp().test1().getRecordCoder() -> returns a hard-coded BeamRecordCoder value, which I would rather obtain at runtime.
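
The Temp helper is not shown in the question; a minimal sketch of what such a hard-coded coder factory might look like is below (the column names and counts are assumptions for illustration only):

    // Hypothetical sketch of the Temp helper referenced above (its source is not shown).
    // It wraps hard-coded BeamRecordSqlType definitions so that getRecordCoder() can be
    // called on the result, as in the setCoder(...) calls above.
    import java.sql.Types;
    import java.util.Arrays;
    import org.apache.beam.sdk.extensions.sql.BeamRecordSqlType;

    public class Temp {

        // Assumed schema for File1: three VARCHAR columns named R0..R2.
        public BeamRecordSqlType test() {
            return BeamRecordSqlType.create(
                    Arrays.asList("R0", "R1", "R2"),
                    Arrays.asList(Types.VARCHAR, Types.VARCHAR, Types.VARCHAR));
        }

        // Assumed schema for File2: two VARCHAR columns named R0..R1.
        public BeamRecordSqlType test1() {
            return BeamRecordSqlType.create(
                    Arrays.asList("R0", "R1"),
                    Arrays.asList(Types.VARCHAR, Types.VARCHAR));
        }
    }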

The conversion from PCollection<String> to PCollection<BeamRecord> is below:

    public class StringToBeamRecord extends PTransform<PCollection<String>, PCollection<BeamRecord>> {

        private static final Logger LOG = LoggerFactory.getLogger(StringToBeamRecord.class);

        @Override
        public PCollection<BeamRecord> expand(PCollection<String> arg0) {
            return arg0.apply("Conversion", ParDo.of(new ConversionOfData()));
        }

        static class ConversionOfData extends DoFn<String, BeamRecord> implements Serializable {

            @ProcessElement
            public void processElement(ProcessContext c) {
                // Mark empty fields so split() keeps them, then split the CSV line.
                String data = c.element().replaceAll(",,", ",blank,");
                String[] array = data.split(",");

                List<String> fieldNames = new ArrayList<>();
                List<Integer> fieldTypes = new ArrayList<>();
                List<Object> dataConversion = new ArrayList<>();

                // One column per field: names R0, R1, ... and VARCHAR types.
                for (int i = 0; i < array.length; i++) {
                    fieldNames.add("R" + i);
                    fieldTypes.add(Types.VARCHAR); // Using a schema I could set this per column
                    dataConversion.add(array[i]);
                }

                LOG.info("The size is: " + dataConversion.size());

                BeamRecordSqlType type = BeamRecordSqlType.create(fieldNames, fieldTypes);
                c.output(new BeamRecord(type, dataConversion));
            }
        }
    }

The query is:

PCollectionTuple test = PCollectionTuple.of(
                    new TupleTag<BeamRecord>("File1_BeamRecord"),File1_BeamRecord)
                    .and(new TupleTag<BeamRecord>("File2_BeamRecord"), File2_BeamRecord);

PCollection<BeamRecord> output = test.apply(BeamSql.queryMulti(
                    "Select * From File1_BeamRecord JOIN File2_BeamRecord "));

Is there any way I can make the coder dynamic, or can I run the query with a default coder?
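
For example, if the only per-run variation is the number of columns, one direction would be to build the BeamRecordSqlType at pipeline-construction time instead of hard-coding it inside Temp. This is only a sketch under that assumption; numColumns stands for a hypothetical value that is supplied when the pipeline (or template) is built, since the coder has to be known before submission:

    // Sketch: derive the coder from a construction-time value instead of a hard-coded Temp schema.
    // numColumns is a hypothetical value that must be available when the pipeline/template is built.
    List<String> fieldNames = new ArrayList<>();
    List<Integer> fieldTypes = new ArrayList<>();
    for (int i = 0; i < numColumns; i++) {
        fieldNames.add("R" + i);        // same naming convention as in ConversionOfData
        fieldTypes.add(Types.VARCHAR);
    }
    BeamRecordSqlType dynamicType = BeamRecordSqlType.create(fieldNames, fieldTypes);

    PCollection<BeamRecord> File1_BeamRecord = ReadFile1
            .apply(new StringToBeamRecord())
            .setCoder(dynamicType.getRecordCoder());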

0 Answers:

There are no answers yet.