Passing a side input to a PCollection Partition

Asked: 2017-11-15 06:06:04

Tags: google-cloud-platform google-cloud-dataflow apache-beam

I want to pass a side input to a PCollection Partition, so that the number of partitions my PCollection is split into can be decided on the basis of it. Something like this:

    PCollectionList<TableRow> part = merged.apply(Partition.of(
            partitionCount /* would like this count to come from a side input */,
            new PartitionFn<TableRow>() {

                @Override
                public int partitionFor(TableRow row, int numPartitions) {
                    return 0;
                }

            }));
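Note that `Partition.of` takes its partition count as a plain `int` that is fixed when the pipeline graph is built, so a side input cannot drive it. A common workaround is a multi-output `ParDo` that reads the side input at runtime and routes each element to one of a fixed set of output tags. The sketch below assumes hypothetical names (`merged`, `countView`, the two tags, and the `1000` threshold are illustrative, not from the question):

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Tags for the (hypothetical) outputs; add more tags for more "partitions".
final TupleTag<TableRow> smallTag = new TupleTag<TableRow>() {};
final TupleTag<TableRow> largeTag = new TupleTag<TableRow>() {};

// Pure routing decision, kept separate so it is easy to test.
// Hypothetical threshold: route to the "small" output below 1000 elements.
static boolean routeToSmall(long count) {
    return count < 1000L;
}

// countView would be a side input, e.g. built with
// Count.globally() followed by View.asSingleton().
PCollectionTuple split = merged.apply(ParDo
    .of(new DoFn<TableRow, TableRow>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            long count = c.sideInput(countView);
            if (routeToSmall(count)) {
                c.output(c.element());            // main output -> smallTag
            } else {
                c.output(largeTag, c.element());  // additional output
            }
        }
    })
    .withSideInputs(countView)
    .withOutputTags(smallTag, TupleTagList.of(largeTag)));

PCollection<TableRow> small = split.get(smallTag);
PCollection<TableRow> large = split.get(largeTag);
```

Unlike `Partition`, this requires the set of possible outputs to be bounded and known up front, but the choice among them can depend on runtime data.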

Is there some other way I can partition my PCollection?

    // Writing to a partitioned BigQuery table without dynamic destinations

merge.apply("write into target", BigQueryIO.writeTableRows()
                                         .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
                                                @Override
                                                public TableDestination apply(ValueInSingleWindow<TableRow> value) {
                                                       TableRow row = value.getValue();
                                                       TableReference reference = new TableReference();
                                                       reference.setProjectId("XYZ");
                                                       reference.setDatasetId("ABC");
                                                       System.out.println("date of row " + row.get("authorized_transaction_date_yyyymmdd").toString());               
                                                       LOG.info("date of row "+
                                                       row.get("authorized_transaction_date_yyyymmdd").toString());
                                                       String str = row.get("authorized_transaction_date_yyyymmdd").toString();
                                                       str = str.substring(0, str.length() - 2) + "01";
                                                       System.out.println("str value " + str);
                                                       LOG.info("str value " + str);
                                                       reference.setTableId("TargetTable$" + str);
                                                       return new TableDestination(reference, null);
                                                }
                                         }).withFormatFunction(new SerializableFunction<TableRow, TableRow>() {
                                                @Override
                                                public TableRow apply(TableRow input) {
                                                       LOG.info("format function:"+input.toString());

                                                       return input;
                                                }
                                         })

                                         .withSchema(schema1).withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
                                         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

Now, instead of this, I have to use dynamic destinations and still write to the partitioned table. Any solution?

1 answer:

Answer 0 (score: 1)

Based on seeing TableRow in your code, I suspect you want to write a PCollection to BigQuery, sending different elements to different BigQuery tables. BigQueryIO.write() already provides a way to do this via BigQueryIO.write().to(DynamicDestinations). See Writing different values to different BigQuery tables in Apache Beam.
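A sketch of what that could look like for the month-partitioned table in the question (the project/dataset/table names and the `authorized_transaction_date_yyyymmdd` field come from the question; `monthPartitionTableId` is a hypothetical helper, and `merge`/`schema1` are assumed to be the same variables as above):

```java
import com.google.api.services.bigquery.model.TableReference;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.ValueInSingleWindow;

// Maps a yyyymmdd date to the month-partition decorator used in the question,
// e.g. "20171115" -> "TargetTable$20171101".
static String monthPartitionTableId(String yyyymmdd) {
    return "TargetTable$" + yyyymmdd.substring(0, 6) + "01";
}

merge.apply("write into target", BigQueryIO.writeTableRows()
    .to(new DynamicDestinations<TableRow, String>() {
        @Override
        public String getDestination(ValueInSingleWindow<TableRow> element) {
            // The destination key: the row's transaction date.
            return element.getValue()
                .get("authorized_transaction_date_yyyymmdd").toString();
        }

        @Override
        public TableDestination getTable(String date) {
            TableReference reference = new TableReference()
                .setProjectId("XYZ")
                .setDatasetId("ABC")
                .setTableId(monthPartitionTableId(date));
            return new TableDestination(reference, null);
        }

        @Override
        public TableSchema getSchema(String date) {
            return schema1; // same schema as in the question
        }
    })
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
```

With `DynamicDestinations` the destination key, the table (including its partition decorator), and the schema are all derived per element, so no separate `Partition` step is needed before the write.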