java.lang.IllegalArgumentException: calling sideInput() with unknown view

Time: 2018-11-14 14:39:52

Tags: google-cloud-dataflow apache-beam

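I am trying to move data from one table to another, using a side input to filter records while transforming the data. The side input is a collection of KV pairs loaded from another table.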

When I run my pipeline, I get the error "java.lang.IllegalArgumentException: calling sideInput() with unknown view".

Here is the full code I tried:

{
PipelineOptionsFactory.register(OptionPipeline.class);

OptionPipeline options = PipelineOptionsFactory.fromArgs(args).withValidation().as(OptionPipeline.class);

Pipeline p = Pipeline.create(options);

PCollection<TableRow> sideInputData = p.apply("ReadSideInput",BigQueryIO.readTableRows().from(options.getOrgRegionMapping()));
PCollection<KV<String,String>> sideInputMap = sideInputData.apply(ParDo.of(new getSideInputDataFn()));
final PCollectionView<Map<String,String>> sideInputView = sideInputMap.apply(View.<String,String>asMap());



PCollection<TableRow> orgMaster = p.apply("ReadOrganization",BigQueryIO.readTableRows().from(options.getOrgCodeMaster()));
PCollection<TableRow> orgCode = orgMaster.apply(ParDo.of(new gnGetOrgMaster()));


@SuppressWarnings("serial")
PCollection<TableRow> finalResultCollection =  orgCode.apply("Process", ParDo.of(new DoFn<TableRow, TableRow>() 
{
      @ProcessElement
      public void processElement(ProcessContext c) {

          TableRow outputRow = new TableRow();

          TableRow orgCodeRow = c.element();
          String orgCodefromMaster = (String) orgCodeRow.get("orgCode");

          String region = c.sideInput(sideInputView).get(orgCodefromMaster);

          outputRow.set("orgCode", orgCodefromMaster);
          outputRow.set("orgName", orgCodeRow.get("orgName"));
          outputRow.set("orgName", region);
          DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS");
          Date dateobj = new Date();
          outputRow.set("updatedDate",df.format(dateobj));

          c.output(outputRow);
      }
}));


finalResultCollection.apply(BigQueryIO.writeTableRows()
                     .withSchema(schema)
                     .to(options.getOrgCodeTable())
                     .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                     .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

p.run().waitUntilFinish();
}
@SuppressWarnings("serial")
static class getSideInputDataFn extends DoFn<TableRow,KV<String, String>>
{
    @ProcessElement
    public void processElement(ProcessContext c)
    {
        TableRow row = c.element();
        c.output(KV.of((String) row.get("orgcode"), (String) row.get("region")));
    }
}

1 Answer:

Answer 0 (score: 0)

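The runner seems to be complaining because you never told it about the side input when you defined the graph. A DoFn may only read views that were declared on its ParDo, so you need to call .withSideInputs after the ParDo.of call and pass it a reference to the PCollectionView<T> you defined earlier.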

@SuppressWarnings("serial")
PCollection<TableRow> finalResultCollection =  orgCode.apply("Process", ParDo.of(new DoFn<TableRow, TableRow>()
{
    @ProcessElement
    public void processElement(ProcessContext c) {

        TableRow outputRow = new TableRow();

        TableRow orgCodeRow = c.element();
        String orgCodefromMaster = (String) orgCodeRow.get("orgCode");

        String region = c.sideInput(sideInputView).get(orgCodefromMaster);

        outputRow.set("orgCode", orgCodefromMaster);
        outputRow.set("orgName", orgCodeRow.get("orgName"));
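        outputRow.set("region", region); // assumed column name; the question's code sets "orgName" a second time here, overwriting the line above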
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS");
        Date dateobj = new Date();
        outputRow.set("updatedDate",df.format(dateobj));

        c.output(outputRow);
    }
}).withSideInputs(sideInputView)); // declares the view on this ParDo so c.sideInput(sideInputView) can resolve it

I have not tested this code, but this is what stood out when I looked at it.
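To see why the declaration matters, here is a toy model in plain Java. It is not Beam code and not how any runner is actually implemented; `View`, `Step`, and the `store` map are hypothetical stand-ins for `PCollectionView`, `ParDo`, and the runner's materialized side-input data. It only illustrates the idea that a step can read a view if, and only if, that view was declared on the step beforehand.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (assumption, not Beam): a step may only read views it declared. */
class SideInputCheckSketch {

    /** Hypothetical stand-in for a PCollectionView: an opaque handle compared by identity. */
    static final class View {
        final String name;
        View(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a ParDo: remembers which views were declared up front. */
    static final class Step {
        private final Set<View> declared = new HashSet<>();

        /** Mirrors the role of withSideInputs: registers a view with this step. */
        Step withSideInputs(View view) {
            declared.add(view);
            return this;
        }

        /** Mirrors the role of c.sideInput(view): rejects views that were never declared. */
        Map<String, String> sideInput(View view, Map<View, Map<String, String>> store) {
            if (!declared.contains(view)) {
                throw new IllegalArgumentException("calling sideInput() with unknown view");
            }
            return store.get(view);
        }
    }

    public static void main(String[] args) {
        View regions = new View("regions");

        // Pretend the runner has already materialized the view's contents.
        Map<String, String> contents = new HashMap<>();
        contents.put("ORG1", "EMEA");
        Map<View, Map<String, String>> store = new HashMap<>();
        store.put(regions, contents);

        // The question's situation: the view exists, but the step never declared it.
        Step undeclared = new Step();
        try {
            undeclared.sideInput(regions, store);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // The answer's situation: the view is declared, so the lookup succeeds.
        Step declaredStep = new Step().withSideInputs(regions);
        System.out.println(declaredStep.sideInput(regions, store).get("ORG1"));
    }
}
```

In the real pipeline the same two cases correspond to `ParDo.of(...)` alone versus `ParDo.of(...).withSideInputs(sideInputView)`. Note also that a map lookup returns null for a key that is absent, so `region` in the DoFn can be null when an orgCode has no mapping.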