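I am trying to move data from one table to another. I use a side input to filter records while transforming the data. The side input is itself a collection of KV elements, loaded from a different table.
When I run the pipeline, I get a "java.lang.IllegalArgumentException: calling sideInput() with unknown view" error.
Here is the full code I tried: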
{
    PipelineOptionsFactory.register(OptionPipeline.class);
    OptionPipeline options = PipelineOptionsFactory.fromArgs(args).withValidation().as(OptionPipeline.class);
    Pipeline p = Pipeline.create(options);

    PCollection<TableRow> sideInputData = p.apply("ReadSideInput", BigQueryIO.readTableRows().from(options.getOrgRegionMapping()));
    PCollection<KV<String, String>> sideInputMap = sideInputData.apply(ParDo.of(new getSideInputDataFn()));
    final PCollectionView<Map<String, String>> sideInputView = sideInputMap.apply(View.<String, String>asMap());

    PCollection<TableRow> orgMaster = p.apply("ReadOrganization", BigQueryIO.readTableRows().from(options.getOrgCodeMaster()));
    PCollection<TableRow> orgCode = orgMaster.apply(ParDo.of(new gnGetOrgMaster()));

    @SuppressWarnings("serial")
    PCollection<TableRow> finalResultCollection = orgCode.apply("Process", ParDo.of(new DoFn<TableRow, TableRow>()
    {
        @ProcessElement
        public void processElement(ProcessContext c) {
            TableRow outputRow = new TableRow();
            TableRow orgCodeRow = c.element();
            String orgCodefromMaster = (String) orgCodeRow.get("orgCode");
            String region = c.sideInput(sideInputView).get(orgCodefromMaster);
            outputRow.set("orgCode", orgCodefromMaster);
            outputRow.set("orgName", orgCodeRow.get("orgName"));
            outputRow.set("orgName", region);
            DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS");
            Date dateobj = new Date();
            outputRow.set("updatedDate", df.format(dateobj));
            c.output(outputRow);
        }
    }));

    finalResultCollection.apply(BigQueryIO.writeTableRows()
        .withSchema(schema)
        .to(options.getOrgCodeTable())
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
}

@SuppressWarnings("serial")
static class getSideInputDataFn extends DoFn<TableRow, KV<String, String>>
{
    @ProcessElement
    public void processElement(ProcessContext c)
    {
        TableRow row = c.element();
        c.output(KV.of((String) row.get("orgcode"), (String) row.get("region")));
    }
}
Answer 0 (score: 0)
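The runner appears to be complaining because you never told it about the side input when you defined the graph. In this case, you need to call .withSideInputs after the ParDo.of call, passing in a reference to the PCollectionView<T> you defined earlier.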
@SuppressWarnings("serial")
PCollection<TableRow> finalResultCollection = orgCode.apply("Process", ParDo.of(new DoFn<TableRow, TableRow>()
{
    @ProcessElement
    public void processElement(ProcessContext c) {
        TableRow outputRow = new TableRow();
        TableRow orgCodeRow = c.element();
        String orgCodefromMaster = (String) orgCodeRow.get("orgCode");
        // This lookup only works because the view is declared via withSideInputs below.
        String region = c.sideInput(sideInputView).get(orgCodefromMaster);
        outputRow.set("orgCode", orgCodefromMaster);
        outputRow.set("orgName", orgCodeRow.get("orgName"));
        // Kept as in the question. Note that this overwrites the "orgName" value set on the
        // previous line; a separate column for the region may have been intended.
        outputRow.set("orgName", region);
        DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS");
        Date dateobj = new Date();
        outputRow.set("updatedDate", df.format(dateobj));
        c.output(outputRow);
    }
}).withSideInputs(sideInputView)); // declare the view so c.sideInput(...) can resolve it
I have not tested this code, but that is what stood out when I looked at it.
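To make the reasoning behind the error a bit more concrete, here is a toy model in plain Java (no Beam dependency). It is only a sketch of the idea, not Beam's actual implementation: `View`, `Context`, `withSideInput`, and `lookupRegion` are hypothetical stand-ins I made up for illustration. The point is that the context can only resolve views that were declared up front, and rejects anything else with the same kind of exception as in the question.

```java
import java.util.HashMap;
import java.util.Map;

public class SideInputSketch {

    /** Hypothetical stand-in for a PCollectionView: just an identity token. */
    static final class View<T> {
        final String name;

        View(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the per-element context; it only knows declared views. */
    static final class Context {
        private final Map<View<?>, Object> declared = new HashMap<>();

        /** Plays the role of ParDo's withSideInputs: registers a view and its value. */
        <T> Context withSideInput(View<T> view, T value) {
            declared.put(view, value);
            return this;
        }

        /** Plays the role of c.sideInput(view): fails for views that were never declared. */
        @SuppressWarnings("unchecked")
        <T> T sideInput(View<T> view) {
            if (!declared.containsKey(view)) {
                throw new IllegalArgumentException("calling sideInput() with unknown view");
            }
            return (T) declared.get(view);
        }
    }

    /** Mirrors the lookup done inside processElement in the question. */
    static String lookupRegion(Context c, View<Map<String, String>> view, String orgCode) {
        return c.sideInput(view).get(orgCode);
    }

    public static void main(String[] args) {
        View<Map<String, String>> regions = new View<>("regions");
        Map<String, String> data = new HashMap<>();
        data.put("ORG1", "EMEA"); // made-up sample data

        // View declared: the lookup succeeds.
        Context declared = new Context().withSideInput(regions, data);
        System.out.println(lookupRegion(declared, regions, "ORG1"));

        // View never declared: same failure mode as in the question.
        try {
            lookupRegion(new Context(), regions, "ORG1");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the real pipeline, the equivalent of `withSideInput` here is the `.withSideInputs(sideInputView)` call on the `ParDo`; capturing the view in the anonymous `DoFn` is not enough on its own.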