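I have a simple pipeline that moves events from Kafka into BigQuery. I want to save all failed rows to another BigQuery table, and I want to attach the original event to each of them to get more information about the error. However, because the BigQuery API only gives me back the failed rows, I need some technique to recover that context. So I use a side-input view that holds a map from row ID to event, and when a row fails to be saved, I look up its event in the side-input view.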
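Stripped of Beam, the lookup idea is just a keyed join with a null fallback. The JDK-only sketch below is illustrative, not Beam code. The class `FailedRowEnricher`, its nested `RowWithEvent` holder and the `<missing>` marker are hypothetical names. It builds the uuid-to-event map and then enriches a list of failed uuids. A uuid that was never indexed corresponds to the `event == null` branch in the DoFn.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the uuid -> original-event lookup (not Beam code). */
public class FailedRowEnricher {

    /** Hypothetical holder for one row's uuid and the event it came from. */
    public static final class RowWithEvent {
        final String uuid;
        final String event;

        public RowWithEvent(String uuid, String event) {
            this.uuid = uuid;
            this.event = event;
        }
    }

    /** Builds the map the side input is meant to hold: uuid -> original event. */
    public static Map<String, String> buildIndex(List<RowWithEvent> rows) {
        Map<String, String> index = new HashMap<>();
        for (RowWithEvent r : rows) {
            index.put(r.uuid, r.event);
        }
        return index;
    }

    /** Pairs each failed uuid with its event, or with "<missing>" if it was never indexed. */
    public static List<String> enrich(List<String> failedUuids, Map<String, String> index) {
        List<String> out = new ArrayList<>();
        for (String uuid : failedUuids) {
            String event = index.get(uuid); // null when the uuid is not in the map
            out.add(uuid + " -> " + (event != null ? event : "<missing>"));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> index = buildIndex(List.of(
                new RowWithEvent("a1", "{\"id\":1}"),
                new RowWithEvent("b2", "{\"id\":2}")));
        // prints: a1 -> {"id":1}  then  c3 -> <missing>
        for (String line : enrich(List.of("a1", "c3"), index)) {
            System.out.println(line);
        }
    }
}
```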
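The result of all this design: it works for only about 10% of the events. For most events, the lookup in the side-input view map returns null. But I don't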
My code:
// create a view that maps each row's uuid to its original event
PCollectionView<Map<String, String>> tableRowToInsertView =
    tableRowToInsertCollection
        // 1-minute fixed windows; the trigger fires once per window, after the first
        // element arrives (AfterProcessingTime.pastFirstElementInPane() is not wrapped
        // in Repeatedly.forever)
        .apply(Window.<TableRowWithEvent>into(FixedWindows.of(Duration.standardMinutes(1)))
            .withAllowedLateness(Duration.standardSeconds(10))
            .accumulatingFiredPanes()
            .triggering(AfterProcessingTime.pastFirstElementInPane()))
        // key each event by the "uuid" field of its TableRow
        .apply(MapElements.via(
            new SimpleFunction<TableRowWithEvent, KV<String, String>>() {
                @Override
                public KV<String, String> apply(TableRowWithEvent input) {
                    return KV.of(input.getTableRow().get("uuid").toString(), input.getEvent());
                }
            }))
        .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))
        .apply("CreateView", View.asMap());
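A `View.asMap()` side input built from a windowed collection is one map per window. A main-input element reads the side-input window that its own window maps to. With two identical 1-minute `FixedWindows` setups, that is the same minute. The JDK-only model below shows what happens when a key is stored in one minute and looked up from another. The class `WindowedLookupModel` and its methods are hypothetical and are not Beam APIs. The sketch assumes epoch-millisecond timestamps and ignores triggers and allowed lateness entirely.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Plain-Java model of per-window side-input maps (not Beam code): each key is
 * stored in the map of the 1-minute fixed window its timestamp falls into, and a
 * lookup only consults the map for the looking-up element's own window.
 */
public class WindowedLookupModel {
    static final long WINDOW_MILLIS = 60_000L;

    // window start -> (uuid -> event)
    private final Map<Long, Map<String, String>> mapsByWindow = new HashMap<>();

    /** Start of the fixed window containing the timestamp (assumes tsMillis >= 0). */
    public static long windowStart(long tsMillis) {
        return tsMillis - (tsMillis % WINDOW_MILLIS);
    }

    public void put(String uuid, String event, long tsMillis) {
        mapsByWindow.computeIfAbsent(windowStart(tsMillis), w -> new HashMap<>()).put(uuid, event);
    }

    /** Returns null if the uuid is absent from the map of the lookup element's window. */
    public String lookup(String uuid, long lookupTsMillis) {
        Map<String, String> m = mapsByWindow.get(windowStart(lookupTsMillis));
        return m == null ? null : m.get(uuid);
    }

    public static void main(String[] args) {
        WindowedLookupModel model = new WindowedLookupModel();
        model.put("a1", "event-a1", 59_000L);            // stored in window [0s, 60s)
        System.out.println(model.lookup("a1", 30_000L)); // same window: prints event-a1
        System.out.println(model.lookup("a1", 61_000L)); // next window: prints null
    }
}
```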
// later: use the side-input view when writing failed inserts to the error table
writeResult
    .getFailedInsertsWithErr()
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
    .apply("BQ-insert-error-extract",
        ParDo.of(new BigQueryInsertErrorExtractFn(tableRowToInsertView))
            .withSideInputs(tableRowToInsertView))
    .apply("BQ-insert-error-write", BigQueryIO.writeTableRows()
        .to(errTableSpec)
        .withJsonSchema(errSchema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
And here is the conversion function itself, which uses the side-input view:
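import com.google.api.services.bigquery.model.TableRow;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.PCollectionView;

public class BigQueryInsertErrorExtractFn extends DoFn<BigQueryInsertError, TableRow> {
    // java.time 'S' is fraction-of-second, so SSSSSS yields microseconds; formatted in UTC
    private static final DateTimeFormatter TS_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS").withZone(ZoneOffset.UTC);

    private final PCollectionView<Map<String, String>> tableRowToInsertView;

    public BigQueryInsertErrorExtractFn(PCollectionView<Map<String, String>> tableRowToInsertView) {
        this.tableRowToInsertView = tableRowToInsertView;
    }

    @ProcessElement
    public void processElement(ProcessContext context) {
        BigQueryInsertError bigQueryInsertError = context.element();
        TableRow row = bigQueryInsertError.getRow();

        // copy the failed row and its error details into an error-table row
        TableRow convertedRow = new TableRow();
        convertedRow.set("table_row", row.toString());
        convertedRow.set("error", bigQueryInsertError.getError().toString());
        convertedRow.set("error_type", ERROR_TYPE.BQ_INSERT);
        convertedRow.set("t", TS_FORMAT.format(Instant.now()));

        // look up the original event by uuid in the side-input map for this element's window
        Map<String, String> sideInput = context.sideInput(tableRowToInsertView);
        Object uuid = row.get("uuid");
        String event = uuid == null ? null : sideInput.get(uuid.toString());
        if (event != null) {
            convertedRow.set("event", event);
        }
        context.output(convertedRow);
    }
}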
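The `t` timestamp in `BigQueryInsertErrorExtractFn` is worth checking regardless of the side-input issue. In `SimpleDateFormat`, the pattern letter `S` means milliseconds, so `SSSSSS` only zero-pads the millisecond count to six digits; it does not print microseconds. `java.time.format.DateTimeFormatter` instead treats `S` as fraction-of-second. The small JDK-only program below formats the same instant both ways, in UTC. The class name `TimestampPatternDemo` and its methods are made up for illustration.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.TimeZone;

/** Compares the "SSSSSS" pattern in SimpleDateFormat and DateTimeFormatter. */
public class TimestampPatternDemo {
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSSSSS";

    /** Legacy API: 'S' is milliseconds, so SSSSSS zero-pads the millisecond count. */
    public static String legacy(long epochMillis) {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(new Date(epochMillis));
    }

    /** java.time: 'S' is fraction-of-second, so SSSSSS prints microseconds. */
    public static String modern(long epochMillis) {
        return DateTimeFormatter.ofPattern(PATTERN)
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long t = 1_500L; // 1.5 seconds after the epoch
        System.out.println(legacy(t)); // 1970-01-01 00:00:01.000500
        System.out.println(modern(t)); // 1970-01-01 00:00:01.500000
    }
}
```

`DateTimeFormatter` is also immutable and thread-safe, so a single static instance can be shared inside a `DoFn`.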