Getting null values when trying to read an Apache Beam Dataflow side input

Time: 2019-02-10 11:51:37

Tags: google-bigquery apache-beam

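I have a simple pipeline that moves events from Kafka to BigQuery. I want to save every row that fails to insert into a separate BigQuery table, and I also want to attach the original event so that I have more information about the error. However, because the BigQuery API only returns the failed row, I need some technique to carry that context along. So I am using a side input view that holds a mapping from row ID to event, and when a row cannot be saved, I fetch its event from the side input view.

The result of all this design is that the job works for only about 10% of the events. Most events get a null value from the side input view map. But I don't

My code: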

// Create a view that maps each row's ID (uuid) to its originating event
PCollectionView<Map<String, String>> tableRowToInsertView =
        tableRowToInsertCollection
                .apply(Window.<TableRowWithEvent>into(FixedWindows.of(Duration.standardMinutes(1)))
                        .withAllowedLateness(Duration.standardSeconds(10))
                        .accumulatingFiredPanes()
                        .triggering(AfterProcessingTime.pastFirstElementInPane()))
                .apply(MapElements.via(
                        new SimpleFunction<TableRowWithEvent, KV<String, String>>() {
                            @Override
                            public KV<String, String> apply(TableRowWithEvent input) {
                                return KV.of(input.getTableRow().get("uuid").toString(), input.getEvent());
                            }
                        }))
                .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))
                .apply("CreateView", View.asMap());



// Later, use the side input view when handling failed inserts
writeResult
        .getFailedInsertsWithErr()
        .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply("BQ-insert-error-extract", ParDo.of(new BigQueryInsertErrorExtractFn(tableRowToInsertView)).withSideInputs(tableRowToInsertView))
        .apply("BQ-insert-error-write", BigQueryIO.writeTableRows()
                .to(errTableSpec)
                .withJsonSchema(errSchema)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

And here is the transform function itself, which uses the side input view:

public class BigQueryInsertErrorExtractFn extends DoFn<BigQueryInsertError, TableRow> {

    private final PCollectionView<Map<String, String>> tableRowToInsertView;

    public BigQueryInsertErrorExtractFn(PCollectionView<Map<String, String>> tableRowToInsertView) {
        this.tableRowToInsertView = tableRowToInsertView;
    }

    @ProcessElement
    public void processElement(ProcessContext context) {
        BigQueryInsertError bigQueryInsertError = context.element();
        TableRow row = bigQueryInsertError.getRow();

        TableRow convertedRow = new TableRow();
        convertedRow.set("table_row", row.toString());
        convertedRow.set("error", bigQueryInsertError.getError().toString());
        convertedRow.set("error_type", ERROR_TYPE.BQ_INSERT);
        convertedRow.set("t", new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS").format(new Date()));

        Map<String, String> sideInput = context.sideInput(tableRowToInsertView);

        String event = sideInput.get(bigQueryInsertError.getRow().get("uuid").toString());
        if (event != null) {
            convertedRow.set("event", event);
        }

        context.output(convertedRow);
    }
}
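
The lookup in processElement calls get("uuid").toString() on the failed row, which would throw if a row lacked that field, and it cannot tell a missing uuid apart from a key that is simply absent from the side input map. A minimal sketch of a null-safe lookup follows, as an illustration only: the class SideInputLookup and the method lookupEvent are hypothetical names that are not part of the original code, and plain java.util.Map types stand in for the failed TableRow (which itself implements Map<String, Object>) and for the materialized side input.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not in the original code): a null-safe version of the
// uuid -> event lookup that processElement performs against the side input.
final class SideInputLookup {

    private SideInputLookup() {}

    // sideInput: the materialized map from row uuid to event.
    // row: the failed row; assumed to carry its id under the "uuid" key.
    // Returns Optional.empty() if either argument is null, the row has no
    // "uuid" field, or the side input has no entry for that uuid.
    static Optional<String> lookupEvent(Map<String, String> sideInput, Map<String, Object> row) {
        if (sideInput == null || row == null) {
            return Optional.empty();
        }
        Object uuid = row.get("uuid");
        if (uuid == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(sideInput.get(uuid.toString()));
    }
}
```

Inside processElement this could be used as `SideInputLookup.lookupEvent(context.sideInput(tableRowToInsertView), bigQueryInsertError.getRow()).ifPresent(e -> convertedRow.set("event", e));`. It only makes the lookup safer; it does not by itself explain why most keys are missing from the map.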

0 answers:

No answers