Question

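I have many text files containing data that I want to import into a date-partitioned BigQuery table, using DataflowPipelineRunner in batch mode. I don't want to insert into the current day's partition at run time; instead, I want each row to go into the partition that matches the date mentioned in that row. I ran the program below, but got the error "The method to(String) in the type BigQueryIO.Write is not applicable for the arguments (new SerializableFunction(){})".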

Program:

  PCollection<String> read = p.apply("Read Lines",TextIO.read().from("gs://prasadk/DataFlowGCSToBQ/employee.txt"));

  PCollection<TableRow> rows = read.apply(ParDo.of(new DoFn<String,TableRow>(){
      @ProcessElement
      public void processElement(ProcessContext c)
      {
          String[] data = c.element().split(",");

          c.output(new TableRow().set("id", data[0]).set("name", data[1]).set("designation", data[2]).set("joindate", data[3]));
      }
  }));

  rows.apply(Window.<TableRow>into(CalendarWindows.days(1)))
  .apply(BigQueryIO.writeTableRows()
    .withSchema(schema)
    .to(new SerializableFunction<ValueInSingleWindow<TableRow>, String>() {
        private static final long serialVersionUID = 1L;
        @Override
        public String apply(ValueInSingleWindow<TableRow> input) {
            // TODO Auto-generated method stub
            return null;
        }
    }));

p.run();

} }

The error I get is: The method to(String) in the type BigQueryIO.Write is not applicable for the arguments (new SerializableFunction<ValueInSingleWindow<TableRow>, String>(){})

Answer 1

The error is most likely because you are using a version of the Beam SDK from before this feature was added. It was added in Beam 2.0.0.

Also, the signature of the function is slightly different: it takes a ValueInSingleWindow<T> rather than simply a BoundedWindow. See here.

Can you point to where you got the original code snippet? I wonder whether we have some outdated documentation that references BoundedWindow.
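For the per-row partitioning the question asks about, here is a minimal sketch of the part that does not depend on the SDK version: turning a row's date into a BigQuery partition decorator of the form `table$YYYYMMDD`. The class name `PartitionSpec`, the table spec `my-project:my_dataset.employee`, and the assumption that `joindate` is formatted as `yyyy-MM-dd` are all hypothetical and not taken from the question. As far as I know, in Beam 2.x the function passed to `to(...)` is a `SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>`, so the returned string would be wrapped in a `TableDestination` rather than returned directly.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a partition-decorated table spec from a row's date.
final class PartitionSpec {
    // BASIC_ISO_DATE renders a LocalDate as YYYYMMDD, the partition suffix format.
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.BASIC_ISO_DATE;

    // Assumes isoDate looks like "2017-06-01"; LocalDate.parse throws
    // DateTimeParseException for anything else.
    static String forDate(String baseTableSpec, String isoDate) {
        LocalDate date = LocalDate.parse(isoDate.trim());
        return baseTableSpec + "$" + date.format(SUFFIX);
    }

    public static void main(String[] args) {
        // Inside the Beam function this would be roughly (assumption, not verified
        // against the asker's SDK version):
        //   String day = (String) input.getValue().get("joindate");
        //   return new TableDestination(PartitionSpec.forDate(base, day), null);
        System.out.println(forDate("my-project:my_dataset.employee", "2017-06-01"));
    }
}
```

Keeping the date logic in a plain static method makes it easy to unit test without running a pipeline. Rows whose date cannot be parsed will throw, so you may want to filter or route them elsewhere before the write step.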

Writing to date-partitioned BigQuery using the date present in the data in a batch Dataflow job
