Question

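I am writing data to BigQuery and the write succeeds, but I am concerned about the format in which the data is written.

Below is the format in which the data is displayed when I run any query in BigQuery:

Look at the first row: the value of SalesComponent is CPS_H, but it is displayed as 'BeamRecord [dataValues=[CPS_H', and in ModelIteration the value ends with a square bracket.

Below is the code used to push the data from BeamSql to BigQuery: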

TableSchema tableSchema = new TableSchema().setFields(ImmutableList.of(
    new TableFieldSchema().setName("SalesComponent").setType("STRING").setMode("REQUIRED"),
    new TableFieldSchema().setName("DuetoValue").setType("STRING").setMode("REQUIRED"),
    new TableFieldSchema().setName("ModelIteration").setType("STRING").setMode("REQUIRED")
));

TableReference tableSpec = BigQueryHelpers.parseTableSpec("beta-194409:data_id1.tables_test");
System.out.println("Start Bigquery");
final_out.apply(MapElements.into(TypeDescriptor.of(TableRow.class)).via(
    (MyOutputClass elem) -> new TableRow().set("SalesComponent", elem.SalesComponent).set("DuetoValue", elem.DuetoValue).set("ModelIteration", elem.ModelIteration)))
        .apply(BigQueryIO.writeTableRows()
        .to(tableSpec)
        .withSchema(tableSchema)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(WriteDisposition.WRITE_TRUNCATE));

p.run().waitUntilFinish();

Edit

I used the code below to convert the BeamRecord to the MyOutputClass type, and this does not work either:

PCollection<MyOutputClass> final_out = join_query.apply(ParDo.of(new DoFn<BeamRecord, MyOutputClass>() {
    private static final long serialVersionUID = 1L;

    @ProcessElement
    public void processElement(ProcessContext c) {
        BeamRecord record = c.element();
        String[] strArr = record.toString().split(",");
        MyOutputClass moc = new MyOutputClass();
        moc.setSalesComponent(strArr[0]);
        moc.setDuetoValue(strArr[1]);
        moc.setModelIteration(strArr[2]);
        c.output(moc);
    }
}));

Answer 1

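It looks like your MyOutputClass is being constructed incorrectly (with incorrect values). If you look at the result, BigQueryIO is able to create rows with the correct fields, but those fields hold the wrong values. This means that by the time you call .set("SalesComponent", elem.SalesComponent), the data in elem is already incorrect.

My guess is that the problem is in an earlier step, when you convert from BeamRecord to MyOutputClass. You would get a result like the one you are seeing if you did something like the following (or if some other conversion logic does it for you behind the scenes):

- Convert the BeamRecord to a string by calling beamRecord.toString();
  - if you look at the BeamRecord.toString() implementation, you can see that it produces exactly this string format;
- Split this string on , to get an array of strings;
- Construct MyOutputClass from that array;

The pseudocode would look something like this: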

PCollection<MyOutputClass> final_out =
  beamRecords
    .apply(
      ParDo.of(new DoFn<BeamRecord, MyOutputClass>() {

        @ProcessElement
        public void processElement(ProcessContext c) {
           BeamRecord record = c.element();
           // Parsing the debug string is what leaks "BeamRecord [dataValues=[" into the values
           String[] fields = record.toString().split(",");
           MyOutputClass elem = new MyOutputClass();
           elem.SalesComponent = fields[0];
           elem.DuetoValue = fields[1];
           ...
           c.output(elem);
        }
      })
    );
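The effect of that conversion can be reproduced without Beam. The following is a minimal sketch in plain Java, not Beam code: FakeRecord is a hypothetical stand-in whose toString() format is an assumption modeled on the output shown in the question, and the values 12.5 and 3 are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for BeamRecord; only its toString() matters here.
class FakeRecord {
    private final List<String> dataValues;

    FakeRecord(String... values) {
        this.dataValues = Arrays.asList(values);
    }

    // Assumed format, modeled on the output shown in the question.
    @Override
    public String toString() {
        return "BeamRecord [dataValues=" + dataValues + ", dataType=demo]";
    }
}

class SplitDemo {
    // The problematic conversion: parse the debug string instead of reading fields.
    static String[] naiveSplit(FakeRecord record) {
        return record.toString().split(",");
    }

    public static void main(String[] args) {
        String[] parts = naiveSplit(new FakeRecord("CPS_H", "12.5", "3"));
        // Quote each piece so the stray prefix, brackets and spaces are visible.
        for (String part : parts) {
            System.out.println("'" + part + "'");
        }
    }
}
```

Under these assumptions the first piece carries the "BeamRecord [dataValues=[" prefix and the third ends with a square bracket, which matches the symptoms described in the question.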

The right way to do this is to call the getters on the record instead of splitting its string representation, along these lines (pseudocode):

PCollection<MyOutputClass> final_out =
      beamRecords
        .apply(
          ParDo.of(new DoFn<BeamRecord, MyOutputClass>() {

            @ProcessElement
            public void processElement(ProcessContext c) {
               BeamRecord record = c.element();
               MyOutputClass elem = new MyOutputClass();

               // get field value by name (the argument is the field name, not the value)
               elem.SalesComponent = record.getString("CPS_H...");

               // get another field value by name
               elem.DuetoValue = record.getInteger("...");
               ...
               c.output(elem);
            }
          })
        );
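The same idea can be shown in a self-contained form. This is only a sketch in plain Java under stated assumptions: NamedRecord is a hypothetical stand-in with a by-name getter, OutputRow is a simplified version of the question's output class, and the field names are assumed to match the column names from the question's table schema. The sample values other than CPS_H are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a record that exposes fields by name.
class NamedRecord {
    private final Map<String, String> fields = new LinkedHashMap<>();

    NamedRecord with(String name, String value) {
        fields.put(name, value);
        return this;
    }

    String getString(String name) {
        if (!fields.containsKey(name)) {
            throw new IllegalArgumentException("No such field: " + name);
        }
        return fields.get(name);
    }
}

// Simplified output class; all three columns are STRING in the question's schema.
class OutputRow {
    String salesComponent;
    String duetoValue;
    String modelIteration;
}

class GetterDemo {
    // Read each value through a getter; no string parsing is involved.
    static OutputRow convert(NamedRecord record) {
        OutputRow row = new OutputRow();
        row.salesComponent = record.getString("SalesComponent");
        row.duetoValue = record.getString("DuetoValue");
        row.modelIteration = record.getString("ModelIteration");
        return row;
    }

    public static void main(String[] args) {
        NamedRecord record = new NamedRecord()
                .with("SalesComponent", "CPS_H")
                .with("DuetoValue", "12.5")
                .with("ModelIteration", "3");
        OutputRow row = convert(record);
        System.out.println(row.salesComponent + " | " + row.duetoValue + " | " + row.modelIteration);
    }
}
```

Because the values never pass through a debug string, no prefix, bracket, or padding can end up in them, and a misspelled field name fails loudly instead of silently producing a wrong value.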

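You can verify this by adding a simple ParDo in which you either set a breakpoint and inspect the elements in a debugger, or output the elements somewhere else (for example, the console).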

Answer 2

I was able to solve this with the following approach:

PCollection<MyOutputClass> final_out = record40.apply(ParDo.of(new DoFn<BeamRecord, MyOutputClass>() {
    private static final long serialVersionUID = 1L;

    @ProcessElement
    public void processElement(ProcessContext c) throws ParseException {
        BeamRecord record = c.element();
        String strArr = record.toString();
        String strArr1 = strArr.substring(24);
        String xyz = strArr1.replace("]","");
        String[] strArr2 = xyz.split(",");
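What the visible string operations do can be traced on a sample string. This is a sketch in plain Java, not the rest of the snippet above: the input string is an assumption modeled on the output shown in the question, and the values 12.5 and 3 are made up.

```java
class PrefixStripDemo {
    // Same three operations as in the snippet: drop 24 characters, remove ']', split on ','.
    static String[] stripAndSplit(String recordString) {
        String withoutPrefix = recordString.substring(24); // "BeamRecord [dataValues=[" is 24 characters
        String withoutBrackets = withoutPrefix.replace("]", "");
        return withoutBrackets.split(",");
    }

    public static void main(String[] args) {
        // Assumed toString() output of a record.
        String sample = "BeamRecord [dataValues=[CPS_H, 12.5, 3], dataType=demo]";
        for (String part : stripAndSplit(sample)) {
            System.out.println("'" + part + "'");
        }
    }
}
```

On this sample the first piece is the clean value CPS_H, but the later pieces keep a leading space, so a trim() may still be needed. The approach also depends on the exact prefix length and would break on values that contain a comma or a square bracket.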

Data is written to BigQuery, but in the wrong format
