Serialization of BigQueryIO.Write withJsonSchema

Date: 2017-10-19 16:11:51

Tags: java google-cloud-dataflow apache-beam

I have a BigQueryIO.Write stage in my Beam pipeline, which is constructed by calling the builder method:

.withJsonSchema(String)

The full write transform is applied like this:

inputStream.apply(
    "save-to-bigquery",
    BigQueryIO.<Event>write()
        .withJsonSchema(jsonSchema)
        .to((ValueInSingleWindow<Event> input) ->
            new TableDestination(
                "table_name$" + PARTITION_SELECTOR.print(
                    input.getValue().getMetadata().getTimestamp()),
                null)
        )
        .withFormatFunction((ConsumerApiRequest event) -> new TableRow()
            .set("id", event.getMetadata().getUuid())
            .set("insertId", event.getMetadata().getUuid())
            .set("account_id", event.getAccountId())
            ...
            .set("timestamp", ISODateTimeFormat.dateHourMinuteSecondMillis()
                .print(event.getMetadata().getTimestamp())))
        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
);

I am running this via the DataflowRunner. When the pipeline runs and this stage executes, I receive the following error:

java.lang.IllegalArgumentException:
    com.google.api.client.json.JsonParser.parseValue(JsonParser.java:889)
    com.google.api.client.json.JsonParser.parse(JsonParser.java:382)
    com.google.api.client.json.JsonParser.parse(JsonParser.java:336)
    com.google.api.client.json.JsonParser.parse(JsonParser.java:312)
    com.google.api.client.json.JsonFactory.fromString(JsonFactory.java:187)
    org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.fromJsonString(BigQueryHelpers.java:156)
    org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinationsHelpers$ConstantSchemaDestinations.getSchema(DynamicDestinationsHelpers.java:163)
    org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinationsHelpers$ConstantSchemaDestinations.getSchema(DynamicDestinationsHelpers.java:150)
    org.apache.beam.sdk.io.gcp.bigquery.CreateTables$1.processElement(CreateTables.java:103)
Caused by: java.lang.IllegalArgumentException: expected collection or array type but got class com.google.api.services.bigquery.model.TableSchema
    com.google.api.client.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
    com.google.api.client.util.Preconditions.checkArgument(Preconditions.java:69)
    com.google.api.client.json.JsonParser.parseValue(JsonParser.java:723)
    com.google.api.client.json.JsonParser.parse(JsonParser.java:382)
    com.google.api.client.json.JsonParser.parse(JsonParser.java:336)
    com.google.api.client.json.JsonParser.parse(JsonParser.java:312)
    com.google.api.client.json.JsonFactory.fromString(JsonFactory.java:187)
    org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.fromJsonString(BigQueryHelpers.java:156)
    org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinationsHelpers$ConstantSchemaDestinations.getSchema(DynamicDestinationsHelpers.java:163)
    org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinationsHelpers$ConstantSchemaDestinations.getSchema(DynamicDestinationsHelpers.java:150)
    org.apache.beam.sdk.io.gcp.bigquery.CreateTables$1.processElement(CreateTables.java:103)
    org.apache.beam.sdk.io.gcp.bigquery.CreateTables$1$DoFnInvoker.invokeProcessElement(Unknown Source)
    org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:177)
    org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:141)
    com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:233)
    com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:48)
    com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:183)
    org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:211)
    org.apache.beam.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:66)
    org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:436)
    org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:424)
    org.apache.beam.sdk.io.gcp.bigquery.PrepareWrite$1.processElement(PrepareWrite.java:62)
    org.apache.beam.sdk.io.gcp.bigquery.PrepareWrite$1$DoFnInvoker.invokeProcessElement(Unknown Source)
    .....

It appears the JSON is read correctly at pipeline creation/serialization time, but at execution time a deserialized JSON representation is being passed in place of the JSON string. I generate the JSON string by reading a resource file with Guava's Resources class.
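Roughly like this (a minimal sketch; the resource name event-schema.json is illustrative):

import com.google.common.base.Charsets;
import com.google.common.io.Resources;

// Read the table schema JSON from a classpath resource into a plain String.
String jsonSchema = Resources.toString(
    Resources.getResource("event-schema.json"), Charsets.UTF_8);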

How can I resolve this serialization issue?

1 answer:

Answer 0 (score: 2)

Looking at the code that throws the exception, this appears to be a JSON parsing failure - your JSON schema is most likely malformed. According to the documentation, it should look like this:

{
  "fields": [
    {
      "name": string,
      "type": string,
      "mode": string,
      "fields": [
        (TableFieldSchema)
      ],
      "description": string
    }
  ]
}

For example:

{
  "fields": [
    {
      "name": "foo",
      "type": "INTEGER"
    },
    {
      "name": "bar",
      "type": "STRING",
    }
  ]
}

Looking at the code of the failing JSON parser, I suspect you are missing the outer {"fields": ...} and that your JSON string contains only the [...] array.
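
If that is the case, wrapping the array should fix the parse. A minimal sketch in Java (the field definitions reuse the example above and are illustrative, not your actual schema):

// Malformed: a bare JSON array of field definitions.
// String jsonSchema = "[{\"name\": \"foo\", \"type\": \"INTEGER\"}]";

// Well-formed: the array wrapped in the outer {"fields": [...]} object,
// i.e. a JSON-serialized TableSchema, which is what withJsonSchema(String) parses.
String jsonSchema =
    "{\"fields\": ["
        + "{\"name\": \"foo\", \"type\": \"INTEGER\"},"
        + "{\"name\": \"bar\", \"type\": \"STRING\"}"
        + "]}";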