How to stream a JSON array into a BigQuery table in Apache Beam

Asked: 2019-03-25 12:58:18

Tags: json google-bigquery apache-beam

My Apache Beam application receives messages as JSON arrays, but each element of the array must be inserted into the BigQuery table as a separate row. How can I support this use case in Apache Beam? Can I split the array and insert each element into the table individually?

Sample JSON message:

[
  {"id": 1, "name": "post1", "price": 10},
  {"id": 2, "name": "post2", "price": 20},
  {"id": 3, "name": "post3", "price": 30}
]

BigQuery table schema:

[
    {
      "mode": "REQUIRED",
      "name": "id",
      "type": "INT64"
    },
    {
      "mode": "REQUIRED",
      "name": "name",
      "type": "STRING"
    },
    {
      "mode": "REQUIRED",
      "name": "price",
      "type": "INT64"
    }
]

1 answer:

Answer 0 (score: 0)

Here is my solution. I parse the JSON string into a List once, and then output the records one by one with c.output. My code is in Scala, but you can do the same in Java.

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule
    import org.apache.beam.sdk.io.kafka.KafkaRecord
    import org.apache.beam.sdk.transforms.DoFn
    import org.apache.beam.sdk.transforms.DoFn.ProcessElement
    import org.slf4j.{Logger, LoggerFactory}

    // id is Long to match the INT64 column in the table schema
    case class MyTranscationRecord(id: Long, name: String, price: Int)
    case class MyTranscation(recordList: List[MyTranscationRecord])

    // Parses each Kafka message and emits one output element per array entry.
    // Note: this expects the JSON to be wrapped as {"recordList": [...]};
    // a bare array like the question's sample would need to be read as
    // Array[MyTranscationRecord] instead.
    class ConvertJSONTextToMyRecord extends DoFn[KafkaRecord[java.lang.Long, String], MyTranscationRecord]() {
      private val logger: Logger = LoggerFactory.getLogger(classOf[ConvertJSONTextToMyRecord])

      @ProcessElement
      def processElement(c: ProcessContext): Unit = {
        try {
          val mapper: ObjectMapper = new ObjectMapper()
            .registerModule(DefaultScalaModule)
          val messageText = c.element.getKV.getValue
          // Deserialize the whole JSON message into a wrapper holding a List of records.
          val transaction: MyTranscation = mapper.readValue(messageText, classOf[MyTranscation])
          logger.info(s"successfully converted to an EPC transaction = $transaction")
          // Emit each record individually so each becomes one BigQuery row downstream.
          for (record <- transaction.recordList) {
            c.output(record)
          }
        } catch {
          case e: Exception =>
            logger.error("failed to convert JSON message", e)
        }
      }
    }
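
The DoFn above only splits the array; the emitted records still have to be written to BigQuery. Below is a minimal sketch of how it could be wired into a pipeline, assuming a hypothetical source PCollection named kafkaMessages and a placeholder table spec my-project:my_dataset.my_table (neither is from the original answer). It maps each record to a TableRow whose fields match the schema above and writes the rows with BigQueryIO.

    import com.google.api.services.bigquery.model.TableRow
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.{CreateDisposition, WriteDisposition}
    import org.apache.beam.sdk.transforms.{MapElements, ParDo, SimpleFunction}

    // Split each JSON-array message into individual records.
    // kafkaMessages is a hypothetical PCollection[KafkaRecord[java.lang.Long, String]].
    val records = kafkaMessages
      .apply("SplitJsonArray", ParDo.of(new ConvertJSONTextToMyRecord))

    // Map each record to a TableRow whose field names match the table schema.
    val rows = records.apply("ToTableRow",
      MapElements.via(new SimpleFunction[MyTranscationRecord, TableRow]() {
        override def apply(r: MyTranscationRecord): TableRow =
          new TableRow().set("id", r.id).set("name", r.name).set("price", r.price)
      }))

    // Append the rows to the existing table; one array element becomes one row.
    rows.apply("WriteToBigQuery",
      BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table") // placeholder table spec
        .withCreateDisposition(CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(WriteDisposition.WRITE_APPEND))

With an unbounded source such as Kafka, BigQueryIO defaults to streaming inserts, so no extra windowing is needed for this write. Beam will also need a Coder for the case class (for example SerializableCoder, which works because Scala case classes are Serializable); coder registration is omitted here for brevity.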