BigQueryIO.write with DynamicDestinations and withCreateDisposition: clustering fields

Date: 2018-12-17 16:04:57

Tags: google-bigquery apache-beam

Using BigQueryIO.write.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED) together with DynamicDestinations, we can write to dynamically chosen tables. If a table does not exist, it is created from the TableSchema that DynamicDestinations supplies.

I cannot add clustering fields as part of the TableSchema model, because it has no such capability.

How can we provide DynamicDestinations with both a TableSchema and clustering fields?

2 Answers:

Answer 0 (score: 0)

The BigQuery API is one way to add clustering fields to a table.

With this link you can try out the API before writing any code.

function execute() {
  return gapi.client.bigquery.jobs.insert({
    "resource": {
      "configuration": {
        "query": {
          "clustering": {
            "fields": [
              "Field1",
              "Field2"
            ]
          },
          "query": "select 5",
          "destinationTable": {
            "datasetId": "Id1",
            "projectId": "Project1",
            "tableId": "T1"
          }
        }
      }
    }
  })
    .then(function(response) {
        // Handle the results here (response.result has the parsed body).
        console.log("Response", response);
      },
      function(err) { console.error("Execute error", err); });
}

Here is a JS example of how to set these parameters:

static setConfiguration(params, configuration) {
    //To have a destination table we MUST have a tableId
    if (params.destinationTable && params.destinationTable.tableId) {
        configuration.query.destinationTable = params.destinationTable

    }
    if (params.clusteringFields) {
        configuration.query.clustering = {fields: params.clusteringFields}
    }
    if (params.timePartitioning) {
        configuration.query.timePartitioning = {
            type: 'DAY',
            field: params.timePartitioning
        }
    }
    if (params.writeDisposition) {
        configuration.query.writeDisposition = params.writeDisposition
    }
    if (params.queryPriority && params.queryPriority.toUpperCase() === "BATCH") {
        configuration.query.priority = "BATCH"
    }
    if (params.useCache === false) {
        configuration.query.useQueryCache = params.useCache
    }
    if (params.maxBillBytes) {
        configuration.query.maximumBytesBilled = params.maxBillBytes
    }
    if (params.maxBillTier) {
        configuration.query.maximumBillingTier = params.maxBillTier
    }
}
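
To make the effect of that helper concrete, here is a minimal sketch that runs a trimmed copy of it against a sample params object and prints the resulting job resource. The class name QueryJobConfig and the partitioning column EventDate are assumptions made up for illustration; the answer does not name the enclosing class or any partitioning field. The project, dataset, table and clustering field names are the placeholders from the request shown earlier.

```javascript
// Hypothetical wrapper class: the original answer does not name the class.
class QueryJobConfig {
  // Trimmed copy of the helper above, keeping only the table-layout options.
  static setConfiguration(params, configuration) {
    // A destination table is only set when a tableId is present.
    if (params.destinationTable && params.destinationTable.tableId) {
      configuration.query.destinationTable = params.destinationTable;
    }
    if (params.clusteringFields) {
      configuration.query.clustering = { fields: params.clusteringFields };
    }
    if (params.timePartitioning) {
      configuration.query.timePartitioning = {
        type: "DAY",
        field: params.timePartitioning,
      };
    }
  }
}

// Placeholder identifiers; "EventDate" is an assumed column name.
const params = {
  destinationTable: { projectId: "Project1", datasetId: "Id1", tableId: "T1" },
  clusteringFields: ["Field1", "Field2"],
  timePartitioning: "EventDate",
};

// Start from a configuration that already carries the query text.
const configuration = { query: { query: "select 5" } };
QueryJobConfig.setConfiguration(params, configuration);

// This object has the shape passed as "resource" to jobs.insert above.
const resource = { configuration };
console.log(JSON.stringify(resource, null, 2));
```

The helper mutates the configuration it is given rather than returning a new one, so the caller has to create the configuration.query object before calling it.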

Answer 1 (score: 0)

As of version 2.16.0, BigQueryIO does provide an option to add clustering fields in dynamic destinations.

    @Override
    public TableDestination getTable(String eventName) {
        return new TableDestination(tableSpec,
                tableDescription, timePartitioning, clustering);
    }

Note that the fourth argument is the clustering, which you can use for this.