How to create a partitioned BigQuery table in Java

Time: 2016-10-24 06:44:03

Tags: google-bigquery

https://cloud.google.com/bigquery/docs/creating-partitioned-tables shows how to create a partitioned table in Python. I've been there, I've done that.

Now the question is: how do I do the same thing with the Java API? What Java code is equivalent to the Python request below?

{
  "tableReference": {
    "projectId": "myProject",
    "tableId": "table1",
    "datasetId": "mydataset"
  },
  "timePartitioning": {
    "type": "DAY"
  }
}

Java, with the partitioning missing:

Job createTableJob = new Job();
JobConfiguration jobConfiguration = new JobConfiguration();
JobConfigurationLoad loadConfiguration = new JobConfigurationLoad();

createTableJob.setConfiguration(jobConfiguration);
jobConfiguration.setLoad(loadConfiguration);

TableReference tableReference = new TableReference()
        .setProjectId("myProject")
        .setDatasetId("mydataset")
        .setTableId("table1");

loadConfiguration.setDestinationTable(tableReference);
// what should be placed here to set DAY timePartitioning?

I am using the latest API version from the Maven Central Repository.

3 answers:

Answer 0 (score: 3)

https://cloud.google.com/bigquery/docs/reference/v2/tables/insert
https://cloud.google.com/bigquery/docs/reference/v2/tables#resource

Sample Java code:

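String projectId = "";
String datasetId = "";
String tableId = "";

// tables.insert needs a tableReference that names the table to create.
Table content = new Table();
content.setTableReference(new TableReference()
        .setProjectId(projectId)
        .setDatasetId(datasetId)
        .setTableId(tableId));

TimePartitioning timePartitioning = new TimePartitioning();
timePartitioning.setType("DAY");
// Optional: partition lifetime in milliseconds. 1L expires partitions almost immediately.
timePartitioning.setExpirationMs(1L);
content.setTimePartitioning(timePartitioning);

// bigquery is an already authorized com.google.api.services.bigquery.Bigquery client.
Bigquery.Tables.Insert request = bigquery.tables().insert(projectId, datasetId, content);
Table response = request.execute();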
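
A note on setExpirationMs: the value is the partition lifetime in milliseconds, so it is easy to get wrong by a few orders of magnitude. Below is a minimal sketch, using only the JDK, of deriving it from a retention period in days. The class name, the method name and the 90-day figure are hypothetical and not part of the BigQuery API.

```java
import java.time.Duration;

final class PartitionExpiration {

    /** Converts a retention period in whole days to a millisecond value. */
    static long expirationMsForDays(long days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive");
        }
        return Duration.ofDays(days).toMillis();
    }

    public static void main(String[] args) {
        // 90 days is an arbitrary illustration, not a recommended setting.
        System.out.println(expirationMsForDays(90)); // prints 7776000000
    }
}
```

The result could then be passed along the lines of timePartitioning.setExpirationMs(PartitionExpiration.expirationMsForDays(90)).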

Answer 1 (score: 3)

Let me share a more up-to-date way to create a partitioned table (works with Java API 0.32):

Schema schema = Schema.of(newFields);
// Partition the table by day.
TimePartitioning timePartitioning = TimePartitioning.of(TimePartitioning.Type.DAY);
TableDefinition tableDefinition = StandardTableDefinition.newBuilder()
        .setSchema(schema)
        .setTimePartitioning(timePartitioning)
        .build();

TableId tableId = TableId.of(projectName, datasetName, tableName);
TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();
bigQuery.create(tableInfo);

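Update 19/03/2018:

To load data into a specific partition (or to write the result of a SELECT into a specific partition), you only need to append that partition's date, as the suffix $yyyymmdd, to the table name when constructing the TableId object. Here is an example: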

private void runJob(JobConfiguration jobConf) {
    BIG_QUERY.create(JobInfo.of(jobConf));
}

private TableId getTableToOverwrite(String tableToOverwrite, String partition) {
    return TableId.of(PROJECT, DATASET, tableToOverwrite  + "$" + partition);
}

void loadInDayPartition(String dayUrl, String dayPartition) {

    LoadJobConfiguration loadConf = LoadJobConfiguration.newBuilder(getTableToOverwrite(TABLE_LEGACY, dayPartition),
            dayUrl, FormatOptions.avro())
            .build();

    runJob(loadConf);
}

I don't have an example of streaming data into a partitioned table, but I would guess it is similar.
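
The $yyyymmdd suffix described above can be built from a date instead of being passed around as a raw string. Below is a minimal sketch using only the JDK. The class and method names are hypothetical, and the resulting string is what would be handed to TableId.of as the table name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class PartitionDecorator {

    /** Returns the table name followed by "$" and the day formatted as yyyyMMdd. */
    static String forDay(String tableName, LocalDate day) {
        // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd with no separators.
        return tableName + "$" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        System.out.println(forDay("table1", LocalDate.of(2018, 3, 19))); // prints table1$20180319
    }
}
```

Formatting from a LocalDate keeps the suffix zero-padded and rejects impossible dates before the load job is submitted.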

Answer 2 (score: 0)

If you want to partition by a field, the code looks like this.

Schema schema = Schema.of(fields);
// Partition by day on the values of the given column.
TimePartitioning timePartitioning = TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
        .setField("partition_column")
        .build();
TableDefinition tableDefinition = StandardTableDefinition.newBuilder()
        .setSchema(schema)
        .setTimePartitioning(timePartitioning)
        .build();

TableId tableId = TableId.of(projectName, datasetName, tableName);
TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();
bigQuery.create(tableInfo);