How to insert data into a new time partition with the BigQuery Java API

Date: 2017-01-24 04:28:46

Tags: google-bigquery

I am trying to figure out how to use the Java API to insert data into a new time partition of an existing table. I can do this via the CLI, e.g.:

bq query --use_legacy_sql=false --allow_large_results --replace \
  --destination_table 'analytics.base_client_qos$20170104' \
  'SELECT *, CAST(SUBSTR(event_date_pst, 0, 10) AS DATE) AS dt FROM analytics.client_qos_temp'

I tried to create it via the Java API using something like this:

String projectId = "analytics-145623";
String datasetId = "analytics";
String destTableId = "'analytics.base_client_qos$20170104'";
String queryString = "'SELECT *, CAST(SUBSTR(event_date_pst, 0, 10) AS DATE) as dt from analytics.client_qos_temp'";

// first create the new time partition
TableReference tableRef = new TableReference()
    .setProjectId(projectId)
    .setDatasetId(datasetId)
    .setTableId(destTableId);
Table table = new Table();
TimePartitioning timePartitioning = new TimePartitioning();
timePartitioning.setType("DAY");
table.setTimePartitioning(timePartitioning);
table.setTableReference(tableRef);
Bigquery.Tables.Insert request = client.tables().insert(projectId, datasetId, table);
Table response = request.execute();

// next run query to insert the data
JobConfigurationQuery queryConfig = new JobConfigurationQuery()
    .setQuery(queryString)
    .setDestinationTable(tableRef)
    .setAllowLargeResults(true)
    .setUseLegacySql(false)
    .setPriority("BATCH")
    .setWriteDisposition("WRITE_TRUNCATE");
Job job = new Job().setConfiguration(new JobConfiguration().setQuery(queryConfig));
client.jobs().insert(projectId, job).execute();

But this errors out with:

{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Invalid table ID \"'analytics.base_client_qos$20170104'\".",
    "reason" : "invalid"
  } ],
  "message" : "Invalid table ID \"'analytics.base_client_qos$20170104'\"."
}

I dug through the API docs, and the only place where time-partitioning information can be added is on the table itself, via TimePartitioning, but that apparently doesn't work, and it chokes on the partition decorator in the table name.

What am I missing? I have tried to find an example of doing this, with no luck. Does anyone know how?

2 Answers:

Answer 0 (score: 0)

When creating the new table, you should not use the partition suffix $20170104; just use base_client_qos in the client.tables().insert call. But when you run the query, use the partitioned name in .setDestinationTable.
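In other words, the ID passed to tables().insert and the one passed to .setDestinationTable differ only by the $YYYYMMDD partition decorator, and neither should contain quotes or the dataset prefix (the TableReference already carries the dataset). A small sketch making that distinction concrete; the helper names are hypothetical, not part of the BigQuery API:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PartitionIds {
    // ID used when creating the table: no decorator, no dataset prefix, no quotes.
    static String createId(String baseTable) {
        return baseTable;
    }

    // ID used as a query/load destination: base table plus a $YYYYMMDD decorator.
    static String writeId(String baseTable, LocalDate day) {
        return baseTable + "$" + day.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
    }

    public static void main(String[] args) {
        System.out.println(createId("base_client_qos"));                          // base_client_qos
        System.out.println(writeId("base_client_qos", LocalDate.of(2017, 1, 4))); // base_client_qos$20170104
    }
}
```

This also explains the 400 in the question: the destTableId there contains literal single quotes and the analytics. prefix, both of which are invalid inside a table ID.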

Answer 1 (score: 0)

After looking at it more closely I decided against the streaming approach, since that is meant for streaming records into a table and was not a good fit for the volumes we have. I asked a separate question and decided to skip the temp table, do the preprocessing in Hive, and use a load job instead of an insert query, which worked better for me. For the next person who comes along, it goes like this:

String cloudStoragePath = "gs://analytics-145623.appspot.com/user/hive/warehouse/di.db/base_client_qos_daily/year=2017/month=1/day=4/*.avro";
String projectId =  "analytics-145623";
String datasetId = "analytics";
String destTableId = "base_client_qos$20170104";

TableReference tableRef = new TableReference()
    .setProjectId(projectId)
    .setDatasetId(datasetId)
    .setTableId(destTableId);

JobConfigurationLoad loadTable = new JobConfigurationLoad()
    .setDestinationTable(tableRef)
    .setSourceFormat("AVRO")
    .setSourceUris(Collections.singletonList(cloudStoragePath))
    .setWriteDisposition("WRITE_TRUNCATE");
Job loadJob = client.jobs().insert(projectId,
        new Job().setConfiguration(new JobConfiguration().setLoad(loadTable))
    ).execute();
Bigquery.Jobs.Get tempTableGet = client.jobs()
    .get(loadJob.getJobReference().getProjectId(),
         loadJob.getJobReference().getJobId());

Job jobResult = BigQueryUtils.pollJob(tempTableGet, interval);
if (jobResult == null) {
    System.out.println("Load job did not complete in time");
} else if (jobResult.getStatus().getErrorResult() != null) {
    System.out.println("Error when overwriting partition: " +
            jobResult.getStatus().getErrorResult().getReason());
}
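BigQueryUtils.pollJob above is the poster's own helper, not part of the client library. A generic sketch of such a poll loop, parameterized over a status supplier so it can run without the BigQuery client (the names and return convention are assumptions):

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class JobPoller {
    // Polls a status supplier until it reports "DONE" or attempts run out.
    // Returns the number of polls taken, or -1 if the job never finished.
    static int pollUntilDone(Supplier<String> status, int maxAttempts, long intervalMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if ("DONE".equals(status.get())) {
                return attempt;
            }
            Thread.sleep(intervalMillis);
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a job that reports RUNNING twice, then DONE.
        Iterator<String> statuses = List.of("RUNNING", "RUNNING", "DONE").iterator();
        System.out.println(pollUntilDone(statuses::next, 10, 1L)); // 3
    }
}
```

With the real client, the supplier would wrap tempTableGet.execute() and read the job's status; the loop structure stays the same.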

HTH