Trying to figure out how to insert data into a new time partition of an existing table using the Java API. I can do this via the CLI, e.g.:
bq query --use_legacy_sql=false --allow_large_results --replace --destination_table 'analytics.base_client_qos$20170104' 'SELECT *, CAST(SUBSTR(event_date_pst, 0, 10) AS DATE) as dt from analytics.client_qos_temp'
I tried to do the same thing through the Java API using something like this:
String projectId = "analytics-145623";
String datasetId = "analytics";
String destTableId = "'analytics.base_client_qos$20170104'";
String queryString = "'SELECT *, CAST(SUBSTR(event_date_pst, 0, 10) AS DATE) as dt from analytics.client_qos_temp'";
// first create the new time partition
TableReference tableRef = new TableReference()
.setProjectId(projectId)
.setDatasetId(datasetId)
.setTableId(destTableId);
Table table = new Table();
TimePartitioning timePartitioning = new TimePartitioning();
timePartitioning.setType("DAY");
table.setTimePartitioning(timePartitioning);
table.setTableReference(tableRef);
Bigquery.Tables.Insert request = client.tables().insert(projectId, datasetId, table);
Table response = request.execute();
// next run query to insert the data
JobConfigurationQuery queryConfig = new JobConfigurationQuery()
.setQuery(queryString)
.setDestinationTable(tableRef)
.setAllowLargeResults(true)
.setUseLegacySql(false)
.setPriority("BATCH")
.setWriteDisposition("WRITE_TRUNCATE");
Job job = new Job().setConfiguration(new JobConfiguration().setQuery(queryConfig));
client.jobs().insert(projectId, job).execute();
But this errors out with:
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Invalid table ID \"'analytics.base_client_qos$20170104'\".",
"reason" : "invalid"
} ],
"message" : "Invalid table ID \"'analytics.base_client_qos$20170104'\"."
}
I dug through the API docs, and the only place I could find to add time-partitioning information is with TimePartitioning on the Table, but that apparently doesn't work and it chokes on the partition name.
What am I missing? I tried to find an example of doing this, but had no luck. Does anyone know how?
Answer 0 (score: 0)
When creating the new table, you should not use the partition suffix $20170104; just use analytics.base_client_qos in the client.tables().insert call. But when you run the query, use the partition name in .setDestinationTable.
Answer 1 (score: 0)
After a closer look I decided against tabledata(), since that is meant for streaming records into a table, which isn't appropriate for the volumes we have. I asked a different question and decided to skip the temp table, do the preprocessing in Hive, and use a load job instead of an insert query, which worked better for me. For the next person who comes along, it goes like this:
String cloudStoragePath = "gs://analytics-145623.appspot.com/user/hive/warehouse/di.db/base_client_qos_daily/year=2017/month=1/day=4/*.avro";
String projectId = "analytics-145623";
String datasetId = "analytics";
String destTableId = "base_client_qos$20170104";
TableReference tableRef = new TableReference()
.setProjectId(projectId)
.setDatasetId(datasetId)
.setTableId(destTableId);
JobConfigurationLoad loadTable = new JobConfigurationLoad()
.setDestinationTable(tableRef)
.setSourceFormat("AVRO")
.setSourceUris(Collections.singletonList(cloudStoragePath))
.setWriteDisposition("WRITE_TRUNCATE");
Job loadJob = client.jobs().insert(projectId,
    new Job().setConfiguration(new JobConfiguration().setLoad(loadTable))
).execute();
Bigquery.Jobs.Get loadJobGet = client.jobs()
    .get(loadJob.getJobReference().getProjectId(),
         loadJob.getJobReference().getJobId());
Job jobResult = BigQueryUtils.pollJob(loadJobGet, interval);
if (jobResult == null) {
    System.out.println("Load job did not finish in time");
} else if (jobResult.getStatus().getErrorResult() != null) {
    System.out.println("Error when overwriting table: " +
        jobResult.getStatus().getErrorResult().getReason());
}
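BigQueryUtils.pollJob above is my own helper and isn't shown; a generic polling loop along the same lines might look like this sketch, where the Supplier stands in for a jobs().get(...).execute().getStatus().getState() call so the example is self-contained:

```java
import java.util.function.Supplier;

public class JobPoller {
    // Repeatedly fetch the job state until it reports "DONE" or we run
    // out of attempts; returns the final state, or null on timeout.
    static String poll(Supplier<String> fetchState, long intervalMs, int maxAttempts)
            throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            String state = fetchState.get();
            if ("DONE".equals(state)) {
                return state;
            }
            Thread.sleep(intervalMs);
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated job that reports RUNNING twice, then DONE.
        int[] calls = {0};
        Supplier<String> fake = () -> ++calls[0] < 3 ? "RUNNING" : "DONE";
        System.out.println(poll(fake, 1, 10)); // DONE
    }
}
```

Note that a "DONE" state only means the job finished; you still need to check getStatus().getErrorResult() as the snippet above does.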
HTH