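We are using Google BigQuery in a Java web application and are seeing some strange behavior.
We use query jobs to retrieve data, which we then visualize in several charts.
We create and register several jobs: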
    // Start each query as an asynchronous job and remember which label it belongs to.
    Job query1Job = startAsyncQuery(query1, "q1"+uuid);
    jobMapping.put(query1Job.getJobReference().getJobId(), "q1");
    runningJobs.add(query1Job);
    ...
    Job query2Job = startAsyncQuery(query2, "q2"+uuid);
    jobMapping.put(query2Job.getJobReference().getJobId(), "q2");
    runningJobs.add(query2Job);
    ...
    Job query3Job = startAsyncQuery(query3, "q3"+uuid);
    jobMapping.put(query3Job.getJobReference().getJobId(), "q3");
    runningJobs.add(query3Job);
    ...
    Job query4Job = startAsyncQuery(query4, "q4"+uuid);
    jobMapping.put(query4Job.getJobReference().getJobId(), "q4");
    runningJobs.add(query4Job);
    ...
    Job query5Job = startAsyncQuery(query5, "q5"+uuid);
    jobMapping.put(query5Job.getJobReference().getJobId(), "q5");
    runningJobs.add(query5Job);

    // Submits the query as an asynchronous job and returns the queued job.
    public Job startAsyncQuery(String query, String jobId) throws IOException {
        JobConfigurationQuery queryConfig = new JobConfigurationQuery()
                .setQuery(query)
                .setUseQueryCache(true);
        JobConfiguration config = new JobConfiguration().setQuery(queryConfig);
        Job job = new Job().setId(jobId).setConfiguration(config);
        // execute() blocks until the HTTP response for the insert arrives.
        Job queuedJob = this.bigquery.jobs().insert(this.projectId, job).execute();
        return queuedJob;
    }
We then poll the list of running jobs to retrieve the data:
    boolean isError = false;
    while (!runningJobs.isEmpty() && !isError) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop polling instead of swallowing it.
            Thread.currentThread().interrupt();
            break;
        }
        // Jobs that are still running are carried over to the next pass.
        List<Job> tempJobs = new ArrayList<Job>();
        for (Job job : runningJobs) {
            JobReference jref = job.getJobReference();
            String jid = jobMapping.get(jref.getJobId());
            // pollJob returns -1 on error, 1 when the job is done, anything else while running.
            int jobState = pollJob(bigQueryManager, jref, jid);
            if (jobState == -1) {
                System.out.println("Aborting because of error for job " + jid);
                isError = true;
            } else if (jobState == 1) {
                List<TableRow> rows = bigQueryManager.getQueryResults(jref);
                if (jid.startsWith("q1")) {
                    parseQ1QueryResult(filter, metrics, metricsRTLDiv, clientList, objectList, rows);
                } else if (jid.startsWith("q2")) {
                    parseQ2QueryResult(filter, metrics, rows);
                } else if (jid.startsWith("q3")) {
                    parseQ3QueryResult(filter, metrics, metricsRTLDiv, rows);
                } else if (jid.startsWith("q4")) {
                    parseQ4QueryResult(metricsRTLDiv, rows);
                } else if (jid.startsWith("q5")) {
                    parseQ5QueryResult(metrics, rows);
                } else {
                    System.out.println("Job finished for unknown id: " + jid);
                }
            } else {
                tempJobs.add(job);
            }
        }
        runningJobs = tempJobs;
    }
The strange behavior is that each call to bigquery.jobs().insert(...).execute() takes several seconds:
Fri Mar 20 08:57:56 CET 2015 - q1
Fri Mar 20 08:57:59 CET 2015 - q2
Fri Mar 20 08:58:04 CET 2015 - q3
Fri Mar 20 08:58:09 CET 2015 - q4
Fri Mar 20 08:58:14 CET 2015 - q5
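Is it possible to put all of these queries into one batch request, so that only a single HTTP request is sent (instead of one HTTP request per query)?
Does anyone know a faster way to retrieve data from BigQuery tables when several queries have to be executed? Is there a way to improve job execution speed?
Thanks.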
Answer 0 (score: 1)
The Google API Client Library for Java has request batching support.
Although this example is written for the Calendar service, it can be adapted to BigQuery.
    // One callback handles the result of every request queued in the batch.
    JsonBatchCallback<Calendar> callback = new JsonBatchCallback<Calendar>() {
        public void onSuccess(Calendar calendar, HttpHeaders responseHeaders) {
            printCalendar(calendar);
            addedCalendarsUsingBatch.add(calendar);
        }
        public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) {
            System.out.println("Error Message: " + e.getMessage());
        }
    };
    ...
    Calendar client = Calendar.builder(transport, jsonFactory, credential)
        .setApplicationName("BatchExample/1.0").build();
    BatchRequest batch = client.batch();
    // Requests are only queued here; nothing is sent until batch.execute().
    Calendar entry1 = new Calendar().setSummary("Calendar for Testing 1");
    client.calendars().insert(entry1).queue(batch, callback);
    Calendar entry2 = new Calendar().setSummary("Calendar for Testing 2");
    client.calendars().insert(entry2).queue(batch, callback);
    batch.execute();
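
Separately from batching, the gaps in the question's log add up because the five inserts are issued one after another. A minimal sketch of a different technique, submitting the inserts from a thread pool so that the waits overlap, might look like the following. It uses only the JDK. The class name ParallelSubmit, the method submitAll, and the Callable stand-ins are hypothetical, and the 200 ms sleep merely simulates the insert latency. In the question's code each Callable would presumably wrap a call such as startAsyncQuery(query, jobId) and return the job ID. Whether the BigQuery client object can be shared across threads is an assumption to verify.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSubmit {

    /**
     * Runs every submitter on its own pool thread and returns label -> job ID,
     * in the same order as the input map. Each Callable is a hypothetical
     * stand-in for a blocking call that queues one job and returns its ID.
     */
    static Map<String, String> submitAll(Map<String, Callable<String>> submitters) {
        Map<String, String> jobIds = new LinkedHashMap<>();
        if (submitters.isEmpty()) {
            return jobIds; // a fixed thread pool cannot be created with zero threads
        }
        ExecutorService pool = Executors.newFixedThreadPool(submitters.size());
        try {
            // Start all submissions first so that their latencies overlap.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> e : submitters.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }
            // Then wait for each result.
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                jobIds.put(e.getKey(), e.getValue().get());
            }
            return jobIds;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while submitting jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Job submission failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Callable<String>> submitters = new LinkedHashMap<>();
        for (String label : new String[] {"q1", "q2", "q3", "q4", "q5"}) {
            submitters.put(label, () -> {
                Thread.sleep(200); // simulated insert latency (assumption)
                return label + "-job";
            });
        }
        long start = System.nanoTime();
        Map<String, String> ids = submitAll(submitters);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(ids + " submitted in about " + elapsedMs + " ms");
    }
}
```

The returned map can play the role of jobMapping from the question, with the polling loop left unchanged. This only overlaps the submission latency; it does not make any individual query run faster.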