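We are facing an intermittent issue: when we execute a query through the BigQuery Java API, the row count we get does not match the row count we get when the same query is executed through the BigQuery UI.
In our code we use a QueryResponse object to execute the query, and we check whether the query has completed by checking the flag GetQueryResultsResponse.getJobComplete(). We also have a mechanism to fetch more records if the query does not return all rows in one shot, driven by the loop condition while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf((queryResult.getRows().size()))) > 0).
The following is the piece of code we use to execute the query: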
int retryCount = 0;
long waitTime = Constant.BASE_WAIT_TIME;
Bigquery bigquery = cloudPlatformConnector.connectBQ();
QueryRequest queryRequest = new QueryRequest();
queryRequest.setUseLegacySql(useLegacyDialect);
GetQueryResultsResponse queryResult = null;
GetQueryResultsResponse queryPaginationResult = null;
String pageToken;
do{
    try{
        QueryResponse query = bigquery.jobs().query(this.projectId, queryRequest.setQuery(querySql)).execute();
        queryResult = bigquery.jobs().getQueryResults(query.getJobReference().getProjectId(), query.getJobReference().getJobId()).execute();
        if(queryResult != null ){
            if(!queryResult.getJobComplete()){
                LOGGER.info("JobId for the query : "+ query.getJobReference().getJobId() + " is Job Completed : "+ queryResult.getJobComplete());
                if(queryResult.getErrors() != null){
                    for( ErrorProto err: queryResult.getErrors() ){
                        LOGGER.info("Errors in query, Reason : "+ err.getReason()+ " Location : "+ err.getLocation() +" Message : "+ err.getMessage());
                    }
                }
                LOGGER.info("Query not completed : "+querySql);
                throw new IOException("Query is failing retrying it");
            }
        }
        LOGGER.info("JobId for the query : "+ query.getJobReference().getJobId() + " is Job Completed : "+ queryResult.getJobComplete() + " Total rows from query : " + queryResult.getTotalRows());
        pageToken = queryResult.getPageToken();
        while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf((queryResult.getRows().size()))) > 0) {
            LOGGER.info("Inside the Pagination code block, Page Token : "+pageToken);
            queryPaginationResult = bigquery.jobs().getQueryResults(projectId,query.getJobReference().getJobId()).setPageToken(pageToken).setStartIndex(BigInteger.valueOf(queryResult.getRows().size())).execute();
            queryResult.getRows().addAll(queryPaginationResult.getRows());
            pageToken = queryPaginationResult.getPageToken();
            LOGGER.info("Inside the Pagination code block, total size : "+ queryResult.getTotalRows() + " Current Size : "+ queryResult.getRows().size());
        }
    }catch(IOException ex){
        retryCount ++;
        LOGGER.info("BQ Connection Attempt "+retryCount +" failed, Retrying in " + waitTime + " seconds");
        if (retryCount == Constant.MAX_RETRY_LIMIT) {
            LOGGER.info("BQ Connection Error", ex);
            throw ex;
        }
        try {
            Thread.sleep(waitTime);
        } catch (InterruptedException e) {
            LOGGER.info("Thread Error");
        }
        waitTime *= 2;
    }
}while((queryResult == null && retryCount < Constant.MAX_RETRY_LIMIT ) || (!queryResult.getJobComplete() && retryCount < Constant.MAX_RETRY_LIMIT));
return queryResult.getRows();
The queries for which I do not get all the rows do not contain any LIMIT clause.
We are currently using version 0.5.0 of google-cloud-bigquery.
Thanks in advance!
Answer 0 (score: 1)
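I think that in the subsequent calls to getQueryResults you need to call setPageToken correctly, passing the pageToken returned by the previous page. Otherwise getQueryResults will only return the first page of rows.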
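To show the loop shape I have in mind, here is a minimal, self-contained sketch. It does not call BigQuery at all: Page, fetchAllRows and fakeBackend are hypothetical names I made up for illustration, and the fake backend simply encodes the next start offset as the token. In your code the fetch step would presumably be bigquery.jobs().getQueryResults(projectId, jobId).setPageToken(pageToken).execute(), with the next token read from getPageToken(). The point is only that each request uses the token from the previous response, and the loop ends when the token comes back null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for one page of query results (not a BigQuery class). */
final class Page {
    final List<String> rows;
    final String nextPageToken; // null when there are no further pages

    Page(List<String> rows, String nextPageToken) {
        this.rows = rows;
        this.nextPageToken = nextPageToken;
    }
}

final class PageTokenPagination {

    /**
     * Collects rows from every page. The fetchPage function receives the token
     * of the page to load (null for the first page) and returns that page.
     */
    static List<String> fetchAllRows(Function<String, Page> fetchPage) {
        List<String> allRows = new ArrayList<>();
        String pageToken = null; // the first request carries no token
        do {
            Page page = fetchPage.apply(pageToken);
            if (page.rows != null) {
                allRows.addAll(page.rows);
            }
            // Always advance with the token from the page just received.
            pageToken = page.nextPageToken;
        } while (pageToken != null);
        return allRows;
    }

    /**
     * Assumed in-memory backend for demonstration only: serves `data` in
     * chunks of `pageSize`, using the next start offset as the page token.
     */
    static Function<String, Page> fakeBackend(List<String> data, int pageSize) {
        return token -> {
            int start = (token == null) ? 0 : Integer.parseInt(token);
            int end = Math.min(start + pageSize, data.size());
            String next = (end < data.size()) ? String.valueOf(end) : null;
            return new Page(new ArrayList<>(data.subList(start, end)), next);
        };
    }
}
```

Calling PageTokenPagination.fetchAllRows(PageTokenPagination.fakeBackend(data, 2)) on a five-element list walks three pages and returns all five rows in order. Note that this loop terminates on the token rather than on comparing getTotalRows() with the number of rows collected so far, which is a design choice of the sketch and not something I have verified against your library version.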