BigQuery Java API does not return all rows when we execute a query through it

Asked: 2017-10-13 19:33:43

Tags: google-bigquery

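We are facing an intermittent issue: when we execute a query through the BigQuery Java API, the number of rows we get back does not match the number of rows returned when we run the same query in the BigQuery UI.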

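In our code we use a QueryResponse object to execute the query, and we check whether the query has completed by testing the flag GetQueryResultsResponse.getJobComplete(). We also have a mechanism to fetch more records if the query does not return all rows in one shot: while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf((queryResult.getRows().size()))) > 0) {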

Here is the code snippet we use to execute the query:

    int retryCount = 0;
    long waitTime = Constant.BASE_WAIT_TIME;
    Bigquery bigquery = cloudPlatformConnector.connectBQ();
    QueryRequest queryRequest = new QueryRequest();
    queryRequest.setUseLegacySql(useLegacyDialect);
    GetQueryResultsResponse queryResult = null;
    GetQueryResultsResponse queryPaginationResult = null;
    String pageToken;
    do{
         try{
               QueryResponse query = bigquery.jobs().query(this.projectId, queryRequest.setQuery(querySql)).execute();
               queryResult = bigquery.jobs().getQueryResults(query.getJobReference().getProjectId(), query.getJobReference().getJobId()).execute();                   
               if(queryResult != null ){
                  if(!queryResult.getJobComplete()){
                      LOGGER.info("JobId for the query : "+ query.getJobReference().getJobId() + " is Job Completed : "+ queryResult.getJobComplete());
                      if(queryResult.getErrors() != null){
                           for( ErrorProto err: queryResult.getErrors() ){
                               LOGGER.info("Errors in query, Reason : "+ err.getReason()+ " Location : "+ err.getLocation() +" Message : "+ err.getMessage());
                           }  
                      }
                       LOGGER.info("Query not completed : "+querySql);
                       throw new IOException("Query is failing retrying it");
                   }
               }
               LOGGER.info("JobId for the query : "+ query.getJobReference().getJobId() + " is Job Completed : "+ queryResult.getJobComplete() + " Total rows from query : " + queryResult.getTotalRows());
               pageToken = queryResult.getPageToken();
               while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf((queryResult.getRows().size()))) > 0) {
                   LOGGER.info("Inside the Pagination code block, Page Token : "+pageToken);
                   queryPaginationResult =  bigquery.jobs().getQueryResults(projectId,query.getJobReference().getJobId()).setPageToken(pageToken).setStartIndex(BigInteger.valueOf(queryResult.getRows().size())).execute();
                   queryResult.getRows().addAll(queryPaginationResult.getRows());
                   pageToken = queryPaginationResult.getPageToken();
                   LOGGER.info("Inside the Pagination code block, total size : "+ queryResult.getTotalRows() + " Current Size : "+ queryResult.getRows().size());
               }

         }catch(IOException ex){
               retryCount ++;
               LOGGER.info("BQ Connection Attempt "+retryCount +" failed, Retrying in " + waitTime + " seconds");
               if (retryCount == Constant.MAX_RETRY_LIMIT) {
                    LOGGER.info("BQ Connection Error", ex);
                    throw ex;
               }
               try {
                    Thread.sleep(waitTime);
               } catch (InterruptedException e) {
                    LOGGER.info("Thread Error");
               }
               waitTime *= 2;
         }
    }while((queryResult == null && retryCount < Constant.MAX_RETRY_LIMIT ) || (!queryResult.getJobComplete() && retryCount < Constant.MAX_RETRY_LIMIT));
    return queryResult.getRows();

The queries for which I am not getting all rows do not contain any LIMIT clause.

We are currently using version 0.5.0 of google-cloud-bigquery.

Thanks in advance!

1 Answer:

Answer 0: (score: 1)

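I think that in the subsequent calls to getQueryResults you need to call setPageToken correctly, passing the pageToken returned by the previous page. Otherwise getQueryResults will only return the rows of the first page.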
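
To illustrate the token-driven loop described above, here is a minimal, self-contained sketch. It does not call BigQuery: `Page`, `fetchAll` and `fakeService` are hypothetical names invented for this example, and `fakeService` is only a stand-in for the remote call. With the real client, the function passed to `fetchAll` would presumably wrap `bigquery.jobs().getQueryResults(projectId, jobId).setPageToken(token).execute()` and read `getRows()` and `getPageToken()` from the response, as in the question's code.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Function;

    public class PagedFetch {

        /** Hypothetical page type: the rows of one page plus the token of the next page (null on the last page). */
        public static final class Page {
            final List<String> rows;
            final String nextPageToken;

            Page(List<String> rows, String nextPageToken) {
                this.rows = rows;
                this.nextPageToken = nextPageToken;
            }
        }

        /**
         * Collects all rows by following page tokens.
         * fetchPage receives the token of the page to fetch (null for the first page).
         */
        public static List<String> fetchAll(Function<String, Page> fetchPage) {
            List<String> all = new ArrayList<>();
            String token = null;
            do {
                Page page = fetchPage.apply(token);
                if (page.rows != null) {
                    all.addAll(page.rows);
                }
                // Always carry forward the token from the page just received.
                token = page.nextPageToken;
            } while (token != null);
            return all;
        }

        /** Stand-in for the remote service (assumption): 5 rows, 2 per page, token = start offset. */
        public static Page fakeService(String token) {
            List<String> data = Arrays.asList("r0", "r1", "r2", "r3", "r4");
            int start = (token == null) ? 0 : Integer.parseInt(token);
            int end = Math.min(start + 2, data.size());
            String next = (end < data.size()) ? String.valueOf(end) : null;
            return new Page(new ArrayList<>(data.subList(start, end)), next);
        }

        public static void main(String[] args) {
            List<String> rows = fetchAll(PagedFetch::fakeService);
            System.out.println(rows.size() + " rows: " + rows);
        }
    }

The loop ends when the service stops returning a token, rather than when a row count matches the reported total. Each request is driven only by the token from the immediately preceding page.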