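Writing results to a permanent table in BigQuery

Asked: 2017-03-27 07:47:15

Tags: python sql python-2.7 google-bigquery

I am using named parameters in BigQuery SQL and want to write the results to a permanent table. I have two functions: one runs a query with named query parameters, and the other writes query results to a table. How do I combine the two so that the results of a query with named parameters are written to a table?

  1. This is the function that uses a parameterized query: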

    from google.cloud import bigquery

    def sync_query_named_params(column_name, min_word_count, value):
        query = """with lsq_results as
        (select "%s" = @min_word_count)
        replace (%s  AS %s)
        from lsq.lsq_results
        """ % (min_word_count, value, column_name)

        client = bigquery.Client()

        query_results = client.run_sync_query(
            query,
            query_parameters=(
                bigquery.ScalarQueryParameter('column_name', 'STRING', column_name),
                bigquery.ScalarQueryParameter(
                    'min_word_count',
                    'STRING',
                    min_word_count),
                bigquery.ScalarQueryParameter('value', 'INT64', value),
            ))
        query_results.use_legacy_sql = False
        query_results.run()
    
  2. This is the function that writes to a permanent table:

    import logging

    logger = logging.getLogger(__name__)

    class BigQueryClient(object):

        def __init__(self, bq_service, project_id, swallow_results=True):
            self.bigquery = bq_service
            self.project_id = project_id
            self.swallow_results = swallow_results
            self.cache = {}

        def write_to_table(
                self,
                query,
                dataset=None,
                table=None,
                external_udf_uris=None,
                allow_large_results=None,
                use_query_cache=None,
                priority=None,
                create_disposition=None,
                write_disposition=None,
                use_legacy_sql=None,
                maximum_billing_tier=None,
                flatten=None):

            configuration = {
                "query": query,
            }

            if dataset and table:
                configuration['destinationTable'] = {
                    "projectId": self.project_id,
                    "tableId": table,
                    "datasetId": dataset
                }

            if allow_large_results is not None:
                configuration['allowLargeResults'] = allow_large_results

            if flatten is not None:
                configuration['flattenResults'] = flatten

            if maximum_billing_tier is not None:
                configuration['maximumBillingTier'] = maximum_billing_tier

            if use_query_cache is not None:
                configuration['useQueryCache'] = use_query_cache

            if use_legacy_sql is not None:
                configuration['useLegacySql'] = use_legacy_sql

            if priority:
                configuration['priority'] = priority

            if create_disposition:
                configuration['createDisposition'] = create_disposition

            if write_disposition:
                configuration['writeDisposition'] = write_disposition

            if external_udf_uris:
                configuration['userDefinedFunctionResources'] = \
                    [{'resourceUri': u} for u in external_udf_uris]

            body = {
                "configuration": {
                    'query': configuration
                }
            }

            logger.info("Creating write to table job %s" % body)
            job_resource = self._insert_job(body)
            self._raise_insert_exception_if_error(job_resource)
            return job_resource
    
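  3. How can I combine the two functions so that a parameterized query is run and its results are written to a permanent table? Or, if there is another, simpler way, please suggest it.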

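1 answer:

Answer 0 (score: 0)

It appears you are using two different client libraries.

Your first code sample uses a beta version of the BigQuery client library, but for the time being I would recommend against using it, since it needs substantial revision before it can be considered generally available. (If you do use it, I would recommend using run_async_query() to create a job with all of the available parameters, and then calling results() to get the QueryResults object.)

Your second code sample creates a job resource directly, which is a lower-level interface. With this approach, you can specify the configuration.query.queryParameters field directly in your query configuration. This is the approach I would recommend right now.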
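As a rough illustration of the second approach, here is a minimal sketch of a job body that combines named parameters with a destination table. The field names follow the BigQuery REST API job resource; the helper names (scalar_param, build_query_job_body), the project ID, the destination table name, and the word and word_count columns are assumptions made up for this example and do not come from the question. Note that query parameters can only stand in for values, so a column or table name would still have to be placed into the SQL text itself.

```python
def scalar_param(name, type_, value):
    """Build one named scalar entry for configuration.query.queryParameters."""
    return {
        "name": name,
        "parameterType": {"type": type_},
        # Scalar parameter values are passed as strings in the job body.
        "parameterValue": {"value": str(value)},
    }


def build_query_job_body(query, project_id, dataset, table, params,
                         write_disposition="WRITE_TRUNCATE"):
    """Return a job body that runs `query` with named parameters and
    writes the result to project_id.dataset.table."""
    return {
        "configuration": {
            "query": {
                "query": query,
                # Query parameters require standard SQL.
                "useLegacySql": False,
                "parameterMode": "NAMED",
                "queryParameters": params,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "writeDisposition": write_disposition,
            }
        }
    }


# Hypothetical values: the column names, project and table are placeholders.
body = build_query_job_body(
    query=("SELECT word, word_count FROM lsq.lsq_results "
           "WHERE word_count >= @min_word_count"),
    project_id="my-project",
    dataset="lsq",
    table="lsq_filtered",
    params=[scalar_param("min_word_count", "INT64", 250)],
)
```

In the class from the question, a body like this could presumably be handed to self._insert_job(body) in place of the one built inside write_to_table. With the Google API Python client, the equivalent call would be along the lines of service.jobs().insert(projectId=project_id, body=body).execute().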