BigQuery with Python

Time: 2015-04-01 13:59:37

Tags: python mysql google-bigquery

Is there any way to run queries repeatedly on Google BigQuery from a Python script?

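I want to use the Google BigQuery platform to query a dataset one week of data at a time, covering a span of more than a year. Running the query 52 times by hand is too tedious, so I would rather write a Python script (I know Python).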

I hope someone can point me in the right direction.

2 answers:

Answer 0 (score: 2)

BigQuery offers client libraries in several languages (see https://cloud.google.com/bigquery/client-libraries), including Python, which is documented at https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/python/latest/?_ga=1.176926572.834714677.1415848949 (you need to follow the hyperlinks there to make sense of the documentation).

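https://cloud.google.com/bigquery/bigquery-api-quickstart gives an example command-line program, in Java or Python, that uses the Google BigQuery API to run a query on one of the available sample datasets and display the results. After the imports and a few constants are set up, the Python script boils down to: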

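  import pprint

  import httplib2
  from googleapiclient.discovery import build
  from googleapiclient.errors import HttpError
  from oauth2client import tools
  from oauth2client.client import AccessTokenRefreshError, flow_from_clientsecrets
  from oauth2client.file import Storage

  # Placeholder constants: fill in your own project number and client secrets file.
  PROJECT_NUMBER = 'XXXXXXXXXXXX'
  FLOW = flow_from_clientsecrets('client_secrets.json',
                                 scope='https://www.googleapis.com/auth/bigquery')

  # Reuse cached credentials if present; otherwise run the OAuth 2.0 flow.
  storage = Storage('bigquery_credentials.dat')
  credentials = storage.get()

  if credentials is None or credentials.invalid:
      # Run oauth2 flow with default arguments.
      credentials = tools.run_flow(FLOW, storage, tools.argparser.parse_args([]))

  # Sign every HTTP request with the OAuth credentials.
  http = credentials.authorize(httplib2.Http())

  bigquery_service = build('bigquery', 'v2', http=http)

  try:
      query_request = bigquery_service.jobs()
      query_data = {'query': 'SELECT TOP(title, 10) AS title, COUNT(*) AS revision_count '
                             'FROM [publicdata:samples.wikipedia] WHERE wp_namespace = 0;'}

      query_response = query_request.query(projectId=PROJECT_NUMBER,
                                           body=query_data).execute()
      print('Query Results:')
      # Each row is {'f': [{'v': value}, ...]}, one cell per selected column.
      for row in query_response.get('rows', []):
          print('\t'.join(field['v'] for field in row['f']))

  except HttpError as err:
      print('Error:')
      pprint.pprint(err.content)

  except AccessTokenRefreshError:
      print('Credentials have been revoked or expired, please re-run '
            'the application to re-authorize')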
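The `query_response` returned by `jobs().query(...).execute()` is a plain dict: each entry in `rows` holds an `f` list of cells, each with a `v` value, while the column names live separately under `schema`/`fields`. The helper below is a minimal sketch (the name `rows_from_response` is hypothetical, not part of the client library) that pairs each cell with its column name. It assumes the response is complete and fits in one page; cell values typically arrive as JSON strings (an INTEGER column comes back as `'42'`), so convert them yourself if needed.

```python
def rows_from_response(query_response):
    """Turn a BigQuery v2 jobs.query response dict into a list of dicts.

    Column names come from query_response['schema']['fields']; each row
    looks like {'f': [{'v': value}, ...]} with one cell per column.
    """
    fields = query_response.get('schema', {}).get('fields', [])
    names = [field['name'] for field in fields]
    # A query that returned nothing has no 'rows' key at all.
    return [dict(zip(names, (cell['v'] for cell in row['f'])))
            for row in query_response.get('rows', [])]


# Hand-written response in the same shape, for illustration only.
sample_response = {
    'schema': {'fields': [{'name': 'title', 'type': 'STRING'},
                          {'name': 'revision_count', 'type': 'INTEGER'}]},
    'rows': [{'f': [{'v': 'Example title'}, {'v': '42'}]}],
}
print(rows_from_response(sample_response))
# [{'title': 'Example title', 'revision_count': '42'}]
```

Keying cells by column name means the printing code no longer depends on the column order of the `SELECT`.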

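As you can see, the logic is only about 30 lines, and most of it deals with obtaining and checking authorization and handling errors. Setting those concerns aside, the "core" part is really only about half of it: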

    bigquery_service = build('bigquery', 'v2', http=http)
    query_request = bigquery_service.jobs()
    query_data = {'query': 'SELECT TOP(title, 10) AS title, COUNT(*) AS revision_count '
                           'FROM [publicdata:samples.wikipedia] WHERE wp_namespace = 0;'}

    query_response = query_request.query(projectId=PROJECT_NUMBER,
                                         body=query_data).execute()
    print('Query Results:')
    for row in query_response.get('rows', []):
        print('\t'.join(field['v'] for field in row['f']))
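Because the core part takes nothing but a service object, a project and a SQL string, it can be wrapped in a function and called once per week. The sketch below is a hypothetical wrapper (the name `run_query` is illustrative, not a library function); it assumes `bigquery_service` was built with `build('bigquery', 'v2', http=http)` as above, and it only reads the first page of results.

```python
def run_query(bigquery_service, project_id, sql):
    """Run one synchronous BigQuery query; return rows as lists of cell values.

    bigquery_service is the object returned by build('bigquery', 'v2', http=http).
    """
    query_response = bigquery_service.jobs().query(
        projectId=project_id, body={'query': sql}).execute()
    if not query_response.get('jobComplete', False):
        # The query outlived the request timeout; a fuller version would poll
        # jobs().getQueryResults() using the returned jobReference instead.
        raise RuntimeError('query did not finish within the request timeout')
    # A finished query with no result rows has no 'rows' key.
    return [[field['v'] for field in row['f']]
            for row in query_response.get('rows', [])]
```

With a list of SQL strings in hand (one per week, say), `for sql in queries: print(run_query(bigquery_service, PROJECT_NUMBER, sql))` runs them one after another; `queries` here stands for whatever list you build. Large results are paged via `pageToken`, which this sketch ignores.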

Answer 1 (score: 0)

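You could use Google Dataflow for Python, and if this is a one-off job, run it from your terminal or equivalent. Or you could have a shell script in an App Engine cron job that loops over the code 52 times to fetch your data. See also Google Dataflow scheduling.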
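Whether the loop runs from a terminal, a shell script or a cron job, the 52 weekly date ranges themselves are easy to generate in plain Python. The sketch below does not use Dataflow; it assumes a hypothetical table `mydataset.mytable` with a TIMESTAMP column `event_time` and builds one legacy-SQL query string per week, each of which could then be sent with `jobs().query()` as in the first answer.

```python
import datetime


def weekly_windows(first_day, weeks=52):
    """Yield (start, end) date pairs for consecutive 7-day windows.

    end is exclusive, so each window covers start <= day < end.
    """
    for i in range(weeks):
        start = first_day + datetime.timedelta(weeks=i)
        yield start, start + datetime.timedelta(weeks=1)


# Hypothetical table and column names (legacy SQL syntax); use your own.
QUERY_TEMPLATE = ("SELECT COUNT(*) AS n FROM [mydataset.mytable] "
                  "WHERE event_time >= TIMESTAMP('{start}') "
                  "AND event_time < TIMESTAMP('{end}')")


def weekly_queries(first_day, weeks=52):
    """Return one query string per weekly window."""
    return [QUERY_TEMPLATE.format(start=start.isoformat(), end=end.isoformat())
            for start, end in weekly_windows(first_day, weeks)]


if __name__ == '__main__':
    queries = weekly_queries(datetime.date(2014, 1, 1))
    print(len(queries))  # 52
    print(queries[0])
```

Note that 52 weeks span 364 days: starting on 2014-01-01, the last window ends (exclusively) on 2014-12-31, so that final day is not covered. Add a 53rd window or a shorter final range if the whole calendar year matters.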