Importing live data from BigQuery into a Python DataFrame

Time: 2018-01-15 15:29:12

Tags: python google-bigquery

I am exploring ways to bring BigQuery data into Python. This is my code so far:

from google.cloud import bigquery
from pandas.io import gbq

client = bigquery.Client.from_service_account_json("path_to_my.json")

project_id = "my_project_name"

query_job = client.query("""
    #standardSQL
    SELECT date,
    SUM(totals.visits) AS visits
    FROM `projectname.dataset.ga_sessions_20*` AS t
    WHERE parse_date('%y%m%d', _table_suffix) between 
    DATE_sub(current_date(), interval 3 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY date
    """)

results = query_job.result()  # Waits for job to complete.

#for row in results:
#  print("{}: {}".format(row.date, row.visits))

results_df = gbq.read_gbq(query_job,project_id=project_id)

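The commented-out lines, #for row in results: print("{}: {}".format(row.date, row.visits)), return the correct results from my query, but the results are not usable in that form. As a next step I want to put them into a DataFrame, but this code raises the error TypeError: Object of type 'QueryJob' is not JSON serializable

Can anyone tell me what is wrong with the code that produces this error, or suggest a better way to bring BigQuery data into a DataFrame?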

1 Answer:

Answer 0: (score: 3)

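The read_gbq method expects the query as a str, not as a QueryJob. In the question, the QueryJob object returned by client.query() is passed where the SQL text should go.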
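The JSON wording in the error message probably arises because the query ends up embedded in a JSON request body. That is an assumption about the library internals, not something stated in the question. The following minimal sketch reproduces the same kind of TypeError with only the standard library; `FakeQueryJob` and `build_request_body` are hypothetical names used for illustration and are not part of pandas or google-cloud-bigquery.

```python
import json


class FakeQueryJob:
    """Hypothetical stand-in for bigquery.QueryJob; not the real class."""


def build_request_body(query):
    # Assumed shape of a request body; the real library may build it differently.
    return json.dumps({"query": query})


# A plain SQL string serializes without trouble.
body = build_request_body("SELECT 1")

# An arbitrary Python object cannot be serialized, so json raises TypeError.
try:
    build_request_body(FakeQueryJob())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it ends in "is not JSON serializable" in each case.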

Try running it like this:

query = """
    #standardSQL
    SELECT date,
    SUM(totals.visits) AS visits
    FROM `projectname.dataset.ga_sessions_20*` AS t
    WHERE parse_date('%y%m%d', _table_suffix) between 
    DATE_sub(current_date(), interval 3 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY date
"""

results_df = gbq.read_gbq(query, project_id=project_id, private_key='path_to_my.json')
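As a possible alternative that keeps the `bigquery.Client` from the question, a `QueryJob` can build the DataFrame itself through its `to_dataframe()` method. This is a sketch rather than part of the original answer: it assumes pandas is installed alongside google-cloud-bigquery, the helper name `query_to_dataframe` is made up for illustration, and the placeholder query `SELECT 1 AS x` stands in for the real SQL.

```python
def query_to_dataframe(client, sql):
    """Run `sql` with a BigQuery-style client and return the rows as a DataFrame.

    `client` is expected to expose `query(sql)` returning a job object with a
    `to_dataframe()` method, as google-cloud-bigquery's Client and QueryJob do.
    """
    query_job = client.query(sql)    # Starts the query job.
    return query_job.to_dataframe()  # Waits for the job, then builds the DataFrame.


def main():
    # Imported lazily so the helper above does not require the library at import time.
    from google.cloud import bigquery

    # Same credentials file as in the question.
    client = bigquery.Client.from_service_account_json("path_to_my.json")
    sql = "SELECT 1 AS x"  # Placeholder; substitute the real query string.
    results_df = query_to_dataframe(client, sql)
    print(results_df.head())
```

Calling `main()` runs the query with the service account file from the question. The helper takes the SQL as a string, so the same `query` variable used with `read_gbq` above can be passed to it unchanged.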