I'm exploring ways to get BigQuery data into Python, and this is my code so far:
from google.cloud import bigquery
from pandas.io import gbq
client = bigquery.Client.from_service_account_json("path_to_my.json")
project_id = "my_project_name"
query_job = client.query("""
#standardSQL
SELECT date,
SUM(totals.visits) AS visits
FROM `projectname.dataset.ga_sessions_20*` AS t
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY date
""")
results = query_job.result() # Waits for job to complete.
#for row in results:
# print("{}: {}".format(row.date, row.visits))
results_df = gbq.read_gbq(query_job,project_id=project_id)
The commented-out lines

#for row in results:
#    print("{}: {}".format(row.date, row.visits))

return the correct results from my query, but they aren't usable in that form. As a next step I want to load them into a dataframe, but this code returns the error TypeError: Object of type 'QueryJob' is not JSON serializable.

Can anyone tell me what is wrong with the code that produces this error, or suggest a better way to get BigQuery data into a dataframe?
Answer 0 (score: 3):
The method read_gbq expects a str as input, not a QueryJob.
Try running it like this:
query = """
#standardSQL
SELECT date,
SUM(totals.visits) AS visits
FROM `projectname.dataset.ga_sessions_20*` AS t
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY date
"""
results_df = gbq.read_gbq(query, project_id=project_id, private_key='path_to_my.json')
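Alternatively, you can skip pandas-gbq and reuse the bigquery client you already authenticated. Depending on the version of google-cloud-bigquery installed (and with pandas available), a finished query job can be converted straight into a dataframe. A minimal sketch, assuming the same query string and service-account JSON path as above:

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("path_to_my.json")
query_job = client.query(query)  # same query string as defined above

# to_dataframe() waits for the job to complete and returns a pandas DataFrame
results_df = query_job.to_dataframe()

This keeps authentication in one place, since the credentials attached to the client are reused instead of passing the key file to read_gbq again.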