I have a Python script that downloads data from Firebase, manipulates it, and then dumps it to a JSON file. I can upload that file to BigQuery via the command line, but now I want to put some code into the Python script so it all happens in one place.
Here is the code I have so far.
import json
from firebase import firebase

firebase = firebase.FirebaseApplication('&lt;redacted&gt;')
result = firebase.get('/connection_info', None)
id_keys = map(str, result.keys())

# with open('result.json', 'r') as w:
#     connection = json.load(w)

with open("w.json", "w") as outfile:
    for id in id_keys:
        json.dump(result[id], outfile, indent=None)
        outfile.write("\n")
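The loop above emits one JSON object per line, which is exactly the newline-delimited JSON (NDJSON) format BigQuery expects. A minimal, self-contained sketch of that step, with made-up sample data standing in for the Firebase result:

```python
import json

# Hypothetical stand-in for the Firebase query result.
result = {
    "id1": {"ip": "10.0.0.1", "port": 8080},
    "id2": {"ip": "10.0.0.2", "port": 9090},
}

# Write one JSON object per line (newline-delimited JSON).
with open("w.json", "w") as outfile:
    for key in result:
        json.dump(result[key], outfile, indent=None)
        outfile.write("\n")

# Each line of the file is now an independent JSON document.
with open("w.json") as f:
    rows = [json.loads(line) for line in f]
```

Note that `indent=None` matters here: any indentation would spread one object across several lines and break the one-object-per-line format.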
Answer 0 (score: 7)
To load a JSON file with the google-cloud-bigquery Python library, use the Client.load_table_from_file() method.
from google.cloud import bigquery

bigquery_client = bigquery.Client()
dataset = bigquery_client.dataset('mydataset')
table = dataset.table('mytable')

with open(source_file_name, 'rb') as source_file:
    # This example uses newline-delimited JSON, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = 'NEWLINE_DELIMITED_JSON'
    job = bigquery_client.load_table_from_file(
        source_file, table, job_config=job_config)
Edit: the way you upload to a table changed in version 0.28.0 of the Python library. The following is how to do it in 0.27 and earlier.
To load a JSON file with the google-cloud-bigquery Python library, use the Table.upload_from_file() method.
from google.cloud import bigquery

bigquery_client = bigquery.Client()
dataset = bigquery_client.dataset('mydataset')
table = dataset.table('mytable')

# Reload the table to get the schema.
table.reload()

with open(source_file_name, 'rb') as source_file:
    # This example uses newline-delimited JSON, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job = table.upload_from_file(
        source_file, source_format='NEWLINE_DELIMITED_JSON')
Note: you must create the table and specify its schema first (this can also be done with the Python library). Unfortunately, the client library does not yet support the schema auto-detection feature: https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2926
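Because the schema is declared up front, each NDJSON row must match it exactly or the load job will fail. A rough, library-free sketch of that consistency check, with field names and Python types chosen purely for illustration:

```python
import json

# Illustrative schema: field name -> expected Python type.
schema = {"ip": str, "port": int}

ndjson = '{"ip": "10.0.0.1", "port": 8080}\n{"ip": "10.0.0.2", "port": 9090}\n'

def rows_match_schema(text, schema):
    """Check that every NDJSON line has exactly the declared fields and types."""
    for line in text.splitlines():
        row = json.loads(line)
        if set(row) != set(schema):
            return False
        if any(not isinstance(row[key], typ) for key, typ in schema.items()):
            return False
    return True

ok = rows_match_schema(ndjson, schema)
```

Running a check like this locally before submitting the load job can save a round trip to the API when the export script and the table definition drift apart.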
Answer 1 (score: 0)
Update, November 2019
Found updated documentation for uploading JSON to Google BigQuery with Python.
Here is my working solution:
from google.cloud import bigquery
from google.oauth2 import service_account
from dotenv import load_dotenv

load_dotenv()
client = bigquery.Client()

filename = '/path/to/file/in/nd-format.json'
dataset_id = 'DatasetName'
table_id = 'TableName'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

with open(filename, "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location="europe-west1",  # Must match the destination dataset location.
        job_config=job_config,
    )  # API request

job.result()  # Waits for table load to complete.
print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))