Question

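Related question: Bigquery add columns to table schema using BQ command line tools

I want to add a new column to an existing table in BigQuery (that is, update the existing table's schema) using the BigQuery Python API.

However, my code doesn't seem to work.

Here is my code: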

    flow = flow_from_clientsecrets('secret_key_path', scope='my_scope')
    storage = Storage('CREDENTIAL_PATH')
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(flow, storage, tools.argparser.parse_args([]))
    http = httplib2.Http()
    http = credentials.authorize(http)
    bigquery_service = build('bigquery', 'v2', http=http)
    tbObject = bigquery_service.tables()
    query_body = {'schema': {'name':'new_column_name', 'type':'STRING'}}
    tbObject.update(projectId='projectId', datasetId='datasetId', tableId='tableId', body=query_body).execute()

It returns a "Provided schema doesn't match existing table's schema" error. Can anyone give me a working Python example? Many thanks!

Answer 1

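Based on Mikhail Berlyant's comments, I have to pass the existing table's schema together with the new field (column) to the update() method in order to update the existing table's schema.

A Python code example is given below: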

    ...
    tbObject = bigquery_service.tables()
    # get the current table schema
    table_data = tbObject.get(projectId=projectId, datasetId=datasetId, tableId=tableId).execute()
    schema = table_data.get('schema')
    new_column = {'name': 'new_column_name', 'type': 'STRING'}
    # append the new field to the current table's schema
    schema.get('fields').append(new_column)
    query_body = {'schema': schema}
    # send the full schema (existing fields plus the new one) back to the table
    tbObject.update(projectId=projectId, datasetId=datasetId, tableId=tableId, body=query_body).execute()
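
The get-then-update steps above could be wrapped roughly as follows. This is only a sketch: `schema_with_field` and `add_column` are hypothetical helper names, not part of any library, and `tables` is assumed to be the object returned by `bigquery_service.tables()`. The guard against `REQUIRED` reflects the fact that BigQuery only accepts new columns that are `NULLABLE` or `REPEATED`.

```python
import copy


def schema_with_field(schema, new_field):
    """Return a copy of `schema` with `new_field` appended to its 'fields' list.

    `schema` is the dict found under the 'schema' key of a tables().get() response.
    The input is not modified.
    """
    merged = copy.deepcopy(schema) if schema else {}
    fields = merged.setdefault('fields', [])
    # Refuse to add a column whose name is already present.
    if any(f['name'] == new_field['name'] for f in fields):
        raise ValueError('column already exists: %s' % new_field['name'])
    # A REQUIRED column cannot be added to an existing table.
    if new_field.get('mode') == 'REQUIRED':
        raise ValueError('new columns must be NULLABLE or REPEATED')
    fields.append(dict(new_field))
    return merged


def add_column(tables, project_id, dataset_id, table_id, new_field):
    """Fetch the current schema, append `new_field`, and send the full schema back.

    `tables` is assumed to behave like bigquery_service.tables().
    """
    ids = dict(projectId=project_id, datasetId=dataset_id, tableId=table_id)
    table = tables.get(**ids).execute()
    body = {'schema': schema_with_field(table.get('schema'), new_field)}
    return tables.update(body=body, **ids).execute()
```

With the service object from the question, a call might look like `add_column(bigquery_service.tables(), 'projectId', 'datasetId', 'tableId', {'name': 'new_column_name', 'type': 'STRING'})`.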

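Also, there is no way to set values of the new column for existing rows. Thanks to Mikhail Berlyant's suggestion, the way to set values for existing rows is to create a separate table that holds the values for the new column, then join the existing table with that table to replace the table that has the old schema.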

Answer 2

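Summary of my comments (since I have a few minutes now):

The whole schema (together with the new field) needs to be supplied to the API.
The new field will be added with null for existing rows. There is no way to set a value.
You can compensate for this with some logic in the queries you run against this table. Or you can have a separate table with just this new field and some key, and join the existing table with the new table on that key to get this field.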
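
The join idea in the last point could look roughly like this. It is a sketch under several assumptions: the query uses BigQuery standard SQL, `backfill_query` is a hypothetical helper, and the table names, key, and default in the usage line are placeholders. Identifiers are interpolated as-is, so only trusted names should be passed in.

```python
def backfill_query(base_table, values_table, key, new_column, default=None):
    """Build a standard SQL query that joins values for `new_column` onto `base_table`.

    `values_table` is assumed to hold one row per `key` with the value of `new_column`.
    `default`, if given, is a SQL literal (for example "'unknown'") used where no value matches.
    """
    value_expr = 'v.{col}'.format(col=new_column)
    if default is not None:
        # Compensate for missing values with some logic in the query itself.
        value_expr = 'IFNULL({expr}, {default})'.format(expr=value_expr, default=default)
    return (
        # Drop the all-null column from the existing table, then take it from the side table.
        'SELECT b.* EXCEPT ({col}), {expr} AS {col}\n'
        'FROM `{base}` AS b\n'
        'LEFT JOIN `{values}` AS v ON b.{key} = v.{key}'
    ).format(col=new_column, expr=value_expr, base=base_table, values=values_table, key=key)
```

For example, `backfill_query('project.dataset.existing_table', 'project.dataset.new_values', 'id', 'new_column_name', default="'unknown'")` returns the query text; running it and writing the result to a destination table is left to the caller.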

BigQuery: add new column to existing table using python BQ API
