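I'm trying to update a table by running UPDATE statements with batch priority. The DML query runs successfully in the BigQuery web UI, but when I submit several of them as batch jobs, the first query succeeds and all the others fail. Why does this happen?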
Example query:
query = '''
update `project.dataset.Table`
set my_fk = 1234
where other_fk = 222 and
received >= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-22 05:28:12") and
received <= PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2018-01-26 02:31:51")
'''
Example code:
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH
queries = []  # list of DML strings
jobs = []
for query in queries:
    # The same job_config object is passed to every job
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)
Job output (for every job except the first):
for job in jobs[1:]:
    print(job.state)
    # Done
    print(job.error_result)
    # {'message': 'Cannot set destination table in jobs with DML statements',
    #  'reason': 'invalidQuery'}
    print(job.use_legacy_sql)
    # False
    print(job.job_type)
    # Query
Answer 0 (score: 1)
Your code seems to work fine for a single update. Here is what I tried, using Python 3.6.5 and v1.9.0 of the client library.
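If this doesn't help you solve the problem, please check your configuration and post the complete code and error logs.
By the way, I also verified this from the command line.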
from google.cloud import bigquery
client = bigquery.Client()
query = '''
UPDATE `project.dataset.table` SET msg = null WHERE x is null
'''
job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH
job = client.query(query, location='US', job_config=job_config)
print(job.state)
# PENDING
print(job.error_result)
# None
print(job.use_legacy_sql)
# False
print(job.job_type)
# Query
Answer 1 (score: 1)
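I suspect the problem is that after the first job is inserted, the BigQuery API fills in some fields of job_config (in particular destination). The second job then fails because it is a DML statement whose job configuration now contains a destination table. You can verify this with: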
for query in queries:
    print(job_config.destination)  # before submitting the job
    job = client.query(query, location='US', job_config=job_config)
    print(job_config.destination)  # after submitting the job
    jobs.append(job)
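To see why reusing one config object breaks only the second and later jobs, here is an offline sketch of the suspected mechanism. `FakeConfig` and `FakeClient` are hypothetical stand-ins, not part of google-cloud-bigquery; `FakeClient.query` simply assumes the behaviour suspected above, namely that submitting a job writes a destination table back into the config object the caller passed in.

```python
class FakeConfig:
    """Hypothetical stand-in for a mutable job configuration object."""

    def __init__(self):
        self.priority = "BATCH"
        self.destination = None


class FakeClient:
    """Hypothetical client that mimics the suspected write-back side effect."""

    def __init__(self):
        self._counter = 0

    def query(self, sql, job_config):
        is_dml = sql.lstrip().upper().startswith(("UPDATE", "INSERT", "DELETE", "MERGE"))
        # Reject DML when a destination is already set, mirroring the
        # "Cannot set destination table in jobs with DML statements" error.
        if is_dml and job_config.destination is not None:
            return {"state": "DONE",
                    "error": "Cannot set destination table in jobs with DML statements"}
        # Assumed side effect: the job's resulting configuration is copied
        # back into the caller's config object, filling in a destination.
        self._counter += 1
        job_config.destination = "project.dataset._anon_{}".format(self._counter)
        return {"state": "DONE", "error": None}


client = FakeClient()
shared_config = FakeConfig()  # one config object reused for every job
queries = ["UPDATE t SET a = 1 WHERE b = 2", "UPDATE t SET a = 3 WHERE b = 4"]
results = [client.query(q, job_config=shared_config) for q in queries]
for r in results:
    print(r["error"])
```

Only the first submission sees a clean config; every later one inherits the destination written by the job before it, which matches the pattern in the question where `jobs[1:]` all fail.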
To fix this, avoid reusing the same job_config for all jobs:
for query in queries:
    # Build a fresh config for each job instead of reusing one object
    job_config = bigquery.QueryJobConfig()
    job_config.priority = bigquery.QueryPriority.BATCH
    job = client.query(query, location='US', job_config=job_config)
    jobs.append(job)
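The same fix can be wrapped in a small helper that takes a factory and builds a new config for every submission, so no config instance is ever shared between jobs. This is a sketch under stated assumptions: `submit_all`, `make_batch_config` and `RecordingClient` are hypothetical names, the config is a plain dict standing in for `bigquery.QueryJobConfig`, and the client is an offline stand-in that assumes the same write-back side effect described above, so the helper can be exercised without BigQuery.

```python
def submit_all(client, queries, make_config, **query_kwargs):
    """Hypothetical helper: submit each query with its own fresh job config."""
    jobs = []
    for query in queries:
        job_config = make_config()  # new object per job, never reused
        jobs.append(client.query(query, job_config=job_config, **query_kwargs))
    return jobs


class RecordingClient:
    """Offline stand-in: fails a job whose config already has a destination."""

    def __init__(self):
        self.seen_configs = []

    def query(self, sql, job_config, location=None):
        self.seen_configs.append(job_config)
        if job_config.get("destination") is not None:
            return "error: destination set on DML job"
        # Assumed side effect: a destination is written back into the config.
        job_config["destination"] = "anonymous_table"
        return "ok"


def make_batch_config():
    # Plain dict used as a stand-in for a batch-priority QueryJobConfig.
    return {"priority": "BATCH", "destination": None}


client = RecordingClient()
jobs = submit_all(client, ["UPDATE t SET a = 1", "UPDATE t SET a = 2"],
                  make_batch_config, location="US")
print(jobs)
```

With the real library, `client` would be a `bigquery.Client()` and `make_config` a small function that creates a `bigquery.QueryJobConfig()` and sets its `priority` to `bigquery.QueryPriority.BATCH`, exactly as in the loop above. Passing a factory rather than an instance makes accidental reuse of one mutable config harder.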