I am querying data from BigQuery in a loop, using the get_data_from_bq method shown below:
def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh +"\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()
The first query (iteration) returns the correct data, but every subsequent query throws the exception below:
    results = query_job.result()
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2415, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 660, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/api_core/future/polling.py", line 120, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Cannot explicitly modify anonymous table xyz:_bf4dfedaed165b3ee62d8a9efa.anon1db6c519_b4ff_dbc67c17659f
Edit 1: Below is a sample query that throws the above exception. The same query runs without problems in the BigQuery console.
select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in ("168561","175936","161684","161681","161686") and eventTime > CAST("2018-05-30 11:21:19" as DATETIME) group by eventType, productId order by productId;
Answer 0 (score: 6)
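I had exactly the same problem. The issue is not the query itself; you are most likely reusing the same QueryJobConfig. When you run a query, unless you set a destination, BigQuery stores the result in an anonymous table, and that table is recorded in the QueryJobConfig object. If you reuse this configuration, BigQuery tries to store the new result in the same anonymous table, which causes the error.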
To be honest, I don't particularly like this behavior.
You should rewrite your code so that every query gets its own new QueryJobConfig instead of sharing one across iterations.
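The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the idea: create a fresh QueryJobConfig inside the function on every call. The client, time_thresh and make_config parameters and the build_query helper are hypothetical names introduced for illustration (the question uses module-level globals), and setting use_legacy_sql is an assumption based on the legacy [xyz:xyz.abc] table syntax in the question.

```python
def build_query(product_ids, time_thresh):
    # Quote each id the same way the question's code does.
    format_strings = ",".join('"' + str(_id) + '"' for _id in product_ids)
    return (
        "select productId, eventType, count(*) as count from [xyz:xyz.abc] "
        "where productId in (" + format_strings + ") "
        'and eventTime > CAST("' + time_thresh + '" as DATETIME) '
        "group by eventType, productId order by productId;"
    )


def get_data_from_bq(client, product_ids, time_thresh, make_config=None):
    if make_config is None:
        # Imported lazily so the sketch can be exercised without the library.
        from google.cloud import bigquery
        make_config = bigquery.QueryJobConfig
    # Key change: a new config object per call, never a shared module-level one.
    job_config = make_config()
    # Assumption: the question's config enabled legacy SQL for [xyz:xyz.abc].
    job_config.use_legacy_sql = True
    query = build_query(product_ids, time_thresh)
    query_job = client.query(query, job_config=job_config)
    return query_job.result()
```

In a loop this would be called as get_data_from_bq(bigquery_client, product_ids, time_thresh); since no config object outlives a single call, a later query cannot inherit the anonymous destination table of an earlier one.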
Hope this helps!
Answer 1 (score: 1)
Edit:
Federico Bertola is right about both the solution and the temporary table that BigQuery writes to (see this link).
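The last time I ran my sample code against a public table I got no error, but today I can reproduce it, so the symptom may show up intermittently. I can confirm that Federico's suggestion resolves the error.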
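You can get the "super(QueryJob, self).result(timeout=timeout)" error when the query string is missing quotes around a parameter. It looks like you made a similar mistake with the format_strings parameter in your query. You can fix this by making sure the parameter is surrounded by escaped quotes:
(" + myparam + ")
should be written as
(\"" + myparam + "\")
You should check every query string that uses parameters and start from a simpler query, such as
select productId, eventType, count(*) as count from `xyz:xyz.abc`
and then extend the query step by step.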
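As a rough illustration of the quoting advice above, the sketch below builds the IN list with escaped quotes in one place and prints the final query string so it can be inspected before it is sent. The quote_in_list and build_query helpers are hypothetical names, not part of the BigQuery client library, and the table and column come from the public sample used later in this answer.

```python
def quote_in_list(values):
    # Wrap every value in double quotes and join the results with commas.
    return ",".join('"' + str(v) + '"' for v in values)


def build_query(values):
    # Start from a simple query; add clauses only after this one works.
    return (
        "SELECT word, SUM(word_count) as count "
        "FROM `publicdata.samples.shakespeare` "
        "WHERE word IN (" + quote_in_list(values) + ") GROUP BY word;"
    )


query = build_query(["raisin", "plum"])
print(query)  # inspect the exact string before running it
```

This helper does not escape quote characters inside the values themselves, so it is only suitable for trusted input such as numeric ids.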
For the record, this works for me:
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()

def get_data_from_bq(myparam):
    query = "SELECT word, SUM(word_count) as count FROM `publicdata.samples.shakespeare` WHERE word IN (\""+myparam+"\") GROUP BY word;"
    query_job = client.query(query, job_config=job_config)
    return query_job.result()

mypar = "raisin"
x = 1
while x < 9:
    iterator = get_data_from_bq(mypar)
    print("==%d iteration==" % x)
    x += 1