Google BigQuery: TABLE_DATE_RANGE is sometimes unstable

Time: 2017-05-23 14:02:25

Tags: google-bigquery

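Queries that use TABLE_DATE_RANGE sometimes fail in Google BigQuery even though the target table already exists, and the same failed query then succeeds after a few retries.

A concrete example follows.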

---------------------------------------
The Target Table
---------------------------------------
[ Dataset ID ]    my_dataset
[ Table ID ]      my_table_20170519
[ Creation Time ] 2017-05-20 02:00:52

---------------------------------------
The Executed Query
---------------------------------------
SELECT
  column1, column2, .....
FROM
  TABLE_DATE_RANGE(
    my_dataset.my_table_,
    TIMESTAMP('20170519'),
    TIMESTAMP('20170519')
  )
;

---------------------------------------
The 1st Execution and Result
---------------------------------------
[ Job Start Time ]  2017-05-20 02:00:57.513
[ Job End Time ]    2017-05-20 02:00:57.513
[ Result ]          Failure by "FROM clause with table wildcards matches no table"

---------------------------------------
The 2nd Execution and Result (Retry)
---------------------------------------
[ Job Start Time ]  2017-05-20 02:04:56.556
[ Job End Time ]    2017-05-20 02:04:56.556
[ Result ]          Failure by "FROM clause with table wildcards matches no table"

---------------------------------------
The 3rd Execution and Result (Retry)
---------------------------------------
[ Job Start Time ]  2017-05-20 02:06:43.937
[ Job End Time ]    2017-05-20 02:06:46.291
[ Result ]          Success


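Incidentally, queries that do not use TABLE_DATE_RANGE always succeed (for example, FROM [my_dataset.my_table_20170519]). Other queries executed at almost the same time with the same FROM clause also sometimes succeed.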


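Of course, retrying resolves the case in the example above. However, I am worried that a query with the following FROM clause would silently ignore my_table_20170519 instead of failing.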

FROM
  TABLE_DATE_RANGE(
    my_dataset.my_table_,
    TIMESTAMP('20170510'),
    TIMESTAMP('20170519')
  )
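
To make that concern concrete, here is a minimal Python sketch of a guard against silent omission. It only enumerates the daily table IDs that the range should cover and compares them with whatever table IDs the caller can currently see. The function names and the `visible` list are hypothetical; how the list of visible tables is obtained is left to the caller and is not part of the original question.

```python
from datetime import date, timedelta


def expected_table_ids(prefix, start, end):
    """Return the daily table IDs a date range should cover (inclusive)."""
    days = (end - start).days
    return [
        prefix + (start + timedelta(days=i)).strftime("%Y%m%d")
        for i in range(days + 1)
    ]


def missing_table_ids(expected, visible):
    """Return the expected IDs that are absent from `visible`, in order."""
    visible_set = set(visible)
    return [t for t in expected if t not in visible_set]


expected = expected_table_ids("my_table_", date(2017, 5, 10), date(2017, 5, 19))

# Hypothetical listing in which the newest table is not visible yet.
visible = expected[:-1]

print(missing_table_ids(expected, visible))  # ['my_table_20170519']
```

If the result is non-empty, the caller could refuse to run the ranged query rather than accept a result that may be missing a day.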


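Does anyone have an idea how to solve this?

I have added two more examples of this problem.

Example A:
Other queries executed at almost the same time with the same FROM clause sometimes succeed.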

----------------------------------------------
The Detail of Case
----------------------------------------------

- A program executes query "A" and query "B" in BigQuery.
- Both queries have the following FROM clause.

  FROM TABLE_DATE_RANGE(
    my_dataset.my_table_,
    TIMESTAMP('20170519'),
    TIMESTAMP('20170519')
  )

----------------------------------------------
The 1st Execution and Result
----------------------------------------------

Query "A"
  [ Job Start Time ]  2017-05-20 02:00:57.513
  [ Job End Time ]    2017-05-20 02:00:57.513
  [ Result ]          Failure by "FROM clause with table wildcards matches no table"

Query "B"
  [ Job Start Time ]  2017-05-20 02:00:57.507
  [ Job End Time ]    2017-05-20 02:01:09.537
  [ Result ]          Success


----------------------------------------------
The 2nd Execution and Result (Retry)
----------------------------------------------

Query "A"
  [ Job Start Time ]  2017-05-20 02:04:56.556
  [ Job End Time ]    2017-05-20 02:04:56.556
  [ Result ]          Failure by "FROM clause with table wildcards matches no table"

# Query "B" is NOT executed because it already succeeded.

----------------------------------------------
The 3rd Execution and Result (Retry)
----------------------------------------------

Query "A"
  [ Job Start Time ]  2017-05-20 02:06:43.937
  [ Job End Time ]    2017-05-20 02:06:46.291
  [ Result ]          Success

# Query "B" is NOT executed because it already succeeded.


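Example B:
This problem sometimes occurs even after a "not small" period of time has passed since the table was created (in the case below, about 17 minutes).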

---------------------------------------
The Target Table
---------------------------------------
[ Dataset ID ]    my_dataset
[ Table ID ]      my_table_b_20170519
[ Creation Time ] 2017-05-20 01:42:22

---------------------------------------
The Executed Query
---------------------------------------
SELECT
  column1, column2, .....
FROM
  TABLE_DATE_RANGE(
    my_dataset.my_table_b_,
    TIMESTAMP('20170519'),
    TIMESTAMP('20170519')
  )
;

----------------------------------------------
The 1st Execution and Result
----------------------------------------------
[ Job Start Time ]  2017-05-20 01:59:51.255
[ Job End Time ]    2017-05-20 01:59:51.255
[ Result ]          Failure by "FROM clause with table wildcards matches no table"

----------------------------------------------
The 2nd Execution and Result
----------------------------------------------
[ Job Start Time ]  2017-05-20 02:04:53.802
[ Job End Time ]    2017-05-20 02:04:57.684
[ Result ]          Success

1 Answer:

Answer 0: (score: 0)

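I think TABLE_DATE_RANGE uses metatables (tables.list) to get the list of tables that qualify.
The issue is that while table data is available for querying immediately, the metadata only becomes available eventually, which means it takes time for a new table to make it into that list.
You do not see this problem if you query the table directly right after it is created.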
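
Since direct table references reportedly do not show the problem, one possible workaround is to name every daily table explicitly instead of using the wildcard function. Below is a minimal Python sketch that builds such a legacy SQL FROM clause. The function name `explicit_from_clause` is hypothetical, and the sketch assumes that every day in the range has a table: unlike TABLE_DATE_RANGE, an explicit reference to a table that does not exist should make the query fail rather than skip that day.

```python
from datetime import date, timedelta


def explicit_from_clause(dataset, prefix, start, end):
    """Build a legacy SQL FROM clause naming each daily table (inclusive)."""
    days = (end - start).days
    if days < 0:
        raise ValueError("end must not be before start")
    tables = [
        "[{}.{}{}]".format(
            dataset, prefix, (start + timedelta(days=i)).strftime("%Y%m%d")
        )
        for i in range(days + 1)
    ]
    # In legacy SQL, a comma-separated table list in FROM acts as UNION ALL.
    return "FROM\n  " + ",\n  ".join(tables)


clause = explicit_from_clause(
    "my_dataset", "my_table_", date(2017, 5, 18), date(2017, 5, 19)
)
print(clause)
```

For the two-day range above this prints a FROM clause listing [my_dataset.my_table_20170518] and [my_dataset.my_table_20170519], which can be placed after the SELECT list in place of the TABLE_DATE_RANGE call.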

Note: the above assumes that you run into this problem only for some "small" period of time after the new table is created.
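
If the delay really is bounded, retrying with an increasing wait may be enough. The following is a minimal Python sketch under that assumption. `run_query` is a hypothetical zero-argument callable that runs the query and raises an exception whose message contains the BigQuery error text; the attempt count and delays are illustrative values, not recommendations from BigQuery.

```python
import time

NO_MATCH = "FROM clause with table wildcards matches no table"


def run_with_retry(run_query, attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call run_query(), retrying only on the wildcard 'matches no table' error."""
    for attempt in range(attempts):
        try:
            return run_query()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            # Re-raise unrelated errors, and give up after the final attempt.
            if NO_MATCH not in str(exc) or last_attempt:
                raise
            # Exponential backoff: 60s, 120s, 240s, ...
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake query that fails twice and then succeeds.
calls = {"n": 0}


def fake_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError(NO_MATCH)
    return ["row"]


delays = []  # record the waits instead of actually sleeping
result = run_with_retry(fake_query, sleep=delays.append)
print(result, delays)  # ['row'] [60.0, 120.0]
```

With the default values the waits add up to 15 minutes across five attempts, so a longer metadata delay than that would still surface as a failure.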

See also: Eventually consistent operations. I think this is relevant.