Using UNNEST on a column of a with_query_name (WITH clause) table

Date: 2018-08-02 05:47:35

Tags: google-bigquery

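I am getting the following error:

(100032) Error executing query job. Message: Unrecognized name: nested

where nested is my temporary table, declared in a WITH clause. The code I tried is: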

WITH nested AS
(
  SELECT e.my_id , SPLIT(secondary_ids, '<#>') AS arr_secondary_ids
  FROM table_with_delimited_string_column e
  WHERE my_id = 1234
)
SELECT DISTINCT a
FROM  UNNEST(nested.arr_secondary_ids) a

The SPLIT function returns an ARRAY, which is then passed to UNNEST.

According to the Google Cloud documentation, this works:

SELECT *
FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]);

This also works:

WITH sequences AS
  (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers
   UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers
   UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers)
SELECT id, flattened_numbers
FROM sequences
CROSS JOIN UNNEST(sequences.some_numbers) AS flattened_numbers;

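So, technically, it should be possible to query the value table produced by UNNEST directly with SELECT *. It should also be possible to unnest a column of a temporary table.

However, back in my use case, ... UNNEST(nested.arr_secondary_ids) produces the error above.

I would like to query it directly because I am getting duplicate values in the result table and, as you can see, I want to use DISTINCT to eliminate them. What is the workaround? What is the technical reason for this error? I wonder whether I am missing something about the ARRAY or STRUCT types...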

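3 Answers:

Answer 0: (score: 3)

The previous answers helped a great deal in understanding why nested is needed in the FROM clause. However, I found the best solution for each of two scenarios: the primary key (my_id) matters when defining duplicates, or it does not.

In my real case, my_id matters, and I do not want duplicate (my_id, secondary_id) pairs. The CROSS JOIN, or multi-table reference, cannot be avoided, because the source CTE table must appear directly in the FROM clause (as @pruthvi-kumar and @mikhail-berlyant pointed out). @mikhail-berlyant also noted that this is only a flattening operation, so the CROSS JOIN is not expensive here. In summary, the solution would be: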

WITH nested AS
(
  SELECT e.my_id , SPLIT(secondary_ids, '<#>') AS arr_secondary_ids
  FROM table_with_delimited_string_column e
)
SELECT DISTINCT nested.my_id, a
FROM nested CROSS JOIN UNNEST(arr_secondary_ids) a
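
To try the pattern above without the real table, here is a minimal self-contained sketch. The inline rows standing in for table_with_delimited_string_column are made-up sample data, not values from the question, and the ORDER BY is only there to make the output easier to read.

```sql
-- Hypothetical sample rows replace the real source table.
WITH table_with_delimited_string_column AS
(
  SELECT 1234 AS my_id, 'a<#>b<#>a' AS secondary_ids
  UNION ALL SELECT 1234, 'b<#>c'
  UNION ALL SELECT 5678, 'a'
),
nested AS
(
  SELECT e.my_id, SPLIT(e.secondary_ids, '<#>') AS arr_secondary_ids
  FROM table_with_delimited_string_column e
)
-- nested is listed in FROM, so the array column can be resolved per row;
-- DISTINCT then removes repeated (my_id, secondary id) pairs.
SELECT DISTINCT nested.my_id, a
FROM nested CROSS JOIN UNNEST(nested.arr_secondary_ids) AS a
ORDER BY my_id, a
```

With this sample data, the repeated 'a' and 'b' values under my_id 1234 should each appear only once, while the 'a' under 5678 is kept as a separate pair.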

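However, in the posted question I used a fixed my_id = 1234, so this column is not a deciding factor for duplicates. In that case, the correlated cross join can be skipped by using a scalar subquery for the array to unnest. The best solution here is: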

WITH nested AS
(
  SELECT e.my_id , SPLIT(secondary_ids, '<#>') AS arr_secondary_ids
  FROM table_with_delimited_string_column e
  WHERE my_id = 1234
)
SELECT DISTINCT a
FROM UNNEST((SELECT arr_secondary_ids FROM nested)) AS a

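Note the parentheses that wrap the SELECT inside UNNEST. They are required; otherwise you get a message like:

The argument to UNNEST is an expression, not a query; to use a query as an expression, the query must be wrapped with additional parentheses to make it a scalar subquery expression

After filtering by my_id, nested must also contain exactly one row, otherwise you will hit the nasty

Scalar subquery produced more than one element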
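
If the filter can match more than one row, one possible way to keep the scalar-subquery form is to merge the arrays first. This is a sketch rather than part of the original answer: it assumes BigQuery's ARRAY_CONCAT_AGG aggregate function, and the two inline rows are invented sample data.

```sql
-- Hypothetical case: two rows survive the my_id filter.
WITH nested AS
(
  SELECT 1234 AS my_id, SPLIT('a<#>b', '<#>') AS arr_secondary_ids
  UNION ALL
  SELECT 1234 AS my_id, SPLIT('b<#>c', '<#>') AS arr_secondary_ids
)
-- ARRAY_CONCAT_AGG folds all row arrays into a single array, so the
-- subquery yields one value and can still be used as a scalar subquery.
SELECT DISTINCT a
FROM UNNEST((SELECT ARRAY_CONCAT_AGG(arr_secondary_ids) FROM nested)) AS a
```

The double parentheses are still needed here, for the same reason as in the query above.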

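Answer 1: (score: 2)

Using UNNEST by itself in the statement that follows the CTE is not valid. The CTE must be followed by a single SELECT, INSERT, UPDATE, MERGE, or DELETE statement that references some or all of the CTE columns.

This change to the code above should work: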

WITH nested AS
(
  SELECT e.my_id , SPLIT(secondary_ids, '<#>') AS arr_secondary_ids
  FROM table_with_delimited_string_column e
  WHERE my_id = 1234
) SELECT * FROM nested
CROSS JOIN UNNEST(nested.arr_secondary_ids) as unnested_output

If you do not want duplicates, try this:

SELECT DISTINCT my_id, unnested_output FROM
(WITH nested AS
    (
      SELECT e.my_id , SPLIT(secondary_ids, '<#>') AS arr_secondary_ids
      FROM table_with_delimited_string_column e
      WHERE my_id = 1234
    ) SELECT * FROM nested
    CROSS JOIN UNNEST(nested.arr_secondary_ids) as unnested_output
) as a;

Answer 2: (score: 1)

You are missing the table reference after FROM:

WITH nested AS
(
  SELECT e.my_id , SPLIT(secondary_ids, '<#>') AS arr_secondary_ids
  FROM table_with_delimited_string_column e
  WHERE my_id = 1234
)
SELECT DISTINCT a
FROM nested, UNNEST(arr_secondary_ids) a
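
The comma in that FROM clause is shorthand for CROSS JOIN, and the array column is resolved against the row of nested to its left. A minimal sketch that runs without the real table follows; the single inline row is made-up sample data.

```sql
-- Hypothetical single-row CTE standing in for the filtered table.
WITH nested AS
(
  SELECT 1234 AS my_id, SPLIT('x<#>y<#>x', '<#>') AS arr_secondary_ids
)
-- Listing nested first makes its columns visible to UNNEST;
-- with FROM UNNEST(nested.arr_secondary_ids) alone, the name nested
-- is not in scope, which is what the "Unrecognized name" error reports.
SELECT DISTINCT a
FROM nested, UNNEST(nested.arr_secondary_ids) AS a
```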