BigQuery: check for array overlap

Asked: 2017-03-13 22:32:14

Tags: python google-bigquery google-cloud-platform standard-sql

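I'm writing a BigQuery query that needs to check whether any of a set of strings appears as an element of a column in a table, where the column I care about holds arrays of strings. For context, the query is part of a small automated Python job and uses standard SQL.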

I couldn't find anything that explicitly checks array containment: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

So I came up with a solution that relies on a pretty hacky regular expression, specifically:

...other query stuff...

WHERE
    REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")

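...where column is the column I care about in the table, and joined_string is one long string made up of all the strings I need to check, joined with | (where | acts as the regex OR operator).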
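For illustration, here is a minimal Python sketch of how joined_string might be built on the job side, plus a local emulation of what the regex check ends up matching. The helper names build_joined_string and regex_overlap are hypothetical, and Python's re module only approximates the RE2 syntax that BigQuery uses, so treat this as a sketch under those assumptions rather than the actual job code.

```python
import re


def build_joined_string(terms):
    """Join search terms with | and escape regex metacharacters in each term."""
    # Lowercase each term to mirror the LOWER(...) applied to the column.
    return "|".join(re.escape(term.lower()) for term in terms)


def regex_overlap(column, terms):
    """Local emulation of the WHERE clause: join the array with '-' and search."""
    haystack = "-".join(column).lower()
    pattern = "(" + build_joined_string(terms) + ")"
    return re.search(pattern, haystack) is not None


# A term that equals an element matches, as intended.
print(regex_overlap(["abc", "def"], ["abc"]))
# A term that is only a substring of an element also matches,
# which is one reason the regex approach is fragile.
print(regex_overlap(["xabcx"], ["abc"]))
```

Because the array is flattened into one string, the check is a substring search rather than an element-equality test, so it can report overlaps that are not really there.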

Is there some built-in functionality in BigQuery standard SQL that would accomplish this more sensibly?

1 answer:

Answer 0 (score: 3):

Below are two examples.

The first assumes you have your strings in another table, strings:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT 'abc' AS str UNION ALL
  SELECT '123' UNION ALL
  SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0  

If you need to see the number of matched strings, you can add the following to the SELECT list:

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt

The second example assumes you have the list of strings in an array:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0   

As in the first example, you can add the following to the SELECT list to see the match count:

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt
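
To make the semantics of the correlated subquery concrete, the following Python sketch models it locally on the sample data from the second example. It is only an in-memory illustration and does not call BigQuery; the names match_count and your_table are hypothetical stand-ins for the subquery and the yourTable CTE.

```python
def match_count(column, strs):
    """Model of: SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str."""
    # A join produces one row per equal (col, str) pair, so count the pairs.
    return sum(1 for col in column for s in strs if col == s)


# Same sample rows as the yourTable CTE above.
your_table = [
    {"id": 1, "column": ["abc", "def", "xyz"]},
    {"id": 2, "column": ["123", "456", "789"]},
    {"id": 3, "column": ["135", "246", "369"]},
]
strs = ["abc", "def", "456"]

# Attach cnt to every row, then keep rows where the count is positive,
# mirroring the WHERE (...) > 0 filter.
with_counts = [dict(row, cnt=match_count(row["column"], strs)) for row in your_table]
matching = [row for row in with_counts if row["cnt"] > 0]

for row in matching:
    print(row["id"], row["cnt"])
```

Because the count comes from a join, duplicated values on either side are counted once per matching pair, so cnt equals the size of the overlap only when the elements on each side are distinct.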