Google BigQuery: iterating the CONTAINS function over a subquery

Time: 2017-04-26 10:45:05

Tags: sql google-bigquery contains

Suppose I have two tables:

girls       prefixes
------      --------
Le-na       -na
Lo-ve       -ve
Li-na       -la
Lu-na       -ta
Len-ka      -ya

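The girl names and the prefixes all have different lengths!

I want to select all the girl names that contain an entry from the prefixes table, and do it in a single query (imagine I have a lot of names and many prefixes).

I have tested it for a single case, where it can be done like this: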

SELECT girls,SOME(girls CONTAINS ("-na")) WITHIN RECORD FROM prefixes

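But how do I iterate the CONTAINS function over a subquery? For example:

SELECT girls,SOME(girls CONTAINS (SELECT * FROM prefixes)) WITHIN RECORD FROM prefixes

This does not work; it fails with the error "Subselect not allowed in SELECT clause".

I would really appreciate any ideas. I tried searching for this but could not find anything matching my case.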

3 Answers:

Answer 0: (score: 3)

Have you tried using a join?

select *
from girls g join
     prefixes p
     on g.girls like concat('%', p.prefix);

This should work with standard SQL.
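
For reference, a self-contained sketch of the same join in BigQuery Standard SQL. It assumes the column names `girls` and `prefix` used in the query above and inlines the question's sample data with UNNEST. DISTINCT is an addition, on the assumption that a name matching more than one entry should be listed only once.

```sql
#standardSQL
WITH girls AS (
  -- Sample names from the question, inlined for illustration.
  SELECT girls
  FROM UNNEST(['Le-na', 'Lo-ve', 'Li-na', 'Lu-na', 'Len-ka']) AS girls
),
prefixes AS (
  -- Sample endings from the question; the column name `prefix` is assumed.
  SELECT prefix
  FROM UNNEST(['-na', '-ve', '-la', '-ta', '-ya']) AS prefix
)
SELECT DISTINCT g.girls
FROM girls AS g
JOIN prefixes AS p
  -- '%' followed by the ending matches names that end with it.
  ON g.girls LIKE CONCAT('%', p.prefix);
```

With this sample data the query should return Le-na, Lo-ve, Li-na and Lu-na, in no guaranteed order. Note that LIKE would also treat any `%` or `_` inside an ending as a wildcard.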

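Answer 1: (score: 2)

Assuming the prefixes (well, suffixes) are always three characters long, you can perform an efficient semi-join using the result of SUBSTR: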

#standardSQL
WITH Girls AS (
  SELECT name
  FROM UNNEST(['Le-na', 'Lo-ve', 'Li-na', 'Lu-na', 'Len-ka']) AS name 
),
Suffixes AS (
  SELECT suffix
  FROM UNNEST(['-na', '-ve', '-la', '-ta', '-ya']) AS suffix
)
SELECT
  name
FROM Girls
WHERE EXISTS (
  SELECT 1 FROM Suffixes WHERE suffix = SUBSTR(name, LENGTH(name) - 2)
);
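
The start position `LENGTH(name) - 2` works because SUBSTR positions are 1-based, so starting two characters before the last one yields the final three characters. A small sketch that makes this visible (the column aliases `start_pos` and `last_three` are only illustrative):

```sql
#standardSQL
SELECT
  name,
  LENGTH(name) - 2 AS start_pos,                  -- 3 for 'Le-na', 4 for 'Len-ka'
  SUBSTR(name, LENGTH(name) - 2) AS last_three    -- '-na' and '-ka'
FROM UNNEST(['Le-na', 'Len-ka']) AS name;
```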

Or you can use LIKE, but that is equivalent to performing a cross join with a filter, so it probably won't be as fast:

#standardSQL
WITH Girls AS (
  SELECT name
  FROM UNNEST(['Le-na', 'Lo-ve', 'Li-na', 'Lu-na', 'Len-ka']) AS name 
),
Suffixes AS (
  SELECT suffix
  FROM UNNEST(['-na', '-ve', '-la', '-ta', '-ya']) AS suffix
)
SELECT
  name
FROM Girls
WHERE EXISTS (
  SELECT 1 FROM Suffixes WHERE name LIKE CONCAT('%', suffix)
);

Edit: another option, which enumerates all of the name suffixes so they can be used in a semi-join:

#standardSQL
WITH Girls AS (
  SELECT name
  FROM UNNEST(['Le-na', 'Lo-ve-lala', 'Li-na', 'Lu-eya', 'Len-ka']) AS name 
),
Suffixes AS (
  SELECT suffix
  FROM UNNEST(['-na', '-ve', '-lala', '-ta', '-eya']) AS suffix
),
GirlNamePermutations AS (
  SELECT name, SUBSTR(name, LENGTH(name) + 1 - len) AS name_suffix
  FROM Girls
  CROSS JOIN UNNEST(GENERATE_ARRAY(1, (SELECT MAX(LENGTH(suffix)) FROM Suffixes))) AS len
)
SELECT
  name
FROM GirlNamePermutations
WHERE EXISTS (
  SELECT 1
  FROM Suffixes
  WHERE suffix = name_suffix
);

If you know the range of suffix lengths, you can hardcode it instead. For example, replace:

CROSS JOIN UNNEST(GENERATE_ARRAY(1, (SELECT MAX(LENGTH(suffix)) FROM Suffixes))) AS len

with:

CROSS JOIN UNNEST(GENERATE_ARRAY(1, 5)) AS len
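
To illustrate what the GirlNamePermutations step produces, here is a sketch that expands a single name with the hardcoded range. The one-row input is hypothetical and only there for illustration. Each row holds the last `len` characters of the name, which holds as long as `len` does not exceed the name's length.

```sql
#standardSQL
-- Hypothetical one-row input; 5 is the longest suffix length in the sample data.
SELECT
  name,
  len,
  SUBSTR(name, LENGTH(name) + 1 - len) AS name_suffix  -- last `len` characters
FROM (SELECT 'Lo-ve-lala' AS name)
CROSS JOIN UNNEST(GENERATE_ARRAY(1, 5)) AS len
ORDER BY len;
```

For 'Lo-ve-lala' this yields 'a', 'la', 'ala', 'lala' and '-lala'. Only the last of these appears in Suffixes, so the semi-join keeps the name.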

Answer 2: (score: 2)

The following is for BigQuery Standard SQL:

  
#standardSQL
WITH girls AS (
  SELECT name
  FROM UNNEST(['Le-na', 'Lo-ve', 'Li-na', 'Lu-na', 'Len-ka']) AS name 
),
suffixes AS (
  SELECT suffix
  FROM UNNEST(['-na', '-ve', '-la', '-ta', '-ya']) AS suffix
)
SELECT name
FROM girls
JOIN suffixes
ON ENDS_WITH(name, suffix) 

As an option, if you need to extend this to look for a fragment anywhere in the name, you can use REGEXP_CONTAINS:

SELECT name
FROM girls
JOIN suffixes
ON REGEXP_CONTAINS(name, suffix)

Or use STARTS_WITH to match by prefix (as opposed to suffix):

SELECT name
FROM girls
JOIN suffixes
ON STARTS_WITH(name, suffix)
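
One caveat for the fragment-matching variant: REGEXP_CONTAINS interprets each suffix as a regular expression, so characters such as `.` or `+` in a suffix would act as metacharacters. If the fragments should be matched literally, one possible alternative is STRPOS, which returns the 1-based position of the first occurrence, or 0 if there is none. A sketch using the same sample data, with DISTINCT added on the assumption that each name should appear once:

```sql
#standardSQL
WITH girls AS (
  SELECT name
  FROM UNNEST(['Le-na', 'Lo-ve', 'Li-na', 'Lu-na', 'Len-ka']) AS name
),
suffixes AS (
  SELECT suffix
  FROM UNNEST(['-na', '-ve', '-la', '-ta', '-ya']) AS suffix
)
SELECT DISTINCT name
FROM girls
JOIN suffixes
  -- A result above 0 means the fragment occurs somewhere in the name.
  ON STRPOS(name, suffix) > 0;
```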