紧跟How to get count of matches in field of table for list of phrases from another table in bigquery? 您最终会遇到类似这样的情况:
Row str all_matches
1 foo1 foo foo40 [{"key":"foo","matches":2},{"key":"test","matches":0}]
2 test1 test test2 test [{"key":"foo","matches":0},{"key":"test","matches":2}]
如何使用StandardSQL进一步过滤总和(与所有键匹配)> 0的那些行?
答案 0 :(得分:2)
为简单起见-只需将以下行添加到引用查询的末尾
HAVING SUM(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]')))) > 0
因此,最终查询(BigQuery标准SQL)将是
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'foo1 foo foo40' str UNION ALL
SELECT 'test1 test test2 test' UNION ALL
SELECT 'abc xyz'
), `project.dataset.keywords` AS (
SELECT 'foo' key UNION ALL
SELECT 'test'
)
SELECT str,
TO_JSON_STRING(ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) AS matches))) all_matches
FROM `project.dataset.table`
CROSS JOIN `project.dataset.keywords`
GROUP BY str
HAVING SUM(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]')))) > 0
有结果
Row str all_matches
1 foo1 foo foo40 [{"key":"foo","matches":2},{"key":"test","matches":0}]
2 test1 test test2 test [{"key":"foo","matches":0},{"key":"test","matches":2}]
注意:我在伪数据中又添加了一行,由于该行根本没有匹配项,因此从输出中将其滤除