Question

紧跟How to get count of matches in field of table for list of phrases from another table in bigquery? 您最终会遇到类似这样的情况：

Row str                     all_matches  
1   foo1 foo foo40          [{"key":"foo","matches":2},{"key":"test","matches":0}]   
2   test1 test test2 test   [{"key":"foo","matches":0},{"key":"test","matches":2}]

如何使用StandardSQL进一步过滤总和（与所有键匹配）> 0的那些行？

Answer 1

为简单起见-只需将以下行添加到引用查询的末尾

HAVING SUM(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]')))) > 0

因此，最终查询（BigQuery标准SQL）将是

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'foo1 foo foo40' str UNION ALL
  SELECT 'test1 test test2 test' UNION ALL
  SELECT 'abc xyz'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test'
)
SELECT str, 
  TO_JSON_STRING(ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) AS matches))) all_matches
FROM `project.dataset.table` 
CROSS JOIN `project.dataset.keywords`
GROUP BY str
HAVING SUM(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]')))) > 0

有结果

Row str                     all_matches  
1   foo1 foo foo40          [{"key":"foo","matches":2},{"key":"test","matches":0}]   
2   test1 test test2 test   [{"key":"foo","matches":0},{"key":"test","matches":2}]

注意：我在伪数据中又添加了一行，由于该行根本没有匹配项，因此从输出中将其滤除

在BigQuery中，如何在json中筛选某些子元素的和为正的行？

1 个答案: