In BigQuery, how do I filter for rows where the sum of certain sub-elements in a JSON value is positive?

Time: 2019-03-08 02:30:43

Tags: google-bigquery

Following on from "How to get count of matches in field of table for list of phrases from another table in bigquery?", you end up with something like this:

Row str                     all_matches  
1   foo1 foo foo40          [{"key":"foo","matches":2},{"key":"test","matches":0}]   
2   test1 test test2 test   [{"key":"foo","matches":0},{"key":"test","matches":2}]     

Using Standard SQL, how can I further filter for only those rows where the sum of matches across all keys is > 0?

1 answer:

Answer 0: (score: 2)

For simplicity, just add the following line to the end of the referenced query:

HAVING SUM(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]')))) > 0   

So the final query (BigQuery Standard SQL) would be:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'foo1 foo foo40' str UNION ALL
  SELECT 'test1 test test2 test' UNION ALL
  SELECT 'abc xyz'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test'
)
SELECT str, 
  TO_JSON_STRING(ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) AS matches))) all_matches
FROM `project.dataset.table` 
CROSS JOIN `project.dataset.keywords`
GROUP BY str
HAVING SUM(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]')))) > 0

with the result:

Row str                     all_matches  
1   foo1 foo foo40          [{"key":"foo","matches":2},{"key":"test","matches":0}]   
2   test1 test test2 test   [{"key":"foo","matches":0},{"key":"test","matches":2}]   

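Note: I added one more row ('abc xyz') to the dummy data. Because that row has no matches at all, the sum of its match counts is 0 and the HAVING clause filters it out of the output.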
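
The query above filters before the JSON string is built. If all_matches already exists only as a JSON-formatted STRING column, the same condition could instead be applied by unpacking the JSON. The following is a minimal sketch under that assumption: the table name `project.dataset.matches` and its inline sample rows are hypothetical, and it relies on the JSON_EXTRACT_ARRAY and JSON_EXTRACT_SCALAR functions of BigQuery Standard SQL.

```sql
#standardSQL
-- Hypothetical input: all_matches is a JSON-formatted STRING, one array per row.
WITH `project.dataset.matches` AS (
  SELECT 'foo1 foo foo40' str,
    '[{"key":"foo","matches":2},{"key":"test","matches":0}]' all_matches UNION ALL
  SELECT 'abc xyz',
    '[{"key":"foo","matches":0},{"key":"test","matches":0}]'
)
SELECT str, all_matches
FROM `project.dataset.matches`
WHERE (
  -- Split the JSON array into its elements and add up each element's "matches" value.
  SELECT SUM(CAST(JSON_EXTRACT_SCALAR(m, '$.matches') AS INT64))
  FROM UNNEST(JSON_EXTRACT_ARRAY(all_matches)) AS m
) > 0
```

With this sample data, only the 'foo1 foo foo40' row should pass the filter, since the counts for 'abc xyz' sum to 0. When the source table and keywords are still available, the HAVING approach above is simpler because it avoids parsing the JSON again.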