Question

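Given a list of phrases Phrase1, Phrase2 *, ... PhraseN (say they are stored in another table, Phrase_Table), how can I get the match count of each phrase in a field F of a BigQuery table?

Here, " *" means that the phrase must be followed by some non-empty, non-whitespace string.

Suppose you have a table with an ID field and two string fields, Field1 and Field2.

The output would look like

id, CountOfPhrase1InField1, CountOfPhrase2InField1, CountOfPhrase1InField2, CountOfPhrase2InField2

Or, I guess, instead of many output fields there could be a single JSON object field:

id, [{"fieldName": Field1, "counts": {Phrase1: m, Phrase2: mm, ...}}, {"fieldName": Field2, "counts": {Phrase1: m2, Phrase2: mm2, ...}}, ...]

Thanks!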

Answer 1

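The example below is for BigQuery Standard SQL. It cross joins every row with every keyword and counts the occurrences of each keyword that are immediately followed by a non-whitespace character.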

#standardSQL
WITH `project.dataset.table` AS (
SELECT 'foo1 foo foo40' str UNION ALL
SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test'
)
SELECT str, ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) as matches)) all_matches
FROM `project.dataset.table` 
CROSS JOIN `project.dataset.keywords`
GROUP BY str

with the result

Row str                     all_matches.key all_matches.matches  
1   foo1 foo foo40          foo             2    
                            test            0    
2   test1 test test2 test   foo             0    
                            test            2
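
Note that the pattern CONCAT(key, r'[^\s]') applies the "must be followed by a non-whitespace character" rule to every keyword. If the phrase table mixes plain phrases and starred phrases, as in the question, one possible variation is sketched below. It rests on several assumptions that are not in the original: the marker is stored as a trailing ' *' in a column named phrase, the phrases contain no regex metacharacters, and the intermediate CTE name patterns is purely illustrative.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'foo1 foo foo40' str UNION ALL
  SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
  SELECT 'foo' phrase UNION ALL  -- plain phrase: every occurrence is counted
  SELECT 'test *'                -- starred phrase: must be followed by a non-whitespace character
), patterns AS (
  -- Hypothetical helper CTE: turn each phrase into a regular expression.
  -- RTRIM(phrase, ' *') strips the trailing marker (any trailing spaces and asterisks).
  SELECT phrase,
    IF(ENDS_WITH(phrase, '*'),
       CONCAT(RTRIM(phrase, ' *'), r'[^\s]'),
       phrase) AS pattern
  FROM `project.dataset.keywords`
)
SELECT str,
  ARRAY_AGG(STRUCT(phrase, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, pattern)) AS matches)) all_matches
FROM `project.dataset.table`
CROSS JOIN patterns
GROUP BY str
```

If phrases can contain characters such as . or ( they would need to be escaped before being used as a pattern.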

If you prefer the output as JSON, you can wrap the aggregation in TO_JSON_STRING(), as in the example below

#standardSQL
WITH `project.dataset.table` AS (
SELECT 'foo1 foo foo40' str UNION ALL
SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test'
)
SELECT str, TO_JSON_STRING(ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) as matches))) all_matches
FROM `project.dataset.table` 
CROSS JOIN `project.dataset.keywords`
GROUP BY str

with the output

Row str                     all_matches  
1   foo1 foo foo40          [{"key":"foo","matches":2},{"key":"test","matches":0}]   
2   test1 test test2 test   [{"key":"foo","matches":0},{"key":"test","matches":2}]

There are countless ways to present the above output; hopefully you can adjust it to exactly what you need :o)
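
As one such adjustment, the same idea can be extended to the table described in the question, with an id and two string fields. The following is only a sketch under assumptions: the sample rows are made up, and the fields are unpivoted with a hand-written array of structs (the aliases f, fieldName, and txt are illustrative), so each additional field has to be listed explicitly.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo1 foo foo40' Field1, 'test1 test' Field2 UNION ALL
  SELECT 2, 'test1 test test2 test', 'foo foo7'
), `project.dataset.keywords` AS (
  SELECT 'foo' key UNION ALL
  SELECT 'test'
)
SELECT id,
  -- One JSON string per id: an array with one element per field
  TO_JSON_STRING(ARRAY_AGG(STRUCT(fieldName, counts) ORDER BY fieldName)) all_matches
FROM (
  SELECT id, f.fieldName,
    -- Per (id, field): one count per keyword
    ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(f.txt, CONCAT(key, r'[^\s]'))) AS matches) ORDER BY key) counts
  FROM `project.dataset.table`
  -- Unpivot Field1 and Field2 into (fieldName, txt) rows
  CROSS JOIN UNNEST([STRUCT('Field1' AS fieldName, Field1 AS txt), STRUCT('Field2', Field2)]) f
  CROSS JOIN `project.dataset.keywords`
  GROUP BY id, f.fieldName
)
GROUP BY id
```

The result has one row per id, with counts stored as an array of key/matches pairs per field rather than as an object keyed by phrase, since the phrase list comes from a table and is not known when the query is written.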

How to get match counts in a table field for a list of phrases stored in another table in BigQuery?
