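Given a list of phrases phrase1, phrase2*, ..., phraseN (assume they are stored in another table, Phrase_Table), how do I get the match count of each phrase in field F of a BigQuery table?
Here, "*" means the phrase must be followed by some non-empty, non-whitespace string.
Suppose you have a table with an ID field and two string fields, Field1 and Field2.
The output would look like: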
id,CountOfPhrase1InField1,CountOfPhrase2InField1,CountOfPhrase1InField2,CountOfPhrase2InField2
Or, I suppose, instead of one output field per count, a single JSON object field:
id, [{"fieldName": Field1, "counts": {phrase1: m, phrase2: mm, ...}}, {"fieldName": Field2, "counts": {phrase1: m2, phrase2: mm2, ...}}, ...]
Thanks!
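Answer 0: (score: 1)
The example below is for BigQuery Standard SQL. It appends [^\s] to each keyword, so an occurrence is counted only when the keyword is immediately followed by a non-whitespace character.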
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'foo1 foo foo40' str UNION ALL
SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
SELECT 'foo' key UNION ALL
SELECT 'test'
)
SELECT str, ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) as matches)) all_matches
FROM `project.dataset.table`
CROSS JOIN `project.dataset.keywords`
GROUP BY str
This produces the following result:
Row  str                    all_matches.key  all_matches.matches
1    foo1 foo foo40         foo              2
                            test             0
2    test1 test test2 test  foo              0
                            test             2
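The question asks for counts per id across two string fields, while the example above uses a single str column. A minimal sketch of one way to extend it is shown below. The sample rows, the Field2 values, and the output names field1_matches and field2_matches are illustrative assumptions and do not come from the original post.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Hypothetical sample data: an id plus two string fields
  SELECT 1 AS id, 'foo1 foo foo40' AS Field1, 'test1 test' AS Field2 UNION ALL
  SELECT 2, 'test1 test test2 test', 'foo foo2'
), `project.dataset.keywords` AS (
  SELECT 'foo' AS key UNION ALL
  SELECT 'test'
)
SELECT
  id,
  ARRAY_AGG(STRUCT(
    key,
    -- Count occurrences of the keyword followed by a non-whitespace character
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(Field1, CONCAT(key, r'\S'))) AS field1_matches,
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(Field2, CONCAT(key, r'\S'))) AS field2_matches
  ) ORDER BY key) AS all_matches
FROM `project.dataset.table`
CROSS JOIN `project.dataset.keywords`
GROUP BY id
```

With this sample data, id 1 should report foo with 2 matches in Field1 and 0 in Field2, and test with 0 in Field1 and 1 in Field2. The pattern \S is equivalent to the [^\s] used above. Grouping by id assumes id is unique per row.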
If you want the output as JSON, you can wrap the aggregate in TO_JSON_STRING(), as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'foo1 foo foo40' str UNION ALL
SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
SELECT 'foo' key UNION ALL
SELECT 'test'
)
SELECT str, TO_JSON_STRING(ARRAY_AGG(STRUCT(key, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, CONCAT(key, r'[^\s]'))) as matches))) all_matches
FROM `project.dataset.table`
CROSS JOIN `project.dataset.keywords`
GROUP BY str
This produces the following output:
Row  str                    all_matches
1    foo1 foo foo40         [{"key":"foo","matches":2},{"key":"test","matches":0}]
2    test1 test test2 test  [{"key":"foo","matches":0},{"key":"test","matches":2}]
There are countless ways to present the output above. Hopefully you can adapt it to exactly what you need :o)
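In the question, only some phrases carry the trailing "*" marker, whereas the queries above require a following non-whitespace character for every keyword. The sketch below is one possible way to honor the marker per phrase. It assumes the marker is stored as a literal trailing "*" in the phrase column (here named phrase, a hypothetical name) and that phrases contain no other regular expression metacharacters, since they are used as patterns unescaped.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'foo1 foo foo40' AS str UNION ALL
  SELECT 'test1 test test2 test'
), `project.dataset.keywords` AS (
  SELECT 'foo*' AS phrase UNION ALL  -- marked: must be followed by non-whitespace
  SELECT 'test'                      -- unmarked: every occurrence counts
), patterns AS (
  SELECT
    phrase,
    -- Strip the marker and append \S only for marked phrases
    IF(ENDS_WITH(phrase, '*'),
       CONCAT(RTRIM(phrase, '*'), r'\S'),
       phrase) AS pattern
  FROM `project.dataset.keywords`
)
SELECT
  str,
  ARRAY_AGG(STRUCT(
    phrase,
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(str, pattern)) AS matches
  ) ORDER BY phrase) AS all_matches
FROM `project.dataset.table`
CROSS JOIN patterns
GROUP BY str
```

With this sample data, foo* should still count 2 in the first row, while the unmarked test should count all 4 occurrences in the second row.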