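I have 150,000 rows of data that I am trying to query in Google BigQuery.
The column Text contains text of varying length, and I want to search it for specific keywords.
I already know that the query below returns all rows that contain a specific keyword (for example, facebook):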
SELECT Text From Data.Set_1
WHERE Text CONTAINS 'facebook'
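Questions:
1) How can I improve the query so that it returns, in a new column, the total count of all matches of the keyword 'facebook' across the 'Text' column?
2) How can I scale this up to multiple keywords (facebook, cnn, bbc, twitter) and return the total for each keyword present in the data (e.g. facebook 42, cnn 54, bbc 88, twitter 49)?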
Answer 0 (score: 0)
You can use a derived table that contains all the words you are looking for, then use aggregation to count the matches:
SELECT w.keyword, COUNT(s.Text)
From (SELECT 'facebook' as keyword UNION ALL
SELECT 'cnn'
) w LEFT JOIN
Data.Set_1 s
ON s.Text CONTAINS w.keyword
GROUP BY w.keyword;
Note: this is not particularly efficient. Performance should scale roughly linearly with the number of keywords.
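As a rough illustration of what the join-plus-aggregation above computes, here is a minimal Python sketch (not BigQuery code) that can be run locally. The function name `count_matching_rows` and the sample texts are hypothetical. It counts rows containing each keyword, keeps keywords with zero matches (as the LEFT JOIN does), and makes the one-scan-per-keyword cost visible.

```python
from typing import Dict, Iterable, List


def count_matching_rows(texts: Iterable[str], keywords: List[str]) -> Dict[str, int]:
    """Count how many rows contain each keyword (rows, not occurrences)."""
    rows = list(texts)
    # One full scan of the rows per keyword, hence roughly linear in the
    # number of keywords. Keywords with no match still appear with a count
    # of 0, mirroring the LEFT JOIN combined with COUNT(s.Text).
    return {kw: sum(1 for text in rows if kw in text) for kw in keywords}


# Hypothetical sample data, for illustration only.
sample_texts = ["facebook and cnn", "cnn only", "nothing relevant"]
counts = count_matching_rows(sample_texts, ["facebook", "cnn", "bbc"])
print(counts)  # {'facebook': 1, 'cnn': 2, 'bbc': 0}
```

Note that this counts a row once even if the keyword appears in it several times.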
Answer 1 (score: 0)
For BigQuery Legacy SQL:
SELECT
keyword,
COUNT(1) AS rows,
SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurences
FROM YourTable
CROSS JOIN keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
Example usage:
SELECT
keyword,
COUNT(1) AS rows,
SUM(INTEGER((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword))) AS occurences
FROM (
SELECT Text FROM
(SELECT 'facebookfacebookcnnbbccnn' AS Text),
(SELECT 'facebook' AS Text),
(SELECT 'cnn' AS Text)
) AS words
CROSS JOIN (
SELECT keyword FROM
(SELECT 'facebook' AS keyword),
(SELECT 'cnn' AS keyword),
(SELECT 'bbc' AS keyword)
) AS keywords
WHERE Text CONTAINS keyword
GROUP BY keyword
For BigQuery Standard SQL (see Enabling Standard SQL):
SELECT
keyword,
COUNT(1) AS `rows`,
SUM((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)) AS occurences
FROM YourTable
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword
Example usage:
WITH keywords AS (
SELECT 'facebook' AS keyword UNION ALL
SELECT 'cnn' AS keyword UNION ALL
SELECT 'bbc' AS keyword
),
words AS (
SELECT 'facebookfacebookcnnbbccnn' AS Text UNION ALL
SELECT 'facebook' AS Text UNION ALL
SELECT 'cnn' AS Text
)
SELECT
keyword,
COUNT(1) AS `rows`,
SUM((LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)) AS occurences
FROM words
JOIN keywords
ON STRPOS(Text, keyword) > 0
GROUP BY keyword
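The occurrence count in these queries relies on a length-difference trick: removing every copy of the keyword shortens the text by (number of occurrences) × (keyword length). The Python sketch below (not BigQuery code) reproduces that arithmetic on the same sample rows so it can be checked locally. The helper names `count_occurrences` and `summarize` are hypothetical, and integer division is used where the Standard SQL query uses `/`.

```python
from typing import Dict, List, Tuple


def count_occurrences(text: str, keyword: str) -> int:
    """(LENGTH(Text) - LENGTH(REPLACE(Text, keyword, ''))) / LENGTH(keyword)."""
    # Counts non-overlapping matches; keyword must be non-empty.
    return (len(text) - len(text.replace(keyword, ""))) // len(keyword)


def summarize(texts: List[str], keywords: List[str]) -> Dict[str, Tuple[int, int]]:
    """Return {keyword: (rows, occurrences)} for keywords with at least one match."""
    result: Dict[str, Tuple[int, int]] = {}
    for kw in keywords:
        matching = [t for t in texts if kw in t]  # like STRPOS(Text, keyword) > 0
        if matching:  # an inner join drops keywords that match no row
            result[kw] = (len(matching), sum(count_occurrences(t, kw) for t in matching))
    return result


# Same sample rows and keywords as the example queries.
words = ["facebookfacebookcnnbbccnn", "facebook", "cnn"]
summary = summarize(words, ["facebook", "cnn", "bbc"])
for kw, (rows, occurrences) in summary.items():
    print(kw, rows, occurrences)
# facebook 2 3
# cnn 2 3
# bbc 1 1
```

Here `rows` is the number of rows containing the keyword, while `occurrences` also counts repeats within a row, which is why 'facebook' has 2 rows but 3 occurrences.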