I have a summary table that looks like this:
user_id service no_of_trx
1 A 56
1 C 43
1 B 22
2 C 10
2 A 3
3 B 45
3 C 7
4 A 77
4 B 63
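It summarizes all the different types of services each user_id has used, sorted by the number of transactions made with each service. How can I extract the number of times each service appears as a user's top service?
Expected result: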
service occurrence_as_max
A 2
B 1
C 1
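because service A is the top service for users 1 and 4, while services B and C are the top services for users 3 and 2, respectively.
What I have so far: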
WITH a as
(SELECT user_id, service, count(service) no_of_trx
FROM transactions
GROUP BY user_id, service
ORDER BY no_of_trx desc),
b as
(SELECT distinct(user_id) user, max(no_of_trx) occurrence_as_max
FROM a
GROUP BY user_id
ORDER by user)
SELECT distinct(service), b.occurrence_as_max
FROM b
LEFT JOIN a ON a.user_id=b.user
ORDER by b.occurrence_as_max desc;
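But this obviously doesn't work.
Answer 0 (score: 2)
The script below should work. It uses standard SQL syntax; you may need minor adjustments for BigQuery, but the logic should carry over.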
SELECT A.service, COUNT(*) AS occurrence_as_max
FROM your_table A
INNER JOIN
(
    -- highest transaction count per user
    SELECT user_id, MAX(no_of_trx) no_of_trx
    FROM your_table
    GROUP BY user_id
) B ON A.user_id = B.user_id
AND A.no_of_trx = B.no_of_trx
GROUP BY A.service
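To check the join-on-max logic without a BigQuery project, here is a minimal sketch that runs the same query against the question's sample data in an in-memory SQLite database. SQLite stands in for BigQuery purely as an assumption for local testing, and the helper name `top_service_counts` is hypothetical; the table name `your_table` follows the answer. An `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# Sample rows copied from the question: (user_id, service, no_of_trx).
ROWS = [
    (1, "A", 56), (1, "C", 43), (1, "B", 22),
    (2, "C", 10), (2, "A", 3),
    (3, "B", 45), (3, "C", 7),
    (4, "A", 77), (4, "B", 63),
]

# Same logic as the answer: join each row to its user's maximum,
# then count the surviving rows per service.
JOIN_QUERY = """
SELECT a.service, COUNT(*) AS occurrence_as_max
FROM your_table AS a
INNER JOIN (
    SELECT user_id, MAX(no_of_trx) AS no_of_trx
    FROM your_table
    GROUP BY user_id
) AS b
  ON a.user_id = b.user_id
 AND a.no_of_trx = b.no_of_trx
GROUP BY a.service
ORDER BY a.service
"""


def top_service_counts(rows):
    """Load rows into an in-memory SQLite table and run the join query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE your_table "
            "(user_id INTEGER, service TEXT, no_of_trx INTEGER)"
        )
        conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
        return conn.execute(JOIN_QUERY).fetchall()
    finally:
        conn.close()


join_result = top_service_counts(ROWS)
print(join_result)  # [('A', 2), ('B', 1), ('C', 1)]
```

One property worth knowing: because the join matches on the maximum value, a user whose two services tie for the maximum contributes one count to each of them. For example, `top_service_counts([(1, "A", 5), (1, "B", 5)])` counts both A and B once.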
Answer 1 (score: 0)
Below is a version for BigQuery Standard SQL (no self-join needed):
#standardSQL
SELECT service, COUNT(1) AS occurrence_as_max
FROM (
  SELECT STRING_AGG(service ORDER BY no_of_trx DESC LIMIT 1) service
  FROM `project.dataset.table`
  GROUP BY user_id
)
GROUP BY service
You can test the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 user_id, 'A' service, 56 no_of_trx UNION ALL
  SELECT 1, 'C', 43 UNION ALL
  SELECT 1, 'B', 22 UNION ALL
  SELECT 2, 'C', 10 UNION ALL
  SELECT 2, 'A', 3 UNION ALL
  SELECT 3, 'B', 45 UNION ALL
  SELECT 3, 'C', 7 UNION ALL
  SELECT 4, 'A', 77 UNION ALL
  SELECT 4, 'B', 63
)
SELECT service, COUNT(1) AS occurrence_as_max
FROM (
  SELECT STRING_AGG(service ORDER BY no_of_trx DESC LIMIT 1) service
  FROM `project.dataset.table`
  GROUP BY user_id
)
GROUP BY service
-- ORDER BY service
with the result:
Row service occurrence_as_max
1 A 2
2 B 1
3 C 1
Answer 2 (score: 0)
I would use regular window functions for this:
select service, countif(seqnum = 1) as occurrence_as_max
from (select t.*,
             row_number() over (partition by user_id order by no_of_trx desc) as seqnum
      from t
     ) t
group by service;
If you want to count ties, use rank() instead of row_number().
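The difference between the two ranking functions only shows up when a user has two services tied for first place. The sketch below illustrates this on hypothetical data (user 1 has A and B tied at 50), again using an in-memory SQLite database as a stand-in for BigQuery. Two assumptions to note: SQLite has no `countif`, so `SUM(CASE ...)` replaces it here, and window functions need SQLite 3.25 or later. The helper name `count_top` is made up for illustration.

```python
import sqlite3

# Hypothetical data with a tie: user 1 has A and B both at 50 transactions.
TIED_ROWS = [
    (1, "A", 50), (1, "B", 50), (1, "C", 10),
    (2, "C", 10), (2, "A", 3),
]

# SUM(CASE ...) stands in for BigQuery's COUNTIF, which SQLite lacks.
QUERY_TEMPLATE = """
SELECT service,
       SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS occurrence_as_max
FROM (
    SELECT service,
           {ranking}() OVER (
               PARTITION BY user_id ORDER BY no_of_trx DESC
           ) AS seqnum
    FROM t
) AS ranked
GROUP BY service
ORDER BY service
"""


def count_top(rows, ranking):
    """Count top-service occurrences using ROW_NUMBER or RANK."""
    if ranking not in ("ROW_NUMBER", "RANK"):
        raise ValueError("ranking must be 'ROW_NUMBER' or 'RANK'")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (user_id INTEGER, service TEXT, no_of_trx INTEGER)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        query = QUERY_TEMPLATE.format(ranking=ranking)
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


with_rank = count_top(TIED_ROWS, "RANK")
with_row_number = count_top(TIED_ROWS, "ROW_NUMBER")
print(with_rank)        # A and B are both counted for user 1
print(with_row_number)  # only one of A or B is counted for user 1
```

With `RANK`, both tied services receive rank 1, so each is counted. With `ROW_NUMBER`, exactly one row per user gets number 1, and which of the tied services wins is not defined unless a tie-breaker is added to the `ORDER BY`.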