Suppose I have data like this:
user | tag
-----|-----
bob | A
bob | A
bob | B
tom | A
tom | A
amy | B
amy | B
jen | A
jen | A
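Across millions of users, I want to know how many users have only tag A, only tag B, and both. It is the "both" case that I am stuck on.
For the sample above, the answer is: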
Both: 1
A only: 2
B only: 1
I don't need the user IDs returned, only the counts. I'm using BigQuery.
Answer 0 (score: 3)
Here is one solution using the SOME and EVERY aggregate functions (BigQuery legacy SQL syntax):
-- Outer query: count how many users fall into each category.
SELECT
  SUM(category == 'both') AS both_count,
  SUM(category == 'A') AS a_count,
  SUM(category == 'B') AS b_count
FROM (
  -- Inner query: assign each user exactly one category.
  SELECT
    name,
    CASE WHEN SOME(tag == 'A') AND SOME(tag == 'B') THEN 'both'
         WHEN EVERY(tag == 'A') THEN 'A'
         WHEN EVERY(tag == 'B') THEN 'B'
         ELSE 'none' END AS category
  FROM
    -- Inline sample data; in legacy SQL a comma between tables means UNION ALL.
    (SELECT 'bob' as name, 'A' as tag),
    (SELECT 'bob' as name, 'A' as tag),
    (SELECT 'bob' as name, 'B' as tag),
    (SELECT 'tom' as name, 'A' as tag),
    (SELECT 'tom' as name, 'A' as tag),
    (SELECT 'amy' as name, 'B' as tag),
    (SELECT 'amy' as name, 'B' as tag),
    (SELECT 'jen' as name, 'A' as tag),
    (SELECT 'jen' as name, 'A' as tag)
  GROUP BY name)
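To check the same per-user classification without a BigQuery project, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. The table name `user_tags` and the function `count_tag_categories` are hypothetical, introduced only for illustration. The query swaps SOME/EVERY for `MAX(CASE ...)` flags, which are portable SQL and should carry over to other dialects, though that is an assumption rather than something tested on BigQuery here.

```python
import sqlite3

# Sample rows from the question; `user_tags` is a hypothetical table name.
ROWS = [
    ("bob", "A"), ("bob", "A"), ("bob", "B"),
    ("tom", "A"), ("tom", "A"),
    ("amy", "B"), ("amy", "B"),
    ("jen", "A"), ("jen", "A"),
]

QUERY = """
SELECT
  SUM(CASE WHEN has_a = 1 AND has_b = 1 THEN 1 ELSE 0 END) AS both_count,
  SUM(CASE WHEN has_a = 1 AND has_b = 0 THEN 1 ELSE 0 END) AS a_only_count,
  SUM(CASE WHEN has_a = 0 AND has_b = 1 THEN 1 ELSE 0 END) AS b_only_count
FROM (
  -- One row per user, with a 0/1 flag for each tag.
  SELECT
    name,
    MAX(CASE WHEN tag = 'A' THEN 1 ELSE 0 END) AS has_a,
    MAX(CASE WHEN tag = 'B' THEN 1 ELSE 0 END) AS has_b
  FROM user_tags
  GROUP BY name
) AS per_user
"""


def count_tag_categories(rows):
    """Return (both, a_only, b_only) user counts for (name, tag) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE user_tags (name TEXT, tag TEXT)")
        conn.executemany("INSERT INTO user_tags VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_tag_categories(ROWS))
```

One difference from the query above: a user whose tags are A plus some third tag counts as "A only" here, whereas the EVERY-based version would put that user in 'none'. Which behavior is wanted depends on whether other tags can occur in the data.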
Answer 1 (score: 0)
I don't know Google BigQuery's syntax, but here is a generic SQL solution to the problem.
-- "table" is a placeholder for the real table name.
select a.tag_desc, count(distinct a.user) as total
from (
  select coalesce(tA.user, tB.user) as user
       , case
           when tA.tag is not null and tB.tag is not null then 'Both'
           when tA.tag is not null and tB.tag is null then 'A Only'
           when tA.tag is null and tB.tag is not null then 'B Only'
         end as tag_desc
  -- Filter each side before joining so unmatched B rows survive the full outer join.
  from (select distinct user, tag from table where tag = 'A') tA
  full outer join (select distinct user, tag from table where tag = 'B') tB
    on tA.user = tB.user
) a
group by a.tag_desc
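The subquery joins the data set back to itself with a full outer join, which lets you evaluate both conditions (A and B) together. A CASE expression defines the three outcomes. In the outer query, I count the users for each CASE result.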
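The result of that self-join can be cross-checked with plain set operations: the users on the A side and on the B side form two sets, and the three CASE outcomes correspond to their intersection and two differences. The sketch below is an illustration in Python and not part of the answer's SQL; `classify_users` and `SAMPLE_ROWS` are hypothetical names.

```python
# Sample (user, tag) rows from the question.
SAMPLE_ROWS = [
    ("bob", "A"), ("bob", "A"), ("bob", "B"),
    ("tom", "A"), ("tom", "A"),
    ("amy", "B"), ("amy", "B"),
    ("jen", "A"), ("jen", "A"),
]


def classify_users(rows):
    """Count users with both tags, only A, and only B."""
    users_a = {user for user, tag in rows if tag == "A"}  # the tA side
    users_b = {user for user, tag in rows if tag == "B"}  # the tB side
    return {
        "Both": len(users_a & users_b),    # matched on both sides of the join
        "A Only": len(users_a - users_b),  # tB side is null
        "B Only": len(users_b - users_a),  # tA side is null
    }


if __name__ == "__main__":
    print(classify_users(SAMPLE_ROWS))
```

This only works for data that fits in memory, so it is useful for validating the query on a small sample, not as a replacement for it.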