My data looks like this:
Person | Group
-----------------
Andy | 1
Andy | 2
Doug | 2
Jack | 1
Carl | 2
Joe | 1
Joe | 2
I need output from this data that looks like this:
num_persons | num_persons_in_both_groups | overlap
--------------------------------------------------------
5 | 2 | 40%
I can get num_persons, but I'm stuck on num_persons_in_both_groups. How can I get this with SQL? Assume it's MySQL. Once I have that, overlap is easy.
Thanks!
Answer 0 (score: 3)
Good question! Try the following:
SELECT SUM(GroupCount) AS num_persons
     , SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) AS num_persons_in_both_groups
     , (SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) / SUM(GroupCount))*100 AS overlap
FROM
(
    SELECT COUNT(1) AS PersonsInGroups
         , 1 AS GroupCount
    FROM t
    GROUP BY Person
) x
Per the comments, the GroupCount column is not needed, so you can simplify this to:
SELECT COUNT(1) AS num_persons
     , SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) AS num_persons_in_both_groups
     , (SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) / COUNT(1))*100 AS overlap
FROM
(
    SELECT COUNT(1) AS PersonsInGroups
    FROM t
    GROUP BY Person
) x
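To sanity-check the simplified query against the sample data from the question, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the column name grp is a hypothetical rename of Group. Because SQLite truncates integer division, the sketch multiplies by 100.0 before dividing; MySQL's / operator does not need that change.

```python
import sqlite3

# Sample rows from the question: (person, group).
ROWS = [("Andy", 1), ("Andy", 2), ("Doug", 2), ("Jack", 1),
        ("Carl", 2), ("Joe", 1), ("Joe", 2)]

# Inner query: one row per person with the number of groups they are in.
# Outer query: count the persons and those with more than one group.
QUERY = """
SELECT COUNT(1) AS num_persons,
       SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) AS num_persons_in_both_groups,
       100.0 * SUM(CASE WHEN PersonsInGroups > 1 THEN 1 ELSE 0 END) / COUNT(1) AS overlap
FROM (
    SELECT COUNT(1) AS PersonsInGroups
    FROM t
    GROUP BY person
) x
"""

def overlap_stats(rows):
    """Load rows into an in-memory table t and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        # grp is an assumed column name; GROUP is a reserved word.
        conn.execute("CREATE TABLE t (person TEXT, grp INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()

print(overlap_stats(ROWS))  # (5, 2, 40.0)
```

The result matches the output requested in the question: 5 persons, 2 of them in both groups, 40% overlap.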
Answer 1 (score: 2)
I suggest two levels of aggregation:
-- group is a reserved word in MySQL, so the column name is quoted with backticks
select count(*) as num_persons,
       sum(group_1 > 0 and group_2 > 0) as in_both,
       avg(group_1 > 0 and group_2 > 0) as ratio
from (select person,
             max( `group` = 1 ) as group_1,
             max( `group` = 2 ) as group_2
      from t
      group by person
     ) p
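I like this approach because it is flexible: each group gets its own flag, so the outer query can combine the flags in any condition.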
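The two-level aggregation can be tried on the question's sample data with a minimal sketch like the one below. It assumes SQLite (through Python's sqlite3 module) instead of MySQL, and it uses grp as a hypothetical column name in place of the reserved word group. Both engines evaluate a comparison such as grp = 1 to 0 or 1, which is what lets max, sum, and avg work on it.

```python
import sqlite3

# Sample rows from the question: (person, group).
ROWS = [("Andy", 1), ("Andy", 2), ("Doug", 2), ("Jack", 1),
        ("Carl", 2), ("Joe", 1), ("Joe", 2)]

# Inner level: one row per person with a 0/1 flag per group.
# Outer level: count persons, sum and average the "in both" condition.
QUERY = """
SELECT COUNT(*) AS num_persons,
       SUM(group_1 > 0 AND group_2 > 0) AS in_both,
       AVG(group_1 > 0 AND group_2 > 0) AS ratio
FROM (
    SELECT person,
           MAX(grp = 1) AS group_1,
           MAX(grp = 2) AS group_2
    FROM t
    GROUP BY person
) p
"""

def two_level_stats(rows):
    """Load rows into an in-memory table t and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        # grp is an assumed column name standing in for Group.
        conn.execute("CREATE TABLE t (person TEXT, grp INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()

num_persons, in_both, ratio = two_level_stats(ROWS)
print(num_persons, in_both, ratio)
```

Here ratio comes back as a fraction (2 of 5 persons), so multiply it by 100 to get the percentage shown in the question.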
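However, you can also do this without a subquery, assuming there are exactly two groups and each (person, group) pair appears at most once: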
select count(distinct person) as num_persons,
count(*) - count(distinct person) as num_in_both,
(count(*) - count(distinct person)) / count(distinct person) as ratio
from t;
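The single-pass version depends on each person contributing at most one row per group. The sketch below illustrates that, again assuming SQLite through Python's sqlite3 module as a stand-in for MySQL and a hypothetical column name grp. The ratio column is left out because SQLite truncates integer division, unlike MySQL.

```python
import sqlite3

# Sample rows from the question: (person, group).
ROWS = [("Andy", 1), ("Andy", 2), ("Doug", 2), ("Jack", 1),
        ("Carl", 2), ("Joe", 1), ("Joe", 2)]

# With two groups and no duplicate rows, every row beyond a person's
# first one means that person is in both groups.
QUERY = """
SELECT COUNT(DISTINCT person) AS num_persons,
       COUNT(*) - COUNT(DISTINCT person) AS num_in_both
FROM t
"""

def single_pass_stats(rows):
    """Load rows into an in-memory table t and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        # grp is an assumed column name standing in for Group.
        conn.execute("CREATE TABLE t (person TEXT, grp INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()

print(single_pass_stats(ROWS))                  # (5, 2)
# A duplicated (person, group) row inflates num_in_both.
print(single_pass_stats(ROWS + [("Doug", 2)]))  # (5, 3)
```

If duplicates are possible, the subquery versions that check each group explicitly are the safer choice.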