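I'm having trouble writing a query that finds unique counts and duplicates under specific conditions. I'm trying to get all of the counts in one pass from a table similar to this one: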
|-P_key-|-----email-----|-act_no-|--Client--|
| 1 | joe@code.com | 1 | Jets |
| 2 | bob@code.com | 2 | Jets |
| 3 | sue@code.com | NULL | Jets |
| 4 | joe@code.com | 1 | Bills |
| 5 | bob@code.com | 2 | Bills |
| 6 | bob@code.com | 2 | Giants |
| 7 | max@code.com | 2 | Giants |
| 8 | ben@code.com | 5 | Pats |
The counts I'm looking for, per client, are as follows:
I know I can use a GROUP BY to get these counts one at a time, like this:
SELECT COUNT(email)
FROM Table
GROUP BY EMAIL
HAVING COUNT(email) > 1;
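But I'm hoping to write a single query that returns everything at once. I'm using SQL Server 2008.
The output I'm hoping to achieve looks like this (although the final data itself will have to be adjusted from this):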
| | Jets | Bills | Giants | Pats |
| Total emails | 3 | 2 | 2 | 1 |
| unique emails across projects | 5 | 5 | 3 | 0 |
| unique account_no across projects| 6 | 6 | 4 | 0 |
| unique account_no within project | 0 | 0 | 2 | 0 |
| blank account_no within project | 1 | 0 | 0 | 0 |
OR
| | tot unique emails | duped account_no's | etc...
| Jets | 3 | 5 |
|Bills | 2 | 5 |
| Giants | 2 | 3 |
| Pats | 1 | 0 |
Thanks in advance for any help!
Answer 0 (score: 2)
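First, you cannot get the output in the layout you describe. What you can get is one row per client, with five columns.
Second, your criteria are rather unusual. If an email appears on multiple clients, then each client's dupe count includes the total number of occurrences of that email. That's fine, but you need to count the number of times each email occurs and determine whether it appears on more than one client.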
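The solution is to use window functions to calculate a bunch of intermediate results. For example, the min() and max() window functions are used to determine whether an email or account number appears on more than one client: if the minimum and maximum client for a value differ, it appears on at least two.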
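To make the min()/max() trick concrete, here is a minimal sketch of just the email part, run against the sample rows from the question. The inline VALUES list, the alias src, and the column on_multiple_clients are assumptions added for illustration; act_no is left out to keep it short.

```sql
-- Sketch only: sample rows copied from the question (act_no omitted).
-- "src" and "on_multiple_clients" are hypothetical names.
SELECT p_key,
       email,
       client,
       -- how many rows share this email, across all clients
       COUNT(*)    OVER (PARTITION BY email) AS email_cnt,
       -- alphabetically first and last client that has this email
       MIN(client) OVER (PARTITION BY email) AS email_minclient,
       MAX(client) OVER (PARTITION BY email) AS email_maxclient,
       -- if first and last differ, the email is on at least two clients
       CASE WHEN MIN(client) OVER (PARTITION BY email)
              <> MAX(client) OVER (PARTITION BY email)
            THEN 1 ELSE 0 END AS on_multiple_clients
FROM (VALUES
        (1, 'joe@code.com', 'Jets'),
        (2, 'bob@code.com', 'Jets'),
        (3, 'sue@code.com', 'Jets'),
        (4, 'joe@code.com', 'Bills'),
        (5, 'bob@code.com', 'Bills'),
        (6, 'bob@code.com', 'Giants'),
        (7, 'max@code.com', 'Giants'),
        (8, 'ben@code.com', 'Pats')
     ) AS src (p_key, email, client)
ORDER BY p_key;
```

On this data only joe@code.com (email_cnt 2) and bob@code.com (email_cnt 3) are flagged. Summing email_cnt over the flagged rows per client gives 5 for Jets, 5 for Bills, 3 for Giants and 0 for Pats, which matches the "unique emails across projects" row in the question.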
Without a SQL Fiddle to test against, this is my best attempt:
select client,
       count(email) as NumEmails,
       -- emails that appear on more than one client
       sum(case when email_minclient <> email_maxclient then email_cnt else 0
           end) as NumEmailsDuped,
       -- account numbers that appear on more than one client
       sum(case when actno_minclient <> actno_maxclient then actno_cnt else 0
           end) as NumActnoDuped,
       -- account numbers repeated within the same client
       sum(case when clientactno_cnt > 1 then clientactno_cnt else 0
           end) as NumActnoDupedWithin,
       sum(case when act_no is null then 1 else 0 end) as NumActnoNull
from (select t.*,
             count(*) over (partition by email) as email_cnt,
             count(*) over (partition by act_no) as actno_cnt,
             count(*) over (partition by client, act_no) as clientactno_cnt,
             min(client) over (partition by email) as email_minclient,
             max(client) over (partition by email) as email_maxclient,
             min(client) over (partition by act_no) as actno_minclient,
             max(client) over (partition by act_no) as actno_maxclient
      from table t  -- replace "table" with your actual table name
     ) t
group by client;
Answer 1 (score: 0)
This should give you the desired results:
select client,
       count(email) as "Total emails",
       sum(case when email_minclient <> email_maxclient then email_cnt else 0
           end) as "unique emails across projects",
       sum(case when email_minclient <> email_maxclient then actno_cnt else 0
           end) as "unique account_no across projects",
       sum(case when clientactno_cnt > 1 then 1 else 0
           end) as "unique account_no within project",
       sum(case when act_no is null then 1 else 0 end) as "blank account_no within project"
from (select t.*,
             count(*) over (partition by email) as email_cnt,
             count(*) over (partition by act_no) as actno_cnt,
             count(*) over (partition by client, act_no) as clientactno_cnt,
             min(client) over (partition by email) as email_minclient,
             max(client) over (partition by email) as email_maxclient
      from table t
     ) t
group by client
Credit to Gordon Linoff.