I have three tables that are related to each other in the following way.
The table structure is as follows:
host: id, name
sessions: id, host_id, name
process: id, session_id, name
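What I want is the number of sessions and the number of processes on each host. To get that, I tried the following query, but its output is wrong.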
select host.id,
count(sessions.id) as "session count",
count(process.id) as "process count"
from host as host
left outer join sessions as sessions on host.id = sessions.host_id
left outer join process as process on sessions.id = process.session_id
group by host.id;
Here is a SQLFiddle of the schema.
Based on the data in the fiddle, the output should be:
id | session count | process count
----------------------------------
1 | 2 | 3
2 | 1 | 2
3 | 1 | 2
4 | 2 | 3
But what I get is:
id | session count | process count
----------------------------------
1 | 3 | 3
2 | 2 | 2
3 | 2 | 2
4 | 3 | 3
What is the correct query to get the desired output?
Answer 0 (score: 6)
Use distinct:
select host.id,
count(distinct sessions.id) as "session count",
count(distinct process.id) as "process count"
from host as host
left outer join sessions as sessions on host.id = sessions.host_id
left outer join process as process on sessions.id = process.session_id
group by host.id;
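To see the effect of distinct, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's sqlite3 module. The rows are reconstructed from the outputs quoted in this thread rather than copied from the fiddle, and the name values are hypothetical placeholders; the original fiddle may target a different database engine.

```python
import sqlite3

# Assumption: ids reconstructed from the outputs quoted in the thread;
# the name columns are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table host     (id integer primary key, name text);
    create table sessions (id integer primary key, host_id integer, name text);
    create table process  (id integer primary key, session_id integer, name text);
    insert into host values (1,'h1'),(2,'h2'),(3,'h3'),(4,'h4');
    insert into sessions values
        (1,1,'s1'),(2,1,'s2'),(3,3,'s3'),(4,4,'s4'),(5,2,'s5'),(6,4,'s6');
    insert into process values
        (1,1,'p1'),(2,1,'p2'),(3,3,'p3'),(4,4,'p4'),(5,2,'p5'),
        (6,4,'p6'),(7,3,'p7'),(8,5,'p8'),(9,5,'p9'),(10,6,'p10');
""")

QUERY = """
    select host.id, count({d} sessions.id), count({d} process.id)
    from host
    left outer join sessions on host.id = sessions.host_id
    left outer join process on sessions.id = process.session_id
    group by host.id
    order by host.id
"""

# The question's query: every joined row is counted.
plain = conn.execute(QUERY.format(d="")).fetchall()
# The same query with distinct: each id is counted once per host.
with_distinct = conn.execute(QUERY.format(d="distinct")).fetchall()

print(plain)          # [(1, 3, 3), (2, 2, 2), (3, 2, 2), (4, 3, 3)]
print(with_distinct)  # [(1, 2, 3), (2, 1, 2), (3, 1, 2), (4, 2, 3)]
```

The first list matches the wrong output in the question and the second matches the expected output.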
Answer 1 (score: 1)
John Faz's answer is better, but since you asked about other approaches, you can also do this with subqueries:
select
host.id,
(select count(*) from sessions where host_id = host.id) as "session count",
(select count(*) from process join sessions on process.session_id = sessions.id where sessions.host_id = host.id) as "process count"
from
host
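Edit:
Actually, I take back my statement that John Faz's answer is better. I just ran an execution plan on the two queries: mine accounted for 28% and John's for 50% (the remaining 22% was setup and teardown). I only used the very small amount of data from the SQL Fiddle example, and things could be different with large data sets and different index choices. But it does show that this query may be the better one in some circumstances.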
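Answer 2 (score: 1)
If you run the query without the group by clause, you will see that you get the same session id multiple times. That is why your session count is too high.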
select h.id as hid, s.id as sid, p.id as pid
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
order by h.id, s.id, p.id;
hid sid pid
-----------
1 1 1
1 1 2
1 2 5
2 5 8
2 5 9
3 3 3
3 3 7
4 4 4
4 4 6
4 6 10
So use count(distinct s.id) for the sessions:
select h.id as hid, count(distinct s.id) as session_count, count(p.id) as process_count
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
group by h.id
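Answer 3 (score: 0)
The real problem here is that you are working with a chain of one-to-many relationships. If there were only one relationship in the chain, count() would work without any problem. But chaining them causes the rows of the intermediate table (sessions in this case) to be duplicated by the final relationship, once per matching process. That is why you get inflated session counts.
You can use distinct, which counts each identifier only once. John Faz's answer is correct, but you only need one distinct, not two, because the rows of the final table in the chain (process) are not duplicated.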
select
host_id = H.ID,
session_count = count(distinct S.ID),
process_count = count(P.ID)
from host H
left join sessions S on H.ID = S.host_id
left join process as P on S.ID = P.session_id
group by H.ID
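The duplication can be checked directly. The sketch below is an illustration using Python's sqlite3 module with rows reconstructed from the outputs quoted in this thread (the name values are hypothetical placeholders). It counts how many joined rows each session id appears in, and compares count(p.id) with count(distinct p.id) over the whole join.

```python
import sqlite3

# Assumption: ids reconstructed from the outputs quoted in the thread;
# the name columns are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table host     (id integer primary key, name text);
    create table sessions (id integer primary key, host_id integer, name text);
    create table process  (id integer primary key, session_id integer, name text);
    insert into host values (1,'h1'),(2,'h2'),(3,'h3'),(4,'h4');
    insert into sessions values
        (1,1,'s1'),(2,1,'s2'),(3,3,'s3'),(4,4,'s4'),(5,2,'s5'),(6,4,'s6');
    insert into process values
        (1,1,'p1'),(2,1,'p2'),(3,3,'p3'),(4,4,'p4'),(5,2,'p5'),
        (6,4,'p6'),(7,3,'p7'),(8,5,'p8'),(9,5,'p9'),(10,6,'p10');
""")

JOIN = """
    from host h
    left join sessions s on h.id = s.host_id
    left join process p on s.id = p.session_id
"""

# How many joined rows does each session appear in?
# A session with two processes shows up twice.
copies = conn.execute(
    "select s.id, count(*)" + JOIN + "group by s.id order by s.id"
).fetchall()
print(copies)  # [(1, 2), (2, 1), (3, 2), (4, 2), (5, 2), (6, 1)]

# Process ids sit at the end of the chain, so nothing multiplies them:
# the plain and the distinct count agree.
process_counts = conn.execute(
    "select count(p.id), count(distinct p.id)" + JOIN
).fetchone()
print(process_counts)  # (10, 10)
```

If a fourth table hung off process, the process ids would be duplicated in turn and would then need distinct as well.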
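Another option is to use a CTE and perform the counts in multiple stages. I think this performs worse, especially if you have more data, but it models exactly the counting you want to do.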
;with cteSessions (session_id, host_id, process_count) as (
select
session_id = S.ID,
S.host_id,
process_count = count(P.ID) -- count(P.ID), not count(1): a session with no processes must contribute 0
from sessions S
left join process P on S.ID = P.session_id
group by
S.ID,
S.host_id
)
select
host_id = H.ID,
session_count = count(S.session_id),
process_count = sum(isnull(s.process_count, 0))
from host H
left join cteSessions S on H.ID = S.host_id
group by
H.ID
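As a rough check of the staged approach, the sketch below ports the CTE to SQLite (via Python's sqlite3 module), using `as` aliases and coalesce() in place of the `alias = expression` and isnull() forms. Hosts 5 and 6 and session 7 are hypothetical additions, not part of the fiddle data: host 5 has one session with no processes and host 6 has no sessions. The `staged_counts` helper is likewise only for illustration. With a left join in the first stage, counting p.id yields 0 for a session without processes, while counting every row yields 1.

```python
import sqlite3

# Assumption: ids 1-4 / 1-6 / 1-10 reconstructed from the thread's outputs;
# hosts 5 and 6 and session 7 are hypothetical edge cases added here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table host     (id integer primary key, name text);
    create table sessions (id integer primary key, host_id integer, name text);
    create table process  (id integer primary key, session_id integer, name text);
    insert into host values (1,'h1'),(2,'h2'),(3,'h3'),(4,'h4'),(5,'h5'),(6,'h6');
    insert into sessions values
        (1,1,'s1'),(2,1,'s2'),(3,3,'s3'),(4,4,'s4'),(5,2,'s5'),(6,4,'s6'),
        (7,5,'s7');
    insert into process values
        (1,1,'p1'),(2,1,'p2'),(3,3,'p3'),(4,4,'p4'),(5,2,'p5'),
        (6,4,'p6'),(7,3,'p7'),(8,5,'p8'),(9,5,'p9'),(10,6,'p10');
""")

def staged_counts(inner_count):
    """Count processes per session first, then roll up per host.

    inner_count is the aggregate expression used in the first stage.
    """
    sql = f"""
        with cte_sessions as (
            select s.id as session_id, s.host_id,
                   {inner_count} as process_count
            from sessions s
            left join process p on s.id = p.session_id
            group by s.id, s.host_id
        )
        select h.id,
               count(c.session_id),
               sum(coalesce(c.process_count, 0))
        from host h
        left join cte_sessions c on h.id = c.host_id
        group by h.id
        order by h.id
    """
    return conn.execute(sql).fetchall()

by_process_id = staged_counts("count(p.id)")
by_row = staged_counts("count(*)")

print(by_process_id)
# [(1, 2, 3), (2, 1, 2), (3, 1, 2), (4, 2, 3), (5, 1, 0), (6, 0, 0)]
print(by_row)
# [(1, 2, 3), (2, 1, 2), (3, 1, 2), (4, 2, 3), (5, 1, 1), (6, 0, 0)]
```

The two variants differ only for host 5, where counting rows reports one process for a session that has none.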
You can also use subqueries, which I hate, but it will work:
select
host_id = H.ID,
session_count = (select count(1) from sessions s where s.host_id = H.ID),
process_count = (select count(1) from sessions s join process p on s.id = p.session_id where s.host_id = H.ID)
from host H
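As a quick sanity check, this sketch compares the correlated-subquery form with the join plus count(distinct ...) form on the same data. It assumes SQLite through Python's sqlite3 module, with rows reconstructed from the outputs quoted in this thread and hypothetical name values; it says nothing about relative performance, which depends on the engine, the data volume and the indexes.

```python
import sqlite3

# Assumption: ids reconstructed from the outputs quoted in the thread;
# the name columns are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table host     (id integer primary key, name text);
    create table sessions (id integer primary key, host_id integer, name text);
    create table process  (id integer primary key, session_id integer, name text);
    insert into host values (1,'h1'),(2,'h2'),(3,'h3'),(4,'h4');
    insert into sessions values
        (1,1,'s1'),(2,1,'s2'),(3,3,'s3'),(4,4,'s4'),(5,2,'s5'),(6,4,'s6');
    insert into process values
        (1,1,'p1'),(2,1,'p2'),(3,3,'p3'),(4,4,'p4'),(5,2,'p5'),
        (6,4,'p6'),(7,3,'p7'),(8,5,'p8'),(9,5,'p9'),(10,6,'p10');
""")

# One correlated subquery per count; no join fan-out can occur here.
via_subqueries = conn.execute("""
    select h.id,
           (select count(*) from sessions s where s.host_id = h.id),
           (select count(*)
              from sessions s
              join process p on s.id = p.session_id
             where s.host_id = h.id)
    from host h
    order by h.id
""").fetchall()

# Join once and de-duplicate the session ids while counting.
via_join = conn.execute("""
    select h.id, count(distinct s.id), count(p.id)
    from host h
    left join sessions s on h.id = s.host_id
    left join process p on s.id = p.session_id
    group by h.id
    order by h.id
""").fetchall()

print(via_subqueries)  # [(1, 2, 3), (2, 1, 2), (3, 1, 2), (4, 2, 3)]
print(via_subqueries == via_join)  # True
```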