SQL count with multiple joins

Time: 2017-07-07 14:04:33

Tags: sql sql-server

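I have three tables that are related to each other as follows:

  1. Host (has multiple sessions)
  2. Session (has multiple processes)
  3. Process

The table structure is:

  1. host table - id, name
  2. sessions table - id, host_id, name
  3. process table - id, session_id, name

What I want is the number of sessions and the number of processes on each host. To get this I tried the following query, but the output is wrong.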

      select host.id, 
             count(sessions.id) as "session count", 
             count(process.id) as "process count"
      from host as host
           left outer join sessions as sessions on host.id = sessions.host_id
           left outer join process as process on sessions.id = process.session_id
      group by host.id;
      

Here is a SQLFiddle of the schema.

Based on the data in the fiddle, the output should be:

      id | session count | process count 
      ----------------------------------
      1  |     2         |   3
      2  |     1         |   2
      3  |     1         |   2
      4  |     2         |   3
      

But what I get is:

      id | session count | process count 
      ----------------------------------
      1  |     3         |   3
      2  |     2         |   2
      3  |     2         |   2
      4  |     3         |   3
      

What is the correct query to get the desired output?

4 answers:

Answer 0: (score: 6)

Use DISTINCT:

select host.id, 
       count(distinct sessions.id) as "session count", 
       count(distinct process.id) as "process count"
from host as host
     left outer join sessions as sessions on host.id = sessions.host_id
     left outer join process as process on sessions.id = process.session_id
group by host.id;
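
For anyone who wants to run this without the fiddle, here is a minimal sketch. The rows are reconstructed from the ungrouped join output shown in another answer on this page. The `name` columns are left out and the `int` types are assumptions, not the fiddle's actual DDL.

```sql
-- Assumed schema: key columns only, types guessed.
-- Run in an empty scratch database so the names do not clash with real tables.
create table host     (id int primary key);
create table sessions (id int primary key, host_id int);
create table process  (id int primary key, session_id int);

insert into host (id) values (1), (2), (3), (4);

insert into sessions (id, host_id) values
    (1, 1), (2, 1), (3, 3), (4, 4), (5, 2), (6, 4);

insert into process (id, session_id) values
    (1, 1), (2, 1), (3, 3), (4, 4), (5, 2),
    (6, 4), (7, 3), (8, 5), (9, 5), (10, 6);

-- Each session id and process id is counted once per host,
-- however many joined rows it appears in.
select host.id,
       count(distinct sessions.id) as "session count",
       count(distinct process.id)  as "process count"
from host
     left outer join sessions on host.id = sessions.host_id
     left outer join process  on sessions.id = process.session_id
group by host.id
order by host.id;
```

With these rows the query returns the desired output from the question: 2 and 3 for host 1, 1 and 2 for hosts 2 and 3, and 2 and 3 for host 4.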

Answer 1: (score: 1)

John Faz's answer is better, but since you asked about other approaches, this can also be done with subqueries:

select
  host.id,
  (select count(*) from sessions where host_id = host.id) as "session count",
  (select count(*) from process join sessions on process.session_id = sessions.id where sessions.host_id = host.id)  as "process count"
 from
   host

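Edit

Actually, I take back my comment that John Faz's answer is better. I just ran an execution plan on both: my query accounted for 28% of the cost and John's for 50% (the remaining 22% was setup and teardown). I only used the very small amount of data from the SQL Fiddle example, and things may look different with large data sets and different index choices. But it does show that in some cases this query may be the better one.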
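
One way to check a comparison like this on your own data is to look at measured I/O and time rather than only the plan's relative cost percentages. A rough sketch for SQL Server, assuming the host, sessions, and process tables from the question already exist:

```sql
-- Report logical reads and CPU/elapsed time for each statement that follows.
set statistics io on;
set statistics time on;

-- Variant A: joins plus count(distinct ...)
select host.id,
       count(distinct sessions.id) as "session count",
       count(distinct process.id)  as "process count"
from host
     left outer join sessions on host.id = sessions.host_id
     left outer join process  on sessions.id = process.session_id
group by host.id;

-- Variant B: correlated subqueries
select host.id,
       (select count(*)
        from sessions
        where sessions.host_id = host.id) as "session count",
       (select count(*)
        from process
             join sessions on process.session_id = sessions.id
        where sessions.host_id = host.id) as "process count"
from host;

set statistics time off;
set statistics io off;
```

The messages output then lists logical reads per table and the CPU and elapsed time for each statement, which makes it easier to see whether the difference still holds once the tables are large.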

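Answer 2: (score: 1)

If you run the query without the GROUP BY clause, you will see that the same session id comes back several times. That is why your session count is too high.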

select h.id as hid, s.id as sid, p.id as pid
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
order by h.id, s.id, p.id;

hid sid pid
-----------
1   1   1
1   1   2
1   2   5
2   5   8
2   5   9
3   3   3
3   3   7
4   4   4
4   4   6
4   6   10

So use count(distinct s.id) for the sessions:

select h.id as hid, count(distinct s.id) as session_count, count(p.id) as process_count
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
group by h.id
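
To make the difference visible, both ways of counting can be put side by side. This is only a sketch against the same tables; the `session_rows` alias is made up for illustration.

```sql
select h.id as hid,
       count(s.id)          as session_rows,   -- one per joined row, so inflated
       count(distinct s.id) as session_count,  -- each session id counted once
       count(p.id)          as process_count
from host h
left join sessions s on h.id = s.host_id
left join process p on s.id = p.session_id
group by h.id
order by h.id;
```

For host 1 in the listing above, session_rows is 3 (one per joined row) while session_count is 2.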

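Answer 3: (score: 0)

The real issue here is that you are working with a chain of one-to-many relationships. If there were only one relationship in the chain, count() would work without any problem. But chaining them causes the intermediate object (Session, in this case) to be repeated once for every row of the final relationship. That is why you get the inflated session count.

You can use distinct, which counts each identifier only once. John Faz's answer is correct, but you only need one distinct, not two, because the rows of the final table in the chain (process) are not duplicated.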

select
    host_id = H.ID,
    session_count = count(distinct S.ID),
    process_count = count(P.ID)
    from host H
        left join sessions S on H.ID = S.host_id
        left join process as P on S.ID = P.session_id
    group by H.ID
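
A side note on why plain count(P.ID) is safe here: count(column) skips NULLs, so the NULL row that the left join produces for a session with no processes adds nothing, whereas count(*) or count(1) would count it. A small self-contained sketch with made-up values (session 2 has no process):

```sql
-- Hypothetical joined rows: session 1 has two processes, session 2 has none.
select count(*)    as joined_rows,    -- 3: the NULL row is counted too
       count(p_id) as process_count   -- 2: the NULL p_id is skipped
from (values (1, 10), (1, 11), (2, null)) as t (s_id, p_id);
```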

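Another option is to use a CTE and do the counting in stages. I think this performs worse, especially with more data, but it models exactly the counting you are trying to do.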

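;with cteSessions (session_id, host_id, process_count) as (
    -- Stage 1: one row per session with its process count
    select
        session_id = S.ID,
        S.host_id,
        -- count(P.ID) rather than count(1), so a session with no processes gets 0
        process_count = count(P.ID)
        from sessions S
            left join process P on S.ID = P.session_id
        group by
            S.ID,
            S.host_id
)
-- Stage 2: roll the per-session rows up to one row per host
select
    host_id = H.ID,
    session_count = count(S.session_id),
    process_count = sum(isnull(S.process_count, 0))
    from host H
        left join cteSessions S on H.ID = S.host_id
    group by 
        H.ID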

You can also use subqueries, which I dislike, but it will work:

select
    host_id = H.ID,
    session_count = (select count(1) from sessions s where s.host_id = H.ID),
    process_count = (select count(1) from sessions s join process p on s.id = p.session_id where s.host_id = H.ID)
    from host H