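Conditional counts on multiple columns of the same table

Time: 2015-06-22 15:36:20

Tags: sql postgresql join aggregate-functions

I'm looking for a "better" way to write a query that shows, for a single player, every opponent he has played before and his win-loss record against each of them.

Here are the tables involved, stripped down to the essentials: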

create table player (player_id int, username text);
create table match (winner_id int, loser_id int);

insert into player values (1, 'john'), (2, 'mary'), (3, 'bob'), (4, 'alice');
insert into match values (1, 2), (1, 2), (1, 3), (1, 4), (1, 4), (1, 4)
                       , (2, 1), (4, 1), (4, 1);
So john has a record of 2 wins and 1 loss against mary, 1 win and 0 losses against bob, and 3 wins and 2 losses against alice.

create index idx_winners on match(winner_id);
create index idx_winners on match(loser_id);

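I'm using Postgres 9.4. Something in the back of my head tells me to consider LATERAL somehow, but I'm having a hard time understanding the "shape" such a query would take.

Below is the query I'm currently using, but something about it "feels off". Please help me learn and improve it.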

select p.username as opponent, 
       coalesce(r.won, 0) as won, 
       coalesce(r.lost, 0) as lost
from (
    select m.winner_id, m.loser_id, count(m.*) as won, (
        select t.lost
        from (
            select winner_id, loser_id, count(*) as lost
            from match
            where loser_id = m.winner_id
            and winner_id = m.loser_id
            group by winner_id, loser_id
        ) t 
    )   
    from match m
    where m.winner_id = 1   -- this would be a parameter
    group by m.winner_id, m.loser_id
) r 
join player p on p.player_id = r.loser_id;

This works as expected. I'm just looking to learn some tricks or better techniques for doing the same thing.

opponent  won  lost
--------  ---  ----
alice     3    2
bob       1    0
mary      2    1

4 Answers:

Answer 0: (score: 3)

Solution with correlated subqueries:

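SELECT *,
       -- caution: as written, these count all of p's losses and wins,
       -- not only the matches played against player 1
       (SELECT COUNT(*) FROM match WHERE loser_id = p.player_id),
       (SELECT COUNT(*) FROM match WHERE winner_id = p.player_id)
FROM player p WHERE player_id <> 1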

Solution with UNION and conditional aggregation:

SELECT  t.loser_id AS opponent_id ,
        SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END) AS won ,
        SUM(CASE WHEN result = -1 THEN 1 ELSE 0 END) AS lost
FROM    ( SELECT    * , 1 AS result
          FROM      match
          WHERE     winner_id = 1
          UNION ALL
          -- columns swapped on purpose so the opponent always lands in loser_id
          SELECT    loser_id , winner_id , -1 AS result
          FROM      match
          WHERE     loser_id = 1
        ) t
GROUP BY t.loser_id
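
As a variation on the conditional aggregation above, Postgres 9.4 (the version named in the question) supports the aggregate FILTER clause, which allows both counts to come from one pass over match without the UNION. This is only a sketch under the question's schema; the CASE expression used to pick the opponent and the output column names are my own choices, not part of the original answer.

```sql
-- Sketch: one pass over match, counting wins and losses of player 1 per opponent.
SELECT p.username AS opponent,
       count(*) FILTER (WHERE m.winner_id = 1) AS won,
       count(*) FILTER (WHERE m.loser_id  = 1) AS lost
FROM   match m
JOIN   player p
       -- the opponent is whichever side of the match is not player 1
       ON p.player_id = CASE WHEN m.winner_id = 1 THEN m.loser_id
                             ELSE m.winner_id END
WHERE  1 IN (m.winner_id, m.loser_id)   -- 1 would be a parameter
GROUP  BY p.player_id, p.username
ORDER  BY p.username;
```

With the sample data from the question this should return the same three rows as the original query (alice 3/2, bob 1/0, mary 2/1).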

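Answer 1: (score: 2)

For a single "subject" player, I would simply union the player's matches in both the winning and the losing role, and sum up the wins and losses: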

SELECT opponent, SUM(won) as won, SUM(lost) as lost
FROM
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
    from "match" m
     inner join "player" w on m.winner_id = w.player_id

    UNION ALL

    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
    from "match" m
     inner join "player" l on m.loser_id = l.player_id
) x
WHERE me = 1
GROUP BY opponent;

For a set-based operation covering all players, we can left join the players to the same derived union table:

SELECT p.username as player, x.opponent, SUM(x.won) as won, SUM(x.lost) as lost
FROM "player" p
LEFT JOIN
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
    from "match" m
     inner join "player" w on m.winner_id = w.player_id

    UNION ALL

    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
    from "match" m
     inner join "player" l on m.loser_id = l.player_id
) x
on p.player_id = x.me
GROUP BY player, opponent;

SqlFiddles of both here

One small point - index names must be unique - presumably you meant:

create index idx_winners on match(winner_id);
create index idx_losers on match(loser_id);

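Answer 2: (score: 2)

Query

The query is not as simple as it looks at first. The shortest query string does not necessarily yield the best performance. This should be as fast as possible, and as short as possible for that: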

SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1  -- your player_id here
   GROUP  BY 1           -- positional reference (not your player_id)
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1   -- your player_id here
   GROUP  BY 1
   ) l USING (player_id)
JOIN   player p USING (player_id)
ORDER  BY 1;

The result is exactly as requested:

username | won | lost
---------+-----+-----
alice    | 3   | 2
bob      | 1   | 0
mary     | 2   | 1

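SQL Fiddle - with more revealing test data!

The key feature is the FULL [OUTER] JOIN between the two subqueries for wins and losses. This produces a table of all players our candidate has played against. The USING clause in the join condition conveniently merges the two player_id columns into one.

After that, a single JOIN to player gets the name, and COALESCE replaces NULL with 0. Voilà.

Index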
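
To see the FULL JOIN ... USING behaviour in isolation, here is a small self-contained sketch that needs no tables. The VALUES lists are made-up stand-ins for the two aggregated subqueries (w for wins, l for losses), so the numbers are purely illustrative.

```sql
-- Sketch with made-up data: w and l stand in for the two aggregated subqueries.
SELECT player_id,                      -- single merged column thanks to USING
       COALESCE(w.ct, 0) AS won,       -- NULL where the player appears only in l
       COALESCE(l.ct, 0) AS lost       -- NULL where the player appears only in w
FROM  (VALUES (2, 2), (3, 1)) AS w(player_id, ct)
FULL   JOIN
      (VALUES (2, 1), (5, 4)) AS l(player_id, ct) USING (player_id)
ORDER  BY player_id;

-- player_id | won | lost
-- ----------+-----+-----
--         2 |   2 |    1
--         3 |   1 |    0
--         5 |   0 |    4
```

Player 3 exists only on the wins side and player 5 only on the losses side, yet both survive the join, which is the reason for using FULL JOIN rather than an inner or left join here.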

Two multicolumn indexes would make this faster still:

CREATE INDEX idx_winner on match (winner_id, loser_id);
CREATE INDEX idx_loser  on match (loser_id, winner_id);

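But only if you get index-only scans out of them. Then Postgres does not even visit the match table, and you get very fast results.

With two integer columns you happen to hit a local optimum: these indexes are the same size as the simple single-column indexes you had. On typical 64-bit systems each index entry is padded to a multiple of 8 bytes, so the second 4-byte integer occupies space that would otherwise be padding.

Shorter, but slow

You can run correlated subqueries like @Giorgi suggested, just done correctly: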
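
One way to check whether an index-only scan is actually used is sketched below, assuming the idx_winner index from above has been created. On a table as tiny as the sample data the planner may well prefer a sequential scan, so this is mainly meaningful with a realistic number of rows.

```sql
-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM ANALYZE match;

-- Inspect the plan of the "wins" subquery from the main query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT loser_id AS player_id, count(*) AS ct
FROM   match
WHERE  winner_id = 1
GROUP  BY 1;
```

In the output, look for a node like "Index Only Scan using idx_winner on match" and a low "Heap Fetches" number; a high number means Postgres still had to visit the table for many rows.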

SELECT *
FROM  (
   SELECT username
       , (SELECT count(*) FROM match
          WHERE  loser_id  = p.player_id
          AND    winner_id = 1) AS won
       , (SELECT count(*) FROM match
          WHERE  winner_id = p.player_id
          AND    loser_id  = 1) AS lost
   FROM   player p
   WHERE  player_id <> 1
   ) sub
WHERE (won > 0 OR lost > 0)
ORDER  BY username;

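Works fine for small tables, but does not scale. This needs a sequential scan on player and two index scans on match for every existing player. Compare performance with EXPLAIN ANALYZE.

Answer 3: (score: 0)

More readable than the original. Thoughts?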
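
Since the question mentions LATERAL: the correlated-subquery form above can also be written with a single LATERAL subquery that computes both counts at once. This is only a sketch of the "shape", assuming Postgres 9.4 or later for the FILTER clause; it still does per-player work, so the same scaling caveat should apply.

```sql
-- Sketch: one LATERAL subquery per player instead of two correlated subqueries.
SELECT p.username AS opponent, s.won, s.lost
FROM   player p
CROSS  JOIN LATERAL (
   SELECT count(*) FILTER (WHERE m.winner_id = 1) AS won,
          count(*) FILTER (WHERE m.loser_id  = 1) AS lost
   FROM   match m
   WHERE (m.winner_id = 1 AND m.loser_id  = p.player_id)
      OR (m.loser_id  = 1 AND m.winner_id = p.player_id)
   ) s                                  -- always exactly one row per player
WHERE  p.player_id <> 1
AND   (s.won > 0 OR s.lost > 0)         -- drop players never faced
ORDER  BY p.username;
```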

with W as (
    select loser_id as opponent_id,
    count(*) as n
    from match
    where winner_id = 1
    group by loser_id
),
L as (
    select winner_id as opponent_id,
    count(*) as n
    from match
    where loser_id = 1
    group by winner_id
)
select player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
from player
left join W on W.opponent_id = player.player_id
left join L on L.opponent_id = player.player_id
where player.player_id != 1;

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..108.58 rows=1224 width=48)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..30.15 rows=1224 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..25.38 rows=1224 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)

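The plan above has a performance killer in the filter player_id != 1. I think I can only avoid that by filtering the result of the joins instead, can't I?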

explain with W as (
        select loser_id as opponent_id,
        count(*) as n
        from match
        where winner_id = 1 
        group by loser_id
    ),  
    L as (
        select winner_id as opponent_id,
        count(*) as n
        from match
        where loser_id = 1 
        group by winner_id
    )   
    select t.* from (
        select player.player_id, player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
        from player
        left join W on W.opponent_id = player.player_id
        left join L on L.opponent_id = player.player_id
    ) t 
    where t.player_id != 1;

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..74.89 rows=3 width=52)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..1.15 rows=3 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..1.05 rows=3 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)