SQL: How to speed up a query with multiple joins

Time: 2014-07-30 12:44:56

Tags: sql postgresql

I have two tables:

registrations

event_time         | name |
---------------------------
2014-07-16 11:40     Bob1
2014-07-16 10:00     Bob2
2014-07-16 09:20     Bob3
2014-07-15 11:20     Bob4
2014-07-15 10:20     Bob5
2014-07-15 09:00     Bob6

session_log

event_time         | name | games_played | level_at_end |
---------------------------------------------------------
2014-07-16 11:40     Bob1             12               2
2014-07-16 10:00     Bob2              0               0
2014-07-16 09:20     Bob3            146               9
2014-07-15 11:20     Bob4             11               2
2014-07-15 10:20     Bob5              0               0
2014-07-15 09:00     Bob6              1               0

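Every time a user goes through login..play..logout, I write a record to session_log. So there can be many entries per user per day.

A user may register but never log in.

Levels in my system start at 0.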
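For anyone who wants to reproduce the problem, the two tables could be declared roughly as follows. This is only a sketch: the column types and the lack of keys are assumptions, since the question shows just column names and sample rows.

```sql
-- Assumed types; the question does not state them.
CREATE TABLE registrations (
  event_time timestamp NOT NULL,
  name       text      NOT NULL
);

CREATE TABLE session_log (
  event_time   timestamp NOT NULL,
  name         text      NOT NULL,
  games_played integer   NOT NULL,
  level_at_end integer   NOT NULL
);

-- Sample rows copied from the tables above.
INSERT INTO registrations (event_time, name) VALUES
  ('2014-07-16 11:40', 'Bob1'),
  ('2014-07-16 10:00', 'Bob2'),
  ('2014-07-16 09:20', 'Bob3'),
  ('2014-07-15 11:20', 'Bob4'),
  ('2014-07-15 10:20', 'Bob5'),
  ('2014-07-15 09:00', 'Bob6');

INSERT INTO session_log (event_time, name, games_played, level_at_end) VALUES
  ('2014-07-16 11:40', 'Bob1',  12, 2),
  ('2014-07-16 10:00', 'Bob2',   0, 0),
  ('2014-07-16 09:20', 'Bob3', 146, 9),
  ('2014-07-15 11:20', 'Bob4',  11, 2),
  ('2014-07-15 10:20', 'Bob5',   0, 0),
  ('2014-07-15 09:00', 'Bob6',   1, 0);
```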

I need to build a report like this:

day        | registrations | logged_in | played_users | lvl1 | lvl2 | lvl3 | lvl4 | lvl5 | lvl10
------------------------------------------------------------------------------------------------
2014-07-29              23          21             14     14     10      9      4      2     0
2014-07-28              18          17             15     14     11      9      3      1     1

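Where:

  • day - the day the report row aggregates
  • registrations - number of registrations on that day
  • logged_in - number of users who logged in within 24 hours of registering
  • played_users - number of users who played within 24 hours of registering
  • lvl1 - number of users who reached level 1 within 24 hours of registering
  • lvl2 - number of users who reached level 2 within 24 hours of registering
  • lvl3 - number of users who reached level 3 within 24 hours of registering
  • lvl4 - number of users who reached level 4 within 24 hours of registering
  • lvl5 - number of users who reached level 5 within 24 hours of registering
  • lvl10 - number of users who reached level 10 within 24 hours of registering

So I wrote a query like this: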

SELECT
  date(r.event_time)      AS day,
  count(DISTINCT r.name)  AS registrations,
  count(DISTINCT s1.name) AS logged_in,
  count(DISTINCT s2.name) AS played_users,
  count(DISTINCT sl_1.name) AS lvl1,
  count(DISTINCT sl_2.name) AS lvl2,
  count(DISTINCT sl_3.name) AS lvl3,
  count(DISTINCT sl_4.name) AS lvl4,
  count(DISTINCT sl_5.name) AS lvl5,
  count(DISTINCT sl_10.name) AS lvl10
FROM registrations AS r
  LEFT JOIN session_log AS s1
    ON r.name = s1.name
       AND s1.event_time >= r.event_time
       AND s1.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
  LEFT JOIN session_log as s2
    ON r.name = s2.name
    AND s2.event_time >= r.event_time
    AND s2.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
    AND s2.games_played > 0
  LEFT JOIN session_log as sl_1
    ON r.name = sl_1.name
    AND sl_1.event_time >= r.event_time
    AND sl_1.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
    AND sl_1.level_at_end > 0
  LEFT JOIN session_log as sl_2
    ON r.name = sl_2.name
    AND sl_2.event_time >= r.event_time
    AND sl_2.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
    AND sl_2.level_at_end > 1
  LEFT JOIN session_log as sl_3
    ON r.name = sl_3.name
    AND sl_3.event_time >= r.event_time
    AND sl_3.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
    AND sl_3.level_at_end > 2
  LEFT JOIN session_log as sl_4
    ON r.name = sl_4.name
    AND sl_4.event_time >= r.event_time
    AND sl_4.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
    AND sl_4.level_at_end > 3
  LEFT JOIN session_log as sl_5
    ON r.name = sl_5.name
    AND sl_5.event_time >= r.event_time
    AND sl_5.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
    AND sl_5.level_at_end > 4
  LEFT JOIN session_log as sl_10
    ON r.name = sl_10.name
    AND sl_10.event_time >= r.event_time
    AND sl_10.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
    AND sl_10.level_at_end > 9
WHERE r.event_time >= '2014-07-01'
      AND r.event_time < '2014-07-30'
GROUP BY day
ORDER BY day DESC;

It works, but it is very slow. Is there a way to speed up this query?

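2 answers:

Answer 0 (score: 0)

So, the query I ended up with is the following. Note that it joins on a glid column where the sample tables in the question use name: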

SELECT
  date(reg_time)  AS day,
  count(glid)     AS registrations,
  sum(CASE WHEN games_played IS NOT NULL THEN 1 ELSE 0 END) AS logged_in,
  sum(CASE WHEN games_played > 0 THEN 1 ELSE 0 END) AS played_users,
  sum(CASE WHEN level_at_end > 0 THEN 1 ELSE 0 END) AS lvl1,
  sum(CASE WHEN level_at_end > 1 THEN 1 ELSE 0 END) AS lvl2,
  sum(CASE WHEN level_at_end > 2 THEN 1 ELSE 0 END) AS lvl3,
  sum(CASE WHEN level_at_end > 3 THEN 1 ELSE 0 END) AS lvl4,
  sum(CASE WHEN level_at_end > 4 THEN 1 ELSE 0 END) AS lvl5,
  sum(CASE WHEN level_at_end > 9 THEN 1 ELSE 0 END) AS lvl10
FROM (SELECT
        max(r.event_time)   AS reg_time,
        r.glid,
        max(s.level_at_end) AS level_at_end,
        sum(s.games_played) AS games_played
      FROM registrations AS r
        LEFT JOIN session_log AS s
          ON r.glid = s.glid
             AND s.event_time >= r.event_time
             AND s.event_time < date_trunc('day', r.event_time + INTERVAL '1 day')
      GROUP BY r.glid
      ORDER BY reg_time DESC) AS foo
GROUP BY day
ORDER BY day DESC;

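The key is the nested SELECT: its single LEFT JOIN precomputes, per user, all the values the outer query needs. Because the subquery returns one row per user, the outer query can use plain conditional sums instead of count(DISTINCT ...) over eight separate joins.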
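The same pre-aggregation idea can be sketched against the column names from the question (name instead of glid), keeping the question's date filter. Several things here are assumptions rather than part of the answer: the FILTER clause requires PostgreSQL 9.4 or later, each name is assumed to register only once, and the index name is hypothetical. Whether the index actually helps depends on the data, so it is worth checking the plan with EXPLAIN ANALYZE.

```sql
-- Hypothetical index; supports the lookup by name plus event_time range.
CREATE INDEX session_log_name_event_time_idx
    ON session_log (name, event_time);

WITH per_user AS (
  -- One row per registration: session_log is joined only once.
  SELECT
    r.name,
    r.event_time                     AS reg_time,
    count(s.name)                    AS sessions,      -- 0 if the user never logged in
    coalesce(sum(s.games_played), 0) AS games_played,
    max(s.level_at_end)              AS max_level      -- NULL if the user never logged in
  FROM registrations AS r
    LEFT JOIN session_log AS s
      ON s.name = r.name
     AND s.event_time >= r.event_time
     -- Same upper bound as the question: midnight after the registration day.
     AND s.event_time < date_trunc('day', r.event_time + INTERVAL '1 day')
  WHERE r.event_time >= '2014-07-01'
    AND r.event_time < '2014-07-30'
  GROUP BY r.name, r.event_time
)
SELECT
  date(reg_time)                           AS day,
  count(*)                                 AS registrations,
  count(*) FILTER (WHERE sessions > 0)     AS logged_in,
  count(*) FILTER (WHERE games_played > 0) AS played_users,
  count(*) FILTER (WHERE max_level > 0)    AS lvl1,
  count(*) FILTER (WHERE max_level > 1)    AS lvl2,
  count(*) FILTER (WHERE max_level > 2)    AS lvl3,
  count(*) FILTER (WHERE max_level > 3)    AS lvl4,
  count(*) FILTER (WHERE max_level > 4)    AS lvl5,
  count(*) FILTER (WHERE max_level > 9)    AS lvl10
FROM per_user
GROUP BY day
ORDER BY day DESC;
```

On versions without FILTER, each count(*) FILTER (WHERE cond) can be written as sum(CASE WHEN cond THEN 1 ELSE 0 END), as in the query above.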

Answer 1 (score: -1)

How about using CASE? For example:

SELECT
  date(r.event_time)      AS day,
  count(DISTINCT r.name)  AS registrations,
  count(distinct CASE WHEN s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days') THEN s.name ELSE null END) AS logged_in,
  count(distinct CASE WHEN s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days') AND s.games_played > 0 THEN s.name ELSE null END) AS played_users,
  count(distinct CASE WHEN s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days') AND s.level_at_end > 0 THEN s.name ELSE null END) AS lvl1,
  count(distinct CASE WHEN s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days') AND s.level_at_end > 1 THEN s.name ELSE null END) AS lvl2,
  count(distinct CASE WHEN s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days') AND s.level_at_end > 2 THEN s.name ELSE null END) AS lvl3,
  count(distinct CASE WHEN s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days') AND s.level_at_end > 3 THEN s.name ELSE null END) AS lvl4,
  count(distinct CASE WHEN s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days') AND s.level_at_end > 4 THEN s.name ELSE null END) AS lvl5,
  count(distinct CASE WHEN s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days') AND s.level_at_end > 9 THEN s.name ELSE null END) AS lvl10
FROM registrations AS r
LEFT JOIN session_log AS s
    ON r.name = s.name
       AND s.event_time >= r.event_time
WHERE r.event_time >= '2014-07-01'
      AND r.event_time < '2014-07-30'
GROUP BY day
ORDER BY day DESC;
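One possible refinement of this approach, offered only as a sketch: keep the upper time bound in the join condition, as the question does, so the join does not pull in every later session for each user. Each CASE expression then needs only its own per-column test. This variant is not part of the original answer.

```sql
SELECT
  date(r.event_time)      AS day,
  count(DISTINCT r.name)  AS registrations,
  -- s.name is NULL when no session matched, and count() ignores NULLs.
  count(DISTINCT s.name)  AS logged_in,
  -- CASE without ELSE yields NULL, so non-matching rows are not counted.
  count(DISTINCT CASE WHEN s.games_played > 0 THEN s.name END) AS played_users,
  count(DISTINCT CASE WHEN s.level_at_end > 0 THEN s.name END) AS lvl1,
  count(DISTINCT CASE WHEN s.level_at_end > 1 THEN s.name END) AS lvl2,
  count(DISTINCT CASE WHEN s.level_at_end > 2 THEN s.name END) AS lvl3,
  count(DISTINCT CASE WHEN s.level_at_end > 3 THEN s.name END) AS lvl4,
  count(DISTINCT CASE WHEN s.level_at_end > 4 THEN s.name END) AS lvl5,
  count(DISTINCT CASE WHEN s.level_at_end > 9 THEN s.name END) AS lvl10
FROM registrations AS r
  LEFT JOIN session_log AS s
    ON r.name = s.name
   AND s.event_time >= r.event_time
   AND s.event_time < date_trunc('day', r.event_time + INTERVAL '1 days')
WHERE r.event_time >= '2014-07-01'
  AND r.event_time < '2014-07-30'
GROUP BY day
ORDER BY day DESC;
```

Compared with the query in the question, session_log is joined once instead of eight times, so the intermediate result grows with the number of sessions per user rather than with the product of eight joined copies.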