Joining to the same table in Postgres when there is more than one path

Date: 2019-06-13 05:32:54

Tags: sql postgresql join left-join

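I have inherited some tables that I am going to clean up. First, though, I am trying to join everything I need, and I have run into a problem because there is more than one path from EventRegistrations to SpecialEvents.

In some cases EventRegistrations can be joined to SpecialEvents directly via event_registrations.scoreable_id. In other cases another table, SpecialPlaces, has to be joined first. You can tell which case applies from event_registrations.scoreable_type, which is either SpecialEvent or SpecialPlace.

Basically, how do I join SpecialEvents when I sometimes have to join SpecialPlaces first? For example, if I try to join SpecialEvents in two different ways, I get the error: table name "special_events" specified more than once.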

SELECT event_registrations.id, array_agg(teams.name), event_registrations.number_of_players, event_registrations.state, event_registrations.created_at, array_agg(players.email), array_agg(special_events.name), array_agg(special_places.id)
FROM event_registrations
LEFT JOIN teams ON event_registrations.team_id = teams.id
LEFT JOIN team_memberships ON teams.id = team_memberships.team_id
LEFT JOIN players ON team_memberships.player_id = players.id
LEFT JOIN special_events ON event_registrations.scoreable_id = special_events.id AND event_registrations.scoreable_type = 'SpecialEvent'
LEFT JOIN special_places ON event_registrations.scoreable_id = special_places.id AND event_registrations.scoreable_type = 'SpecialPlace'
GROUP BY event_registrations.id, event_registrations.number_of_players, event_registrations.state, event_registrations.created_at

EventRegistrations

+----+-----------+---------------------------+-----------+---------------------------+
| id | region_id | start_at                  | state     | created_at                |
+----+-----------+---------------------------+-----------+---------------------------+
| 2  | 1         | 2015-10-22 19:30:00 +0100 | published | 2015-09-21 09:41:05 +0100 |
| 4  | 1         | 2016-01-21 19:30:00 +0000 | published | 2015-11-26 15:11:25 +0000 |
| 3  | 1         | 2016-01-28 19:30:00 +0000 | published | 2015-11-23 16:16:27 +0000 |
| 5  | 1         | 2016-12-31 19:30:00 +0000 | draft     | 2016-02-24 15:17:22 +0000 |
| 6  | 1         | 2016-05-16 19:30:00 +0100 | published | 2016-03-29 14:33:40 +0100 |
| 10 | 1         | 2016-09-12 19:30:00 +0100 | published | 2016-06-28 17:18:54 +0100 |
| 8  | 1         | 2016-10-07 19:30:00 +0100 | draft     | 2016-06-09 15:03:36 +0100 |
| 7  | 1         | 2016-05-23 19:30:00 +0100 | published | 2016-03-30 19:30:21 +0100 |
| 9  | 1         | 2016-08-04 19:30:00 +0100 | published | 2016-06-09 15:18:56 +0100 |
| 11 | 1         | 2016-11-07 19:30:00 +0000 | draft     | 2016-07-11 17:20:11 +0100 |
+----+-----------+---------------------------+-----------+---------------------------+

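3 Answers:

Answer 0 (score: 6)

What my colleague is trying to say is that it cannot be done the way you are attempting, but there are plenty of ways to achieve the same thing.

To avoid joining twice, build a combined derived table from SpecialEvents and SpecialPlaces that holds all the information you need, then join to that.

Something like this: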

-- el1.name instead of special_events.name: special_events is only in scope inside the subquery
SELECT event_registrations.id, array_agg(teams.name), event_registrations.number_of_players, event_registrations.state, event_registrations.created_at, array_agg(players.email), array_agg(el1.name), array_agg(special_places.id)
FROM event_registrations
LEFT JOIN teams ON event_registrations.team_id = teams.id
LEFT JOIN team_memberships ON teams.id = team_memberships.team_id
LEFT JOIN players ON team_memberships.player_id = players.id
LEFT JOIN special_places ON event_registrations.scoreable_id = special_places.id AND event_registrations.scoreable_type = 'SpecialPlace'
LEFT JOIN (
SELECT special_events.id AS special_event_id, special_places.id AS special_place_id, special_events.name
FROM special_places
LEFT JOIN special_events ON special_places.special_event_id = special_events.id
UNION
SELECT special_events.id AS special_event_id, null AS special_place_id, special_events.name
FROM special_events
) el1
ON (event_registrations.scoreable_id = el1.special_place_id AND event_registrations.scoreable_type = 'SpecialPlace') OR (event_registrations.scoreable_id = el1.special_event_id AND event_registrations.scoreable_type = 'SpecialEvent')
GROUP BY event_registrations.id, event_registrations.number_of_players, event_registrations.state, event_registrations.created_at
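The combined-lookup idea can be tried out on a toy dataset. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for Postgres, the table contents are invented, and the extra `el1.special_place_id IS NULL` predicate is an addition that is not in the answer above (it keeps a SpecialEvent registration from also matching the per-place rows of the same event).

```python
import sqlite3

# Hypothetical miniature schema and data; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE special_events (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE special_places (id INTEGER PRIMARY KEY, special_event_id INTEGER);
CREATE TABLE event_registrations (
    id INTEGER PRIMARY KEY, scoreable_id INTEGER, scoreable_type TEXT);

INSERT INTO special_events VALUES (1, 'Quiz Night'), (2, 'Trivia Cup');
INSERT INTO special_places VALUES (10, 2);
INSERT INTO event_registrations VALUES
    (100, 1,  'SpecialEvent'),
    (101, 10, 'SpecialPlace'),
    (102, 2,  'SpecialEvent');
""")

rows = conn.execute("""
SELECT er.id, el1.name
FROM event_registrations er
LEFT JOIN (
    -- one row per place, carrying the event it belongs to
    SELECT sp.id AS special_place_id, se.id AS special_event_id, se.name
    FROM special_places sp
    LEFT JOIN special_events se ON sp.special_event_id = se.id
    UNION
    -- one row per event, with no place attached
    SELECT NULL AS special_place_id, se.id AS special_event_id, se.name
    FROM special_events se
) el1
  ON (er.scoreable_type = 'SpecialPlace'
      AND er.scoreable_id = el1.special_place_id)
  OR (er.scoreable_type = 'SpecialEvent'
      AND er.scoreable_id = el1.special_event_id
      AND el1.special_place_id IS NULL)   -- assumption, not in the answer
ORDER BY er.id
""").fetchall()

print(rows)
```

Each registration resolves to exactly one event name, whichever path it takes.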

Answer 1 (score: 4)

Assuming, based on a few educated guesses, that id is the PRIMARY KEY column of each of the given tables:

SELECT er.id
     , t.name  AS team_name            -- can only be 1, no array_agg
     , er.number_of_players
     , er.state
     , er.created_at
     , tp.player_emails                -- pre-aggregated!
     , se.name AS special_event_name   -- can only be 1, no array_agg
     , sp.id   AS special_place_id     -- can only be 1, no array_agg
FROM   event_registrations   er
LEFT   JOIN teams t ON t.id = er.team_id
LEFT   JOIN (
   SELECT tm.team_id, array_agg(p.email) AS player_emails
   FROM   team_memberships tm
   JOIN   players          p  ON p.id = tm.player_id
   GROUP  BY 1
   ) tp USING (team_id)
LEFT   JOIN special_places sp ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace'
LEFT   JOIN special_events se ON se.id = er.scoreable_id AND er.scoreable_type = 'SpecialEvent'
                              OR se.id = sp.special_event_id AND er.scoreable_type = 'SpecialPlace'

Much simpler and faster.
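To see how the two conditional joins behave, here is a minimal sketch on invented data, again with Python's sqlite3 standing in for Postgres. Registrations 100 and 101 share the same scoreable_id on purpose, so only scoreable_type tells them apart. Registration 102 points at a non-existent event to show the NULL fill.

```python
import sqlite3

# Hypothetical miniature schema and data; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE special_events (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE special_places (id INTEGER PRIMARY KEY, special_event_id INTEGER);
CREATE TABLE event_registrations (
    id INTEGER PRIMARY KEY, scoreable_id INTEGER, scoreable_type TEXT);

INSERT INTO special_events VALUES (1, 'Quiz Night'), (2, 'Trivia Cup');
INSERT INTO special_places VALUES (1, 2);
INSERT INTO event_registrations VALUES
    (100, 1, 'SpecialEvent'),
    (101, 1, 'SpecialPlace'),
    (102, 5, 'SpecialEvent');
""")

rows = conn.execute("""
SELECT er.id, se.name AS special_event_name, sp.id AS special_place_id
FROM   event_registrations er
LEFT   JOIN special_places sp ON sp.id = er.scoreable_id
                             AND er.scoreable_type = 'SpecialPlace'
LEFT   JOIN special_events se ON se.id = er.scoreable_id
                             AND er.scoreable_type = 'SpecialEvent'
                              OR se.id = sp.special_event_id
                             AND er.scoreable_type = 'SpecialPlace'
ORDER  BY er.id
""").fetchall()

print(rows)
```

special_events appears only once in the FROM list, so the "specified more than once" error never comes up.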

Major points

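  • If you really do need to join to the same table twice, you must use table aliases, like:

    FROM   event_registrations er

    which is short for:

    FROM   event_registrations AS er

    As it turns out, you do not need to join to the same table twice here. Still, use table aliases to cut down the noise.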

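  • The only discernible reason for the global GROUP BY in the outer SELECT is that the join to team_memberships may multiply rows. I moved the aggregation of player_emails into a much cheaper subquery, dropped the outer GROUP BY and simplified the rest of the query. It should also be considerably faster.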

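  • If you did need GROUP BY in the outer query, and event_registrations.id is indeed the PRIMARY KEY, then:

    GROUP  BY er.id, er.number_of_players, er.state, er.created_at

    ... is just a noisier way of saying:

    GROUP  BY er.id

    Since Postgres 9.1, a PK in the GROUP BY clause covers all columns of its table.

    But you do not need it at all.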

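  • Finally, the core problem is solved by conditionally joining special_places first, and then conditionally joining special_events. Missing columns are filled with NULL values: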

    LEFT   JOIN special_places sp ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace'
    LEFT   JOIN special_events se ON se.id = er.scoreable_id AND er.scoreable_type = 'SpecialEvent'
                                  OR se.id = sp.special_event_id AND er.scoreable_type = 'SpecialPlace'
    

    Strictly speaking, the last AND er.scoreable_type = 'SpecialPlace' is redundant, because there would be no sp.special_event_id otherwise. I kept it for clarity.
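The row-multiplication point can be checked on a small example. This is a sketch under stated assumptions: the data is invented, Python's sqlite3 stands in for Postgres, and SQLite's group_concat is used in place of array_agg, which SQLite does not have.

```python
import sqlite3

# Hypothetical miniature schema and data; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE players (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE team_memberships (team_id INTEGER, player_id INTEGER);
CREATE TABLE event_registrations (id INTEGER PRIMARY KEY, team_id INTEGER);

INSERT INTO teams VALUES (1, 'Aces');
INSERT INTO players VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO team_memberships VALUES (1, 1), (1, 2);
INSERT INTO event_registrations VALUES (100, 1), (101, NULL);
""")

# Joining memberships directly yields one row per player, which is what
# forces a GROUP BY in the outer query.
naive = conn.execute("""
SELECT er.id, p.email
FROM   event_registrations er
LEFT   JOIN teams t             ON t.id = er.team_id
LEFT   JOIN team_memberships tm ON tm.team_id = t.id
LEFT   JOIN players p           ON p.id = tm.player_id
""").fetchall()

# Aggregating inside a subquery keeps one row per registration.
pre = conn.execute("""
SELECT er.id, t.name, tp.player_emails
FROM   event_registrations er
LEFT   JOIN teams t ON t.id = er.team_id
LEFT   JOIN (
   SELECT tm.team_id, group_concat(p.email) AS player_emails
   FROM   team_memberships tm
   JOIN   players p ON p.id = tm.player_id
   GROUP  BY tm.team_id
   ) tp ON tp.team_id = er.team_id
ORDER  BY er.id
""").fetchall()

print(len(naive), len(pre))
```

With two players on the team, the direct join returns three rows for two registrations, while the pre-aggregated version returns exactly two.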

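Answer 2 (score: 0)

Mathematically, the join order does not affect your result (it does affect efficiency).

That said, many RDBMS implementations (Postgres included) are able to pick the cheapest join order themselves.

If you want to force a particular join order (even though it gives the same answer), you can try using parentheses. Even then, I am not sure the query optimizer will not rewrite the query tree to optimize performance, changing the join order.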
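As a small sanity check of the order-independence claim, the sketch below runs the same three-table inner join in two different orders and compares the results. It uses invented data, and Python's sqlite3 stands in for Postgres. It covers inner joins only: chains that mix in outer joins cannot in general be reordered freely.

```python
import sqlite3

# Hypothetical miniature schema and data; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE players (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE team_memberships (team_id INTEGER, player_id INTEGER);

INSERT INTO teams VALUES (1, 'Aces'), (2, 'Bolts');
INSERT INTO players VALUES
    (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
INSERT INTO team_memberships VALUES (1, 1), (1, 2), (2, 3);
""")

# Start from teams and work towards players.
teams_first = sorted(conn.execute("""
SELECT t.name, p.email
FROM   teams t
JOIN   team_memberships tm ON tm.team_id = t.id
JOIN   players p           ON p.id = tm.player_id
""").fetchall())

# Start from players and work towards teams.
players_first = sorted(conn.execute("""
SELECT t.name, p.email
FROM   players p
JOIN   team_memberships tm ON tm.player_id = p.id
JOIN   teams t             ON t.id = tm.team_id
""").fetchall())

print(teams_first == players_first)
```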