Joining the same table twice plus a join to another table produces duplicates (3 tables total)

Time: 2018-04-21 05:16:56

Tags: sql postgresql

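I'm relatively new to SQL and have been trying for a while to get a fairly complex (for me) query working, but I keep getting duplicate values in each column using node-postgres. With this query I'm trying to track user info, plan info, and email info on a dashboard. Before we get to the query, here are the tables: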

USER TABLE (u) - tracks user info

+----+-------+---------+-------------+----------+
| id | first |  last   |    email    | password | 
+----+-------+---------+-------------+----------+
|  1 | joe   | smith   | j@gmail.com |     1234 | 
|  2 | mary  | johnson | m@gmail.com |     3445 | 
|  3 | harry | gold    | h@gmail.com |     4345 | 
+----+-------+---------+-------------+----------+

PLANS TABLE (p) - plans that users can make with friends

+----+--------------+-----------+------------+------+--------+-----------+---------+------+
| id | experienceid | hostid(u) | guestid(u) | date |  time  | paidid(u) | groupid | newp |
+----+--------------+-----------+------------+------+--------+-----------+---------+------+
| 33 |            1 |         1 | [1,2,3]    | 4/20 | 8:00pm | [1,2]     |       1 | true |
+----+--------------+-----------+------------+------+--------+-----------+---------+------+

EMAILS TABLE (e) - tracks the emails I send to users, based on the plan they belong to

+-------------+-----------+---------+----------+
|  email(u)   | planid(p) | confirm | reminder |
+-------------+-----------+---------+----------+
| j@gmail.com |        33 | null    | null     |
| m@gmail.com |        33 | true    | false    |
| h@gmail.com |        33 | true    | false    |
+-------------+-----------+---------+----------+

What I'm trying to do with the query is combine all three tables to get this output:

+-------+---------------+---------------------------+---------+---------+------------+---------------+---------------+
| id(p) | hostname(u+p) |      paidguests(u+p)      | time(p) | newp(p) | groupid(p) |  reminder(e)  |  confirm(e)   |
+-------+---------------+---------------------------+---------+---------+------------+---------------+---------------+
|    33 | joe smith     | [joe smith, mary johnson] | 8:00pm  | true    |          1 | [true, false] | [true, false] |
+-------+---------------+---------------------------+---------+---------+------------+---------------+---------------+

This is where I'm stuck. I almost have it working, but I keep getting duplicate values, like this:

+-------+---------------+----------------------------------------------------+---------+---------+------------+----------------------------+----------------------------+
| id(p) | hostname(u+p) |                  paidguests(u+p)                   | time(p) | newp(p) | groupid(p) |        reminder(e)         |         confirm(e)         |
+-------+---------------+----------------------------------------------------+---------+---------+------------+----------------------------+----------------------------+
|    33 | joe smith     | [joe smith, mary johnson, joe smith, mary johnson] | 8:00pm  | true    |          1 | [true, false, true, false] | [true, false, true, false] |
+-------+---------------+----------------------------------------------------+---------+---------+------------+----------------------------+----------------------------+

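I don't care about the order of the confirm and reminder columns relative to the paidguests(u+p) column, as long as the right data is there and there are no duplicates. Here is my current query: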

SELECT p.id, 
       Concat(u.first, ' ', u.last)              AS hostname, 
       Array_agg(Concat(us.first, ' ', us.last)) AS paidguests, 
       p.time, 
       p.groupid, 
       p.newp, 
       Array_agg(e.confirm)                      AS confirm, 
       Array_agg(e.reminder)                     AS reminder 
FROM   plans p 
       CROSS JOIN Unnest(p.paidid) AS allguests 
       LEFT JOIN users us 
              ON allguests = us.id 
       LEFT JOIN emails e 
              ON p.id = e.planid 
       LEFT JOIN users u 
              ON p.hostid = u.id 
WHERE  p.experienceid = $1 
       AND p.date = $2 
GROUP  BY p.id, 
          u.first, 
          u.last, 
          p.paidid, 
          p.time, 
          p.groupid, 
          p.newp, 
          confirm, 
          reminder 
ORDER  BY Array_length(p.paidid, 1) DESC 

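So basically I just want the table to come out right, with no duplicates. It was working before I added the join to the emails table, but I'm not entirely sure why that join duplicates the values.

Hopefully I was thorough in my explanation. If not, let me know what I can clarify! Thanks a lot :).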

2 answers:

Answer 0: (score: 1)

Try adding this condition to the where clause:

AND us.email = e.email

The culprit here: since all users and all emails share the same plan id, every user gets joined to every email, regardless of email address. Hence the duplicates.
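
To see the fan-out in isolation, here is a minimal sketch in PostgreSQL. The CTEs only inline the sample rows from the question, trimmed to the columns involved, so no real tables are touched; the query counts the joined rows with and without the email condition. Note one assumption: the condition is placed in the ON clause rather than in WHERE, because in WHERE it would also drop paid guests that have no row in emails, effectively turning the LEFT JOIN into an inner join.

```sql
WITH users(id, first, last, email) AS (
   VALUES (1, 'joe',   'smith',   'j@gmail.com')
        , (2, 'mary',  'johnson', 'm@gmail.com')
        , (3, 'harry', 'gold',    'h@gmail.com')
)
, plans(id, hostid, paidid) AS (
   VALUES (33, 1, ARRAY[1, 2])
)
, emails(email, planid, confirm, reminder) AS (
   VALUES ('j@gmail.com', 33, NULL::boolean, NULL::boolean)
        , ('m@gmail.com', 33, true, false)
        , ('h@gmail.com', 33, true, false)
)
SELECT (SELECT count(*)
        FROM   plans p
        CROSS  JOIN unnest(p.paidid) AS paid(id)
        LEFT   JOIN users  us ON us.id = paid.id
        LEFT   JOIN emails e  ON e.planid = p.id       -- plan id only: every guest meets every email
       ) AS rows_without_condition
     , (SELECT count(*)
        FROM   plans p
        CROSS  JOIN unnest(p.paidid) AS paid(id)
        LEFT   JOIN users  us ON us.id = paid.id
        LEFT   JOIN emails e  ON e.planid = p.id
                             AND e.email  = us.email   -- each guest meets only their own email row
       ) AS rows_with_condition;
-- rows_without_condition = 6  (2 paid guests x 3 email rows)
-- rows_with_condition    = 2  (one row per paid guest)
```

Any array_agg() over the first join result collects each guest name once per email row, which is the kind of repetition shown in the question.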

Answer 1: (score: 0)

Rahul spotted the missing join condition. But the rabbit hole goes deeper. I suggest this query:

SELECT p.id
     , concat_ws(' ', u.first, u.last) AS hostname  -- concat_ws!
     , p.time
     , p.groupid
     , p.newp
     , paid.paidguests
     , paid.confirm
     , paid.reminder
FROM   plans      p
LEFT   JOIN users u  ON u.id = p.hostid
LEFT   JOIN LATERAL (         -- LATERAL join
   SELECT array_agg(sub.paidguest) AS paidguests
        , array_agg(sub.confirm)   AS confirm
        , array_agg(sub.reminder)  AS reminder
   FROM  (
      SELECT concat_ws(' ', us.first, us.last) AS paidguest, e.confirm, e.reminder
      FROM   unnest(p.paidid) WITH ORDINALITY AS paid(id, ord)
      JOIN   users       us ON us.id = paid.id
      LEFT   JOIN emails e  ON e.email = us.email
                           AND e.planid = p.id
      ORDER  BY paid.ord
      ) sub
   ) paid ON true
WHERE  p.experienceid = $1
AND    p.date = $2
-- no GROUP  BY needed
ORDER  BY cardinality(p.paidid) DESC, p.id;

This assumes (planid, email) is the PRIMARY KEY of the emails table, with FOREIGN KEY constraints on its email and planid columns.

Major points

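  • Aggregate first, then join, so you don't need GROUP BY on all the columns that need no aggregation. Other query techniques are typically faster when retrieving all or most rows; for a small selection like the one in your example, I suggest the LATERAL join.

    In this particular case, JOIN LATERAL would be equivalent to LEFT JOIN LATERAL, because the aggregating subquery always returns exactly one row.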

  • The alias in the original Unnest(p.paidid) AS allguests is confusing, since these seem to be the IDs of the paid guests, not of all guests.

  • Use concat_ws() if first or last can be NULL.

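  • When unnesting an array, the order of elements is normally preserved in simple cases. But you have additional joins, so you should use WITH ORDINALITY and an explicit ORDER BY to avoid surprises. If you don't understand this, your query may seem to work, even for a long time, and then "suddenly" break (wrong order of elements).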
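
The last two points can be checked with literal values, no tables needed. A small sketch, with made-up array values and aliases: the first statement contrasts concat() and concat_ws() when the last name is NULL, and the second shows how WITH ORDINALITY attaches each element's position so that a later ORDER BY can restore the array order.

```sql
-- concat() skips NULL arguments but still emits the ' ' passed as a regular argument,
-- so a NULL last name leaves a trailing space. concat_ws() only adds the separator
-- between non-NULL values.
SELECT concat('joe', ' ', NULL)    AS with_concat      -- 'joe ' (trailing space)
     , concat_ws(' ', 'joe', NULL) AS with_concat_ws;  -- 'joe'

-- WITH ORDINALITY adds a bigint column holding each element's position in the array.
SELECT paid.id, paid.ord
FROM   unnest(ARRAY[30, 10, 20]) WITH ORDINALITY AS paid(id, ord)
ORDER  BY paid.ord;
--  id | ord
--  30 |   1
--  10 |   2
--  20 |   3
```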

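Your whole database design is debatable. Arrays are typically a design anti-pattern here and should be implemented as related tables instead, for a number of reasons that are beyond the scope of this question.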
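
As one possible shape for such a related table, here is a rough sketch. The table name plan_guests and its columns are hypothetical and not taken from the question; the sketch also assumes that the existing users and plans tables each have an integer primary key named id. One row per guest per plan would stand in for both the guestid and paidid arrays.

```sql
-- Hypothetical junction table; all names here are illustrative only.
-- Assumes users(id) and plans(id) already exist as integer primary keys.
CREATE TABLE plan_guests (
   plan_id integer NOT NULL REFERENCES plans(id)
 , user_id integer NOT NULL REFERENCES users(id)
 , paid    boolean NOT NULL DEFAULT false
 , PRIMARY KEY (plan_id, user_id)   -- a guest appears at most once per plan
);

-- Paid guests per plan then come from a plain join, with no unnest() involved.
SELECT pg.plan_id
     , array_agg(concat_ws(' ', u.first, u.last) ORDER BY u.id) AS paidguests
FROM   plan_guests pg
JOIN   users u ON u.id = pg.user_id
WHERE  pg.paid
GROUP  BY pg.plan_id;
```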