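I'm relatively new to SQL and have been trying for a while to get a fairly complex query (for me) to work, but I keep getting duplicate values in each column when I run it through node-postgres. With this query I'm trying to track user info, plan info, and email info on a dashboard. Before we get to the query, here are the tables:
USERS TABLE (u) - tracks user info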
+----+-------+---------+-------------+----------+
| id | first | last    | email       | password |
+----+-------+---------+-------------+----------+
| 1  | joe   | smith   | j@gmail.com | 1234     |
| 2  | mary  | johnson | m@gmail.com | 3445     |
| 3  | harry | gold    | h@gmail.com | 4345     |
+----+-------+---------+-------------+----------+
PLANS TABLE (p) - plans that users can make with friends
+----+--------------+-----------+------------+------+--------+-----------+---------+------+
| id | experienceid | hostid(u) | guestid(u) | date | time   | paidid(u) | groupid | newp |
+----+--------------+-----------+------------+------+--------+-----------+---------+------+
| 33 | 1            | 1         | [1,2,3]    | 4/20 | 8:00pm | [1,2]     | 1       | true |
+----+--------------+-----------+------------+------+--------+-----------+---------+------+
EMAILS TABLE (e) - tracks the emails I send to users, based on the plan they belong to
+-------------+-----------+---------+----------+
| email(u)    | planid(p) | confirm | reminder |
+-------------+-----------+---------+----------+
| j@gmail.com | 33        | null    | null     |
| m@gmail.com | 33        | true    | false    |
| h@gmail.com | 33        | true    | false    |
+-------------+-----------+---------+----------+
What I'm trying to do with the query is combine all three tables to get this output:
+-------+---------------+---------------------------+---------+---------+------------+---------------+---------------+
| id(p) | hostname(u+p) | paidguests(u+p)           | time(p) | newp(p) | groupid(p) | reminder(e)   | confirm(e)    |
+-------+---------------+---------------------------+---------+---------+------------+---------------+---------------+
| 33    | joe smith     | [joe smith, mary johnson] | 8:00pm  | true    | 1          | [true, false] | [true, false] |
+-------+---------------+---------------------------+---------+---------+------------+---------------+---------------+
This is as far as I've gotten with the query. I almost have it working, but I keep getting duplicate values, like this:
+-------+---------------+----------------------------------------------------+---------+---------+------------+----------------------------+----------------------------+
| id(p) | hostname(u+p) | paidguests(u+p)                                    | time(p) | newp(p) | groupid(p) | reminder(e)                | confirm(e)                 |
+-------+---------------+----------------------------------------------------+---------+---------+------------+----------------------------+----------------------------+
| 33    | joe smith     | [joe smith, mary johnson, joe smith, mary johnson] | 8:00pm  | true    | 1          | [true, false, true, false] | [true, false, true, false] |
+-------+---------------+----------------------------------------------------+---------+---------+------------+----------------------------+----------------------------+
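I don't really care about the order of the confirm and reminder columns relative to the paidguests(u+p) column, as long as the correct data is there and there are no duplicates. Here is my current query: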
SELECT p.id,
Concat(u.first, ' ', u.last) AS hostname,
Array_agg(Concat(us.first, ' ', us.last)) AS paidguests,
p.time,
p.groupid,
p.newp,
Array_agg(e.confirm) AS confirm,
Array_agg(e.reminder) AS reminder
FROM plans p
CROSS JOIN Unnest(p.paidid) AS allguests
LEFT JOIN users us
ON allguests = us.id
LEFT JOIN emails e
ON p.id = e.planid
LEFT JOIN users u
ON p.hostid = u.id
WHERE p.experienceid = $1
AND p.date = $2
GROUP BY p.id,
u.first,
u.last,
p.paidid,
p.time,
p.groupid,
p.newp,
confirm,
reminder
ORDER BY Array_length(p.paidid, 1) DESC
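So basically I'm just trying to get the table right without duplicates. It worked before I added the join to the emails table, but I'm not sure why that join duplicates the values.
I hope I've explained this thoroughly. If not, let me know what I can clarify! Thanks so much :).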
Answer 0 (score: 1)
Try adding this condition to the WHERE clause:
AND us.email = e.email
The culprit here: all the users and all the emails share the same plan ID, so every user gets joined to every email, whatever the email address. Hence the duplicates.
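To see the fan-out concretely, here is a minimal Python model of the question's joins on the sample rows. It is not PostgreSQL; the `users`, `plan`, and `emails` structures and the `join_rows` helper are hypothetical stand-ins. Each paid guest from `unnest(p.paidid)` is paired with every email row for plan 33, so two guests times three emails gives six rows, while the extra `us.email = e.email` filter keeps only each guest's own email row.

```python
# Hypothetical in-memory copies of the question's sample rows (not a database).
users = {
    1: ("joe", "smith", "j@gmail.com"),
    2: ("mary", "johnson", "m@gmail.com"),
    3: ("harry", "gold", "h@gmail.com"),
}
plan = {"id": 33, "paidid": [1, 2]}
emails = [
    {"email": "j@gmail.com", "planid": 33, "confirm": None, "reminder": None},
    {"email": "m@gmail.com", "planid": 33, "confirm": True, "reminder": False},
    {"email": "h@gmail.com", "planid": 33, "confirm": True, "reminder": False},
]


def join_rows(filter_on_email: bool) -> list:
    """Model the joins before GROUP BY: one tuple per joined row."""
    rows = []
    for guest_id in plan["paidid"]:                   # CROSS JOIN unnest(p.paidid)
        first, last, guest_email = users[guest_id]    # LEFT JOIN users us (every sample id exists)
        # LEFT JOIN emails e ON p.id = e.planid: matches on the plan only
        matches = [e for e in emails if e["planid"] == plan["id"]]
        if not matches:                               # LEFT JOIN keeps the guest, padded with NULLs
            matches = [{"email": None, "confirm": None, "reminder": None}]
        for e in matches:
            # WHERE ... AND us.email = e.email (a NULL email never compares equal)
            if filter_on_email and e["email"] != guest_email:
                continue
            rows.append((f"{first} {last}", e["confirm"], e["reminder"]))
    return rows


print(len(join_rows(filter_on_email=False)))  # 6: every guest paired with every email
print(join_rows(filter_on_email=True))        # one row per guest, with that guest's own flags
```

Feeding the unfiltered rows to array_agg() is what repeats each name and flag in the aggregated output.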
Answer 1 (score: 0)
Rahul spotted the missing join condition. But the rabbit hole goes deeper. I suggest this query:
SELECT p.id
, concat_ws(' ', u.first, u.last) AS hostname -- concat_ws!
, p.time
, p.groupid
, p.newp
, paid.paidguests
, paid.confirm
, paid.reminder
FROM plans p
LEFT JOIN users u ON u.id = p.hostid
LEFT JOIN LATERAL ( -- LATERAL join
SELECT array_agg(sub.paidguest) AS paidguests
, array_agg(sub.confirm) AS confirm
, array_agg(sub.reminder) AS reminder
FROM (
SELECT concat_ws(' ', us.first, us.last) AS paidguest, e.confirm, e.reminder
FROM unnest(p.paidid) WITH ORDINALITY AS paid(id, ord)
JOIN users us ON us.id = paid.id
LEFT JOIN emails e ON e.email = us.email
AND e.planid = p.id
ORDER BY paid.ord
) sub
) paid ON true
WHERE p.experienceid = $1
AND p.date = $2
-- no GROUP BY needed
ORDER BY cardinality(p.paidid) DESC, p.id;
This assumes that (planid, email) is the PRIMARY KEY of table emails and that the corresponding FOREIGN KEY constraints are in place.
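The LATERAL subquery above is evaluated once per row of plans and can reference that row's columns (p.paidid, p.id). A rough Python model of that behaviour on the question's sample rows is sketched below; the dictionaries and the `lateral_paid` helper are hypothetical. One simplification: PostgreSQL's array_agg() returns NULL when it aggregates no rows, whereas this model returns empty lists.

```python
# Hypothetical in-memory copies of the question's sample rows.
users = {
    1: ("joe", "smith", "j@gmail.com"),
    2: ("mary", "johnson", "m@gmail.com"),
    3: ("harry", "gold", "h@gmail.com"),
}
plans = [{"id": 33, "hostid": 1, "paidid": [1, 2]}]
# Keyed by (email, planid), mirroring the assumed PRIMARY KEY of emails.
emails = {
    ("j@gmail.com", 33): (None, None),   # (confirm, reminder)
    ("m@gmail.com", 33): (True, False),
    ("h@gmail.com", 33): (True, False),
}


def lateral_paid(plan: dict) -> tuple:
    """One call per outer plan row, like LEFT JOIN LATERAL (...) paid ON true."""
    guests, confirm, reminder = [], [], []
    for uid in plan["paidid"]:             # array order, as ORDER BY paid.ord would give
        if uid not in users:               # inner JOIN users drops unknown ids
            continue
        first, last, email = users[uid]
        # LEFT JOIN emails ON e.email = us.email AND e.planid = p.id
        c, r = emails.get((email, plan["id"]), (None, None))
        guests.append(f"{first} {last}")
        confirm.append(c)
        reminder.append(r)
    return guests, confirm, reminder       # exactly one result per plan


result = []
for p in plans:
    host = users.get(p["hostid"])          # LEFT JOIN users u ON u.id = p.hostid
    hostname = f"{host[0]} {host[1]}" if host else ""
    result.append((p["id"], hostname, *lateral_paid(p)))
print(result)
```

Because each plan yields exactly one tuple, nothing has to be grouped afterwards, which mirrors the `-- no GROUP BY needed` comment in the query.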
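Aggregate first, then join: that way you don't have to GROUP BY all the columns that don't need aggregating. When retrieving all or most rows, other query techniques are typically faster; for small selections like the one in your example, I suggest a LATERAL join.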
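The "aggregate first, then join" idea can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite has neither arrays nor LATERAL, so a hypothetical junction table plan_paid_guests stands in for plans.paidid, a plain derived table replaces the LATERAL subquery, and plan 34 is an invented plan with no paid guests. The derived table returns one row per plan, so the outer query needs no GROUP BY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- first_name/last_name differ from the question's first/last in this sketch.
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE plans (id INTEGER PRIMARY KEY, time TEXT, groupid INTEGER, newp INTEGER);
-- Hypothetical stand-in for plans.paidid, since SQLite has no array type.
CREATE TABLE plan_paid_guests (plan_id INTEGER, user_id INTEGER);
INSERT INTO users VALUES (1, 'joe', 'smith'), (2, 'mary', 'johnson'), (3, 'harry', 'gold');
INSERT INTO plans VALUES (33, '8:00pm', 1, 1), (34, '9:00pm', 1, 0);
INSERT INTO plan_paid_guests VALUES (33, 1), (33, 2);
""")

rows = con.execute("""
SELECT p.id, p.time, p.groupid, p.newp,
       coalesce(agg.paid_count, 0) AS paid_count,
       agg.paidguests
FROM plans p
LEFT JOIN (
    -- Aggregate first: this derived table yields exactly one row per plan_id ...
    SELECT g.plan_id,
           count(*) AS paid_count,
           group_concat(u.first_name || ' ' || u.last_name, ', ') AS paidguests
    FROM plan_paid_guests g
    JOIN users u ON u.id = g.user_id
    GROUP BY g.plan_id
) agg ON agg.plan_id = p.id      -- ... then join, so the outer query needs no GROUP BY
ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Plan 34 still appears, with a count of 0 and NULL guests, thanks to the LEFT JOIN. SQLite documents the order of group_concat() elements as arbitrary, which is one more reason to carry an explicit ordinal when order matters.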
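In this particular case, JOIN LATERAL is equivalent to LEFT JOIN LATERAL, because a subquery with an aggregate function and no GROUP BY always returns exactly one row.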
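The one-row guarantee is easy to check. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, with a trimmed, hypothetical copy of the emails table; the rule that an aggregate query without GROUP BY returns exactly one row, even over zero input rows, is standard SQL. In PostgreSQL, array_agg() over no rows yields NULL, the counterpart of group_concat() returning NULL here.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the emails table is a trimmed, hypothetical copy.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emails (email TEXT, planid INTEGER, confirm INTEGER)")
con.execute("INSERT INTO emails VALUES ('m@gmail.com', 33, 1)")

# No email rows belong to plan 99, so a plain SELECT returns nothing ...
plain = con.execute("SELECT confirm FROM emails WHERE planid = 99").fetchall()

# ... but an aggregate without GROUP BY still returns exactly one row.
agg = con.execute(
    "SELECT count(*), group_concat(confirm) FROM emails WHERE planid = 99"
).fetchall()

# With GROUP BY, zero input rows means zero groups, hence zero rows.
grouped = con.execute(
    "SELECT planid, count(*) FROM emails WHERE planid = 99 GROUP BY planid"
).fetchall()

print(plain)    # []
print(agg)      # [(0, None)]
print(grouped)  # []
```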
The alias allguests in the original Unnest(p.paidid) AS allguests is confusing, since these appear to be the IDs of the paid guests, not of all guests.
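Use concat_ws() if first or last can be NULL: unlike concat(), it places the separator only between non-NULL values, so a missing first or last name doesn't leave a stray space.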
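A small Python model of the two PostgreSQL functions shows the difference. The helpers pg_concat and pg_concat_ws are hypothetical stand-ins written for illustration: like concat(), the first ignores NULL (None) arguments; like concat_ws(), the second also skips NULLs but puts the separator only between the remaining values. The model assumes a non-NULL separator.

```python
from typing import Optional


def pg_concat(*args: Optional[str]) -> str:
    """Model of PostgreSQL concat(): NULL (None) arguments are ignored."""
    return "".join(a for a in args if a is not None)


def pg_concat_ws(sep: str, *args: Optional[str]) -> str:
    """Model of PostgreSQL concat_ws() with a non-NULL separator: NULL values are
    skipped and the separator goes only between the remaining values."""
    return sep.join(a for a in args if a is not None)


# concat(u.first, ' ', u.last) vs. concat_ws(' ', u.first, u.last)
print(repr(pg_concat("joe", " ", None)))     # 'joe '  (stray trailing space)
print(repr(pg_concat_ws(" ", "joe", None)))  # 'joe'
print(repr(pg_concat(None, " ", None)))      # ' '
print(repr(pg_concat_ws(" ", None, None)))   # ''
```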
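When you unnest an array, the order of the elements is usually preserved in simple cases. But you have additional joins, so you should use WITH ORDINALITY and an explicit ORDER BY to avoid surprises. If you don't understand this, a query can appear to work, even for a long time, and then "suddenly" seem to break (elements in the wrong order).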
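The ordinality idea can be modelled in Python. In this sketch, on hypothetical data with paidid reversed to [2, 1] so that the order is visible, enumerate(..., start=1) plays the role of WITH ORDINALITY, a nested loop that walks the users table first stands in for a join that returns rows in its own order, and sorting on the ordinal corresponds to ORDER BY paid.ord.

```python
# Hypothetical data: paidid reversed so that the array order is visible.
paidid = [2, 1]
users = {1: "joe smith", 2: "mary johnson", 3: "harry gold"}

# unnest(paidid) WITH ORDINALITY AS paid(id, ord): pair each element with its 1-based position.
paid = [(uid, ord_) for ord_, uid in enumerate(paidid, start=1)]

# A join may emit rows in whatever order suits its plan; this nested loop
# walks the users table first, so the result follows users' order instead.
joined = [(name, ord_) for uid, name in users.items() for pid, ord_ in paid if pid == uid]
print([name for name, _ in joined])    # ['joe smith', 'mary johnson']: array order lost

# ORDER BY paid.ord restores the order of the original array.
restored = [name for name, _ in sorted(joined, key=lambda row: row[1])]
print(restored)                        # ['mary johnson', 'joe smith']
```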
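Your whole database design is questionable. Typically, arrays are an anti-pattern for this kind of design and should be implemented as a related table instead, for a number of reasons beyond the scope of this question.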
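One concrete reason among several: PostgreSQL cannot attach a FOREIGN KEY constraint to individual array elements, so plans.paidid can hold the ID of a user who doesn't exist. With a related table, the database enforces it. Below is a sketch using Python's sqlite3 module as a stand-in; the plan_paid_guests table and the try_insert helper are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEYs only when enabled
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE plans (id INTEGER PRIMARY KEY);
-- Hypothetical related table replacing the plans.paidid array: one row per paid guest.
CREATE TABLE plan_paid_guests (
    plan_id INTEGER NOT NULL REFERENCES plans (id),
    user_id INTEGER NOT NULL REFERENCES users (id),
    PRIMARY KEY (plan_id, user_id)
);
INSERT INTO users VALUES (1, 'j@gmail.com'), (2, 'm@gmail.com');
INSERT INTO plans VALUES (33);
INSERT INTO plan_paid_guests VALUES (33, 1), (33, 2);
""")


def try_insert(plan_id: int, user_id: int) -> str:
    """Attempt to add a paid guest; report whether the database accepted it."""
    try:
        con.execute("INSERT INTO plan_paid_guests VALUES (?, ?)", (plan_id, user_id))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert(33, 99))  # user 99 does not exist: FOREIGN KEY violation
print(try_insert(33, 1))   # user 1 is already a paid guest: PRIMARY KEY violation
```

The composite PRIMARY KEY also rules out listing the same guest twice, something an array column would silently allow.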