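I am trying to accomplish something with my query, but it isn't quite working. My application used to run on MongoDB, so it expects to receive arrays in a single field. We have now had to switch to Postgres, and I don't want to change my application code, so that v1 keeps working.
To get an array in one field in Postgres, I used the array_agg() function. That has worked fine so far. However, I now need a second array in another field, taken from a different table.
For example:
I have employees. Each employee has multiple addresses and multiple working days.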
SELECT name, age, array_agg(ad.street) FROM employees e
JOIN address ad ON e.id = ad.employeeid
GROUP BY name, age
This works fine for me and returns, for example:
| name  | age | array_agg(ad.street)     |
| peter | 25  | {1st street, 2nd street} |
Now I want to join another table for the working days, so I do this:
SELECT name, age, array_agg(ad.street), array_agg(wd.day) FROM employees e
JOIN address ad ON e.id = ad.employeeid
JOIN workingdays wd ON e.id = wd.employeeid
GROUP BY name, age
This results in:
| peter | 25 | {1st street, 1st street, 1st street, 1st street, 1st street, 2nd street, 2nd street, 2nd street, 2nd street, 2nd street} | {Monday,Tuesday,Wednesday,Thursday,Friday,Monday,Tuesday,Wednesday,Thursday,Friday} |
But I need this result:
| peter | 25 | {1st street, 2nd street} | {Monday,Tuesday,Wednesday,Thursday,Friday} |
I know it has to do with my joins, because joining both tables multiplies the rows, but I don't know how to get the result I need. Can anyone give me the right hint?
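Answer 0 (score: 9)
DISTINCT is often applied to repair queries that are rotten from the inside, and that is often slow and/or incorrect. Don't multiply rows to begin with, then you don't have to sort out unwanted duplicates at the end.
Joining to multiple n-tables ("has many") at once multiplies rows in the result set. That is like a CROSS JOIN or Cartesian product by proxy. In your example, 2 addresses times 5 working days give 10 joined rows for peter, so each aggregated array ends up with 10 elements.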
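To make the multiplication visible, here is a minimal sketch that counts the joined rows before any aggregation. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the snippet runs anywhere); the table and column names are taken from the question, and the sample rows mirror the question's output.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption for portability).
# Row multiplication is plain SQL join behavior, not specific to Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE address     (employeeid INTEGER, street TEXT);
    CREATE TABLE workingdays (employeeid INTEGER, day TEXT);
    INSERT INTO employees VALUES (1, 'peter', 25);
    INSERT INTO address VALUES (1, '1st street'), (1, '2nd street');
    INSERT INTO workingdays VALUES
        (1, 'Monday'), (1, 'Tuesday'), (1, 'Wednesday'),
        (1, 'Thursday'), (1, 'Friday');
""")

# The same two joins as in the question, without GROUP BY or aggregates.
rows = conn.execute("""
    SELECT e.name, ad.street, wd.day
    FROM employees e
    JOIN address     ad ON ad.employeeid = e.id
    JOIN workingdays wd ON wd.employeeid = e.id
""").fetchall()

row_count = len(rows)

# Count how often each street appears among the joined rows.
street_counts = {}
for _name, street, _day in rows:
    street_counts[street] = street_counts.get(street, 0) + 1

print(row_count)       # 2 addresses x 5 days
print(street_counts)   # each street repeated once per working day
```

Every street is paired with every working day, which is exactly the repetition that shows up in both arrays of the question.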
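There are various ways to avoid this mistake.
Technically, the query works as long as you join to only one table with multiple rows at a time before you aggregate:
SELECT e.id, e.name, e.age, e.streets, array_agg(wd.day) AS days
FROM (
SELECT e.id, e.name, e.age, array_agg(ad.street) AS streets
FROM employees e
JOIN address ad ON ad.employeeid = e.id
GROUP BY e.id -- id is enough if it is defined as PK
) e
JOIN workingdays wd ON wd.employeeid = e.id
GROUP BY e.id, e.name, e.age, e.streets; -- e is a subquery here, so list every output column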
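It is also best to include the primary key id and GROUP BY it, because name and age are not necessarily unique. Otherwise you might merge two employees by mistake.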
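The merging risk can be sketched with two employees who happen to share name and age. This again uses Python's sqlite3 as a stand-in for Postgres, and count(*) in place of array_agg() (SQLite has no array_agg); the second 'peter' and the street '9th avenue' are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE address   (employeeid INTEGER, street TEXT);
    -- Two distinct employees with identical name and age (hypothetical data).
    INSERT INTO employees VALUES (1, 'peter', 25), (2, 'peter', 25);
    INSERT INTO address VALUES (1, '1st street'), (2, '9th avenue');
""")

# Grouping by name and age only: both employees collapse into one row.
by_name_age = conn.execute("""
    SELECT e.name, e.age, count(*) AS streets
    FROM employees e
    JOIN address ad ON ad.employeeid = e.id
    GROUP BY e.name, e.age
""").fetchall()

# Grouping by the primary key as well: one row per employee.
by_id = conn.execute("""
    SELECT e.id, e.name, e.age, count(*) AS streets
    FROM employees e
    JOIN address ad ON ad.employeeid = e.id
    GROUP BY e.id, e.name, e.age
    ORDER BY e.id
""").fetchall()

print(by_name_age)
print(by_id)
```

With GROUP BY name, age the two people are reported as a single employee with two streets; adding id keeps them apart.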
But you can also aggregate in a subquery before you join, which is superior unless you have selective WHERE conditions on employees:
SELECT e.id, e.name, e.age, ad.streets, array_agg(wd.day) AS days
FROM employees e
JOIN (
SELECT employeeid, array_agg(street) AS streets
FROM address
GROUP BY 1 -- one row per employeeid
) ad ON ad.employeeid = e.id
JOIN workingdays wd ON e.id = wd.employeeid
GROUP BY e.id, e.name, e.age, ad.streets;
Or aggregate both:
SELECT name, age, ad.streets, wd.days
FROM employees e
JOIN (
SELECT employeeid, array_agg(street) AS streets
FROM address
GROUP BY 1 -- one row per employeeid
) ad ON ad.employeeid = e.id
JOIN (
SELECT employeeid, array_agg(day) AS days
FROM workingdays
GROUP BY 1 -- one row per employeeid
) wd ON wd.employeeid = e.id;
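The last one is typically faster if you retrieve all or most of the rows in the base tables.
Note that using JOIN instead of LEFT JOIN removes employees that have no address or no working days from the result. That may or may not be what you want. Switch to LEFT JOIN to retain all employees in the result.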
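The difference between the two join types can be checked with a small sketch. It runs against Python's sqlite3 rather than Postgres, and it uses count(*) where the real query would use array_agg(); both are substitutions made only to keep the snippet self-contained. The employee 'mary' is hypothetical and has no address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE address   (employeeid INTEGER, street TEXT);
    -- 'mary' is a hypothetical employee without any address rows.
    INSERT INTO employees VALUES (1, 'peter', 25), (2, 'mary', 30);
    INSERT INTO address VALUES (1, '1st street'), (1, '2nd street');
""")

# Same shape as the "aggregate first, then join" query; only the join type varies.
QUERY = """
    SELECT e.name, ad.street_count
    FROM employees e
    {join} (
        SELECT employeeid, count(*) AS street_count
        FROM address
        GROUP BY employeeid
    ) ad ON ad.employeeid = e.id
    ORDER BY e.id
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # employees without an address are dropped
print(left_rows)   # every employee is kept; missing aggregates are NULL (None)
```

In Postgres the same pattern would yield a NULL array for such an employee; wrap it in COALESCE if the application expects an empty array instead.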
For a small selection, I would consider correlated subqueries:
SELECT name, age
, (SELECT array_agg(street) FROM address WHERE employeeid = e.id) AS streets
, (SELECT array_agg(day) FROM workingdays WHERE employeeid = e.id) AS days
FROM employees e
WHERE e.name = 'peter'; -- very selective
Or, with Postgres 9.3 or later, you can use a LATERAL join:
SELECT e.name, e.age, a.streets, w.days
FROM employees e
LEFT JOIN LATERAL (
SELECT array_agg(street) AS streets -- no GROUP BY: exactly one row per employee
FROM address
WHERE employeeid = e.id
) a ON true
LEFT JOIN LATERAL (
SELECT array_agg(day) AS days -- no GROUP BY: exactly one row per employee
FROM workingdays
WHERE employeeid = e.id
) w ON true
WHERE e.name = 'peter'; -- very selective
Either query retains all employees in the result.
Answer 1 (score: 1)
Whenever you need values without duplicates, use DISTINCT, like this:
SELECT name, age, array_agg(DISTINCT ad.street), array_agg(DISTINCT wd.day) FROM employees e
JOIN address ad ON e.id = ad.employeeid
JOIN workingdays wd ON e.id = wd.employeeid
GROUP BY name, age