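I was looking for a way to select the first item from a GROUP BY in PostgreSQL, until I found this Stack Overflow question: Select first row in each GROUP BY group?
There, I saw that the WITH command was used.
I'm trying to learn more of these "advanced" SQL commands, such as PARTITION, WITH, ROW_NUMBER, etc. Until two or three months ago, I knew only the basic commands (SELECT, INNER JOIN, LEFT JOIN, ORDER BY, GROUP BY, etc.).
I have a small problem (solved, but I don't know whether this is the better way*).
*better way = I care more about clean SQL code than about performance. This is only for a report that runs once a day and has no more than 5000 records.
I have two tables in PostgreSQL: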
+----------------------------------------------+
| TABLE NAME: point                            |
+--------+---------------+----------+----------+
| km     | globalid      | lat      | long     |
+--------+---------------+----------+----------+
| 36600  | 1553E2AB-B2F8 | -1774.44 | -5423.58 |
| 364000 | 25EB2465-1B8A | -1773.42 | -5422.03 |
| 362000 | 5FFDE611-88DF | -1771.80 | -5420.37 |
+--------+---------------+----------+----------+
+---------------------------------------------------------+
| TABLE NAME: photo                                       |
+--------------+---------------+------------+-------------+
| attachmentid | rel_globalid  | date       | filename    |
+--------------+---------------+------------+-------------+
| 1            | 1553E2AB-B2F8 | 2015-02-24 | photo01.jpg |
| 2            | 1553E2AB-B2F8 | 2015-02-24 | photo02.jpg |
| 405          | 25EB2465-1B8A | 2015-02-12 | photo03.jpg |
| 406          | 25EB2465-1B8A | 2015-02-12 | photo04.jpg |
| 407          | 25EB2465-1B8A | 2015-02-13 | photo06.jpg |
| 3            | 5FFDE611-88DF | 2015-02-12 | photo07.jpg |
+--------------+---------------+------------+-------------+
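For anyone who wants to try the queries locally, the two tables above could be recreated roughly like this. The column types and constraints are assumptions (they are not stated anywhere here); only the column names and sample values come from the tables.

```sql
-- Column types and keys are guesses; only names and values come from the tables above.
CREATE TABLE point (
    km       integer,
    globalid text PRIMARY KEY,
    lat      numeric,
    "long"   numeric
);

CREATE TABLE photo (
    attachmentid integer PRIMARY KEY,
    rel_globalid text REFERENCES point (globalid),
    date         date,
    filename     text
);

INSERT INTO point (km, globalid, lat, "long") VALUES
    (36600,  '1553E2AB-B2F8', -1774.44, -5423.58),
    (364000, '25EB2465-1B8A', -1773.42, -5422.03),
    (362000, '5FFDE611-88DF', -1771.80, -5420.37);

INSERT INTO photo (attachmentid, rel_globalid, date, filename) VALUES
    (1,   '1553E2AB-B2F8', '2015-02-24', 'photo01.jpg'),
    (2,   '1553E2AB-B2F8', '2015-02-24', 'photo02.jpg'),
    (405, '25EB2465-1B8A', '2015-02-12', 'photo03.jpg'),
    (406, '25EB2465-1B8A', '2015-02-12', 'photo04.jpg'),
    (407, '25EB2465-1B8A', '2015-02-13', 'photo06.jpg'),
    (3,   '5FFDE611-88DF', '2015-02-12', 'photo07.jpg');
```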
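So, to the problem:
Each point has one or more photos, but I only need the point data plus the first and last photo. If a point has only one photo, I need only the first photo. If a point has three photos, I need only the first and the third photo.
So, here is how I solved it:
First, I need the first photo of each point, so I partition by rel_globalid and number each photo within its group: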
WITH photos_numbered AS (
    SELECT
        rel_globalid,
        date,
        filename,
        ROW_NUMBER()
            OVER (
                PARTITION BY rel_globalid
                ORDER BY date
            ) AS photo_num
    FROM
        photo
)
With this code, I can also get the 2nd photo, the 3rd, and so on.
OK, now I want to get the first photo (still using the WITH above):
SELECT *
FROM
    photos_numbered
WHERE
    photo_num = 1
To get the last photo, I used the following SQL:
SELECT
    p1.*
FROM
    photos_numbered p1
JOIN (
    SELECT
        rel_globalid,
        max(photo_num) photo_num
    FROM
        photos_numbered
    GROUP BY
        rel_globalid
) p2
ON
    p1.rel_globalid = p2.rel_globalid AND
    p1.photo_num = p2.photo_num
WHERE
    p1.photo_num > 1
The WHERE p1.photo_num > 1 is there because if a point has only one photo, that photo should appear as the first photo, and the last photo should be NULL.
OK, now I have to "convert" the first SELECT and the last SELECT into a WITH, and do a simple SELECT with an INNER JOIN for the first photo and a LEFT JOIN for the last photo:
WITH photos_numbered AS (
    SELECT
        rel_globalid,
        date,
        filename,
        ROW_NUMBER()
            OVER (
                PARTITION BY rel_globalid
                ORDER BY date
            ) AS photo_num
    FROM
        photo
), first_photo AS (
    SELECT *
    FROM
        photos_numbered
    WHERE
        photo_num = 1
), last_photo AS (
    SELECT p1.*
    FROM
        photos_numbered p1
    JOIN (
        SELECT
            rel_globalid,
            max(photo_num) photo_num
        FROM
            photos_numbered
        GROUP BY
            rel_globalid
    ) p2
    ON p1.rel_globalid = p2.rel_globalid AND
       p1.photo_num = p2.photo_num
    WHERE
        p1.photo_num > 1
)
SELECT DISTINCT
    point.km,
    point.globalid,
    point.lat,
    point."long",
    first_photo.date AS fp_date,
    first_photo.filename AS fp_filename,
    last_photo.date AS lp_date,
    last_photo.filename AS lp_filename
FROM
    point
INNER JOIN
    first_photo
ON
    first_photo.rel_globalid = point.globalid
LEFT JOIN
    last_photo
ON
    last_photo.rel_globalid = point.globalid
ORDER BY
    km
I think this SQL is huge for such a simple thing!
Does it work? Yes, but I would like some advice, some documentation I can read to understand this better, and some commands I can use to write "better" SQL (as I said, two or three months ago I didn't even know the PARTITION and WITH commands).
I tried to add a link to SQLFiddle here, but SQLFiddle never worked for me (it always returns an 'oops' message).
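Answer 0 (score: 2)
If you are looking for clean SQL, try LEFT JOIN LATERAL together with the first_value and last_value window functions, instead of common table expressions (the WITH clause):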
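select *
from point po
left join lateral
(
    select first_value( date )     over( order by ph.date ) as first_photo_date,
           first_value( filename ) over( order by ph.date ) as first_photo_filename,
           -- explicit frame so that last_value() looks at the whole set, not only up to the current row
           last_value( date )      over( order by ph.date rows between unbounded preceding and unbounded following ) as last_photo_date,
           last_value( filename )  over( order by ph.date rows between unbounded preceding and unbounded following ) as last_photo_filename
    from photo ph
    where po.globalid = ph.rel_globalid
    limit 1
) q on true
;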
When there is only one record, an additional count(*) over() with a CASE expression can be used to "clear" the last photo's values:
select *
from point po
left join lateral
(
    select first_value( date )     over( order by ph.date ) as first_photo_date,
           first_value( filename ) over( order by ph.date ) as first_photo_filename,
           -- explicit frame so that last_value() looks at the whole set, not only up to the current row
           case when count(*) over () > 1
                then last_value( date ) over( order by ph.date rows between unbounded preceding and unbounded following )
           end as last_photo_date,
           case when count(*) over () > 1
                then last_value( filename ) over( order by ph.date rows between unbounded preceding and unbounded following )
           end as last_photo_filename
    from photo ph
    where po.globalid = ph.rel_globalid
    limit 1
) q on true
;
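A detail that may be worth checking when last_value is combined with ORDER BY: by default the window frame ends at the current row (including its peers), so last_value only sees the whole partition if the frame is extended explicitly. A small sketch against the photo table from the question; the two column aliases are made up for illustration:

```sql
SELECT
    attachmentid,
    date,
    -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    last_value(date) OVER (
        PARTITION BY rel_globalid
        ORDER BY date
    ) AS last_default_frame,
    -- explicit frame covering the whole partition
    last_value(date) OVER (
        PARTITION BY rel_globalid
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_full_frame
FROM
    photo
WHERE
    rel_globalid = '25EB2465-1B8A';
```

With the sample data, the two rows dated 2015-02-12 get 2015-02-12 in last_default_frame, while last_full_frame is 2015-02-13 on every row.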
Answer 1 (score: 0)
Using krokodilko's answer, I created a new SQL query without LEFT JOIN LATERAL, because I'm using PostgreSQL 9.2 (which has no LEFT JOIN LATERAL).
SELECT DISTINCT
    po.km,
    po.globalid,
    po.lat,
    po."long",
    ph.fp_date,
    ph.fp_filename,
    ph.lp_date,
    ph.lp_filename
FROM
    point po
INNER JOIN
    (
        SELECT DISTINCT
            rel_globalid,
            -- ORDER BY date makes "first" and "last" well defined;
            -- the explicit frame lets last_value() see the whole partition
            first_value(date) OVER (PARTITION BY ph.rel_globalid ORDER BY date) AS fp_date,
            first_value(filename) OVER (PARTITION BY ph.rel_globalid ORDER BY date) AS fp_filename,
            CASE WHEN count(*) OVER (PARTITION BY ph.rel_globalid) > 1 THEN
                last_value(date) OVER (PARTITION BY ph.rel_globalid ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
            END AS lp_date,
            CASE WHEN count(*) OVER (PARTITION BY ph.rel_globalid) > 1 THEN
                last_value(filename) OVER (PARTITION BY ph.rel_globalid ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
            END AS lp_filename
        FROM
            photo ph
        ORDER BY
            rel_globalid
    ) ph
    ON ph.rel_globalid = po.globalid
The only thing I don't like is the OVER (PARTITION ...) repeated on every field in the INNER JOIN.
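One way to avoid repeating the window definition might be a named window declared with the WINDOW clause, which PostgreSQL 9.2 supports. A minimal sketch, assuming the same point and photo tables; the window name w is made up here, and the ORDER BY and explicit frame are additions so that first_value and last_value see the whole partition in date order:

```sql
SELECT
    po.km,
    po.globalid,
    po.lat,
    po."long",
    ph.fp_date,
    ph.fp_filename,
    ph.lp_date,
    ph.lp_filename
FROM
    point po
INNER JOIN
    (
        SELECT DISTINCT
            rel_globalid,
            first_value(date)     OVER w AS fp_date,
            first_value(filename) OVER w AS fp_filename,
            -- with the full-partition frame, count(*) OVER w is the photo count per point
            CASE WHEN count(*) OVER w > 1 THEN last_value(date)     OVER w END AS lp_date,
            CASE WHEN count(*) OVER w > 1 THEN last_value(filename) OVER w END AS lp_filename
        FROM
            photo
        -- "w" is a hypothetical name; the frame covers the whole partition
        WINDOW w AS (
            PARTITION BY rel_globalid
            ORDER BY date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        )
    ) ph
    ON ph.rel_globalid = po.globalid
ORDER BY
    po.km;
```

Because every row of a partition then yields the same values, the inner DISTINCT collapses each point's photos to a single row, so the outer DISTINCT is no longer needed.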