Question

I have a table called "places":

origin | destiny | distance
---------------------------
A      | X       | 5
A      | Y       | 8
B      | X       | 12
B      | Y       | 9
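
To reproduce the examples in this thread, the table could be set up roughly as follows. This is only a sketch: the column types are an assumption, since the question shows just the data.

```sql
-- Column types are assumed; the question only shows the rows.
create table places (
    origin   text    not null,
    destiny  text    not null,
    distance integer not null
);

-- The four sample rows from the table above.
insert into places (origin, destiny, distance) values
    ('A', 'X', 5),
    ('A', 'Y', 8),
    ('B', 'X', 12),
    ('B', 'Y', 9);
```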

For each origin, I want to find out which destiny is the closest. In MySQL I could do

SELECT origin, destiny, MIN(distance) FROM places GROUP BY origin

and I could expect the following result

origin | destiny | distance
---------------------------
A      | X       | 5
B      | Y       | 9

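Unfortunately, this query doesn't work in PostgreSQL. Postgres forces me to either put "destiny" in its own aggregate function or add it as another argument of the GROUP BY clause. Both "solutions" completely change my desired result.

How can I translate the above MySQL query to PostgreSQL?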

Answer 1

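MySQL is the only DBMS that allows the broken ("loose" in MySQL terms) group by handling. Every other DBMS (including Postgres) would reject your original statement.

In Postgres you can use the distinct on operator to achieve the same thing: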

select distinct on (origin) 
       origin, 
       destiny, 
       distance
from places
order by origin, distance;
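
One detail worth noting: if two destinies are equally close to the same origin, the query above may return either of them. A minimal sketch, assuming it is acceptable to break such ties alphabetically by destiny (that rule is an assumption, not part of the original answer), adds one more sort key:

```sql
-- DISTINCT ON keeps the first row of each origin group in ORDER BY order.
-- The extra sort key "destiny" is an assumed tie-breaker that makes the
-- chosen row deterministic when two distances are equal.
select distinct on (origin)
       origin,
       destiny,
       distance
from places
order by origin, distance, destiny;
```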

The ANSI solution would be something like this:

select p.origin, 
       p.destiny, 
       p.distance
from places p
  join (select p2.origin, min(p2.distance) as distance
        from places  p2
        group by origin
) t on t.origin = p.origin and t.distance = p.distance
order by origin;

Or, without the join, using window functions:

select t.origin,
       t.destiny,
       t.distance
from (
    select origin, 
           destiny, 
           distance, 
           min(distance) over (partition by origin) as min_dist
    from places
) t 
where distance = min_dist
order by origin;

Or another solution with window functions:

select distinct origin,
       first_value(destiny) over (partition by origin order by distance) as destiny, 
       min(distance) over (partition by origin) as distance
from places
order by origin;

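My guess is that the first one (Postgres specific) is probably the fastest one.

Here is an SQLFiddle for all three solutions: http://sqlfiddle.com/#!12/68308/2

Note that the MySQL result might actually be incorrect, as it will return an arbitrary (=random) value for destiny. The value returned by MySQL might not be the one that belongs to the lowest distance.

More details on the broken group by handling in MySQL can be found here: http://www.mysqlperformanceblog.com/2006/09/06/wrong-group-by-makes-your-queries-fragile/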

Answer 2

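The best (in my opinion) way to do this in PostgreSQL is to use an aggregate function that explicitly specifies which value of destiny should be selected.

The desired value could be described as "the first matching destiny, if you order the matching rows by distance".

You therefore need two things:

A "first" aggregate, which simply returns the "first" of a list of values. This is easy to define, but is not included as standard.
The ability to specify the order in which those matches come (otherwise, like the MySQL "loose group by", it is undefined which value you actually get). This was added in PostgreSQL 9.0, and the syntax is documented under "Aggregate Expressions".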
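
The "first" aggregate described above might be defined along these lines. This is a sketch under stated assumptions, not necessarily the exact definition this answer had in mind: the helper function name first_agg is chosen here for illustration, and declaring it STRICT means NULL inputs are skipped.

```sql
-- State transition function: always keep the value already stored.
-- Because it is STRICT and the aggregate has no initial condition,
-- PostgreSQL seeds the state with the first non-NULL input.
create function first_agg(anyelement, anyelement)
returns anyelement
language sql immutable strict
as 'select $1';

-- The aggregate itself; works for any element type.
create aggregate first(anyelement) (
    sfunc = first_agg,
    stype = anyelement
);
```

Combined with an ORDER BY inside the aggregate call, first() then returns the value from the row that sorts first within each group.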

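Once the first() aggregate is defined (which needs to be done only once per database, when setting up your initial tables), you can then write: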

Select
       origin, 
       first(destiny Order by distance Asc) as closest_destiny, 
       min(distance) as closest_destiny_distance
       -- Or, equivalently: first(distance Order by distance Asc) as closest_destiny_distance
from places
group by origin
order by origin;

Here is a SQLFiddle demo showing the whole thing in operation.

Answer 3

Just to add another possible solution to a_horse_with_no_name's answer - using the window function row_number:

with cte as (
    select
        row_number() over(partition by origin order by distance) as row_num,
        *
    from places
)
select
    origin, 
    destiny, 
    distance    
from cte
where row_num = 1

It also works in SQL Server and other RDBMSs that support row_number. In PostgreSQL, however, I prefer the distinct on syntax.

sql fiddle demo

PostgreSQL - Getting the related column of an aggregated column