Select rows ordered by some column and distinct on another

Date: 2012-03-20 22:38:22

Tags: postgresql greatest-n-per-group distinct-on

Related: PostgreSQL DISTINCT ON with different ORDER BY

I have a table purchases (product_id, purchased_at, address_id).

Sample data:

| id | product_id |   purchased_at    | address_id |
| 1  |     2      | 20 Mar 2012 21:01 |     1      |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |
| 3  |     2      | 20 Mar 2012 21:39 |     2      |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |

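The result I expect is the most recent purchase (the full row) for each address_id, and the result must be sorted by the purchased_at field in descending order: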

| id | product_id |   purchased_at    | address_id |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |

Using this query:

SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM "purchases"
WHERE "purchases"."product_id" = 2
ORDER BY purchases.address_id ASC, purchases.purchased_at DESC

I get:

| id | product_id |   purchased_at    | address_id |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |

So the rows are the right ones, but the order is wrong. Is there any way to fix this?

3 answers:

Answer 0 (score: 16)

Quite a clear question :)

SELECT t1.* FROM purchases t1
LEFT JOIN purchases t2
ON t1.address_id = t2.address_id AND t1.purchased_at < t2.purchased_at
WHERE t2.purchased_at IS NULL
ORDER BY t1.purchased_at DESC
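To check the anti-join on the question's sample data, here is a minimal Python sketch that runs the query above against an in-memory SQLite database. SQLite stands in for PostgreSQL only for portability (an assumption; this particular query uses plain standard SQL), and timestamps are stored as ISO-8601 text so that string comparison follows chronological order. The function name latest_per_address is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, product_id, purchased_at, address_id).
# Timestamps are ISO-8601 text so string comparison matches time order.
ROWS = [
    (1, 2, "2012-03-20 21:01", 1),
    (2, 2, "2012-03-20 21:33", 1),
    (3, 2, "2012-03-20 21:39", 2),
    (4, 2, "2012-03-20 21:48", 2),
]

# Anti-join: a row survives only if no row at the same address is newer.
ANTI_JOIN = """
SELECT t1.* FROM purchases t1
LEFT JOIN purchases t2
  ON t1.address_id = t2.address_id AND t1.purchased_at < t2.purchased_at
WHERE t2.purchased_at IS NULL
ORDER BY t1.purchased_at DESC
"""


def latest_per_address(rows):
    """Load rows into an in-memory SQLite table and run the anti-join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (id INTEGER PRIMARY KEY, product_id INTEGER,"
            " purchased_at TEXT, address_id INTEGER)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
        return conn.execute(ANTI_JOIN).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_address(ROWS):
        print(row)
```

On this data the rows with id 4 and then id 2 come back, matching the expected result in the question.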

And a probably faster approach:

SELECT t1.* FROM purchases t1
JOIN (
    SELECT address_id, max(purchased_at) max_purchased_at
    FROM purchases
    GROUP BY address_id
) t2
ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC
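The same kind of check works for the GROUP BY variant. This sketch again assumes SQLite as a stand-in for PostgreSQL, ISO-8601 text timestamps, and a hypothetical helper name; it joins each row to the per-address maximum purchased_at.

```python
import sqlite3

# Sample rows from the question: (id, product_id, purchased_at, address_id).
ROWS = [
    (1, 2, "2012-03-20 21:01", 1),
    (2, 2, "2012-03-20 21:33", 1),
    (3, 2, "2012-03-20 21:39", 2),
    (4, 2, "2012-03-20 21:48", 2),
]

# Join every row to the latest purchased_at of its address_id.
GROUP_MAX_JOIN = """
SELECT t1.* FROM purchases t1
JOIN (
    SELECT address_id, max(purchased_at) AS max_purchased_at
    FROM purchases
    GROUP BY address_id
) t2
  ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
ORDER BY t1.purchased_at DESC
"""


def latest_via_group_max(rows):
    """Load rows into an in-memory SQLite table and run the GROUP BY join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (id INTEGER PRIMARY KEY, product_id INTEGER,"
            " purchased_at TEXT, address_id INTEGER)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
        return conn.execute(GROUP_MAX_JOIN).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(latest_via_group_max(ROWS))
```

The aggregate subquery scans the table once per address_id group instead of comparing every pair of rows, which is the usual reason this form is expected to be faster.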

Answer 1 (score: 8)

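DISTINCT ON uses your ORDER BY to pick which row to emit for each distinct address_id. If you want the resulting records in a different order, make the DISTINCT ON query a subselect and order its results: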

SELECT * FROM
(
  SELECT DISTINCT ON (address_id) purchases.*
  FROM "purchases"
  WHERE "purchases"."product_id" = 2
  ORDER BY purchases.address_id ASC, purchases.purchased_at DESC
) distinct_addrs
ORDER BY distinct_addrs.purchased_at DESC
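SQLite has no DISTINCT ON, so this sketch emulates the two steps in plain Python instead of running the SQL. The inner function keeps the first row per address_id under ORDER BY address_id ASC, purchased_at DESC (the order the question's query produces), and the outer function re-sorts those survivors by purchased_at DESC. Both function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (id, product_id, purchased_at, address_id).
ROWS = [
    (1, 2, "2012-03-20 21:01", 1),
    (2, 2, "2012-03-20 21:33", 1),
    (3, 2, "2012-03-20 21:39", 2),
    (4, 2, "2012-03-20 21:48", 2),
]


def distinct_on_address(rows, product_id=2):
    """Inner query: DISTINCT ON (address_id) ... WHERE product_id = ?
    ORDER BY address_id ASC, purchased_at DESC."""
    matching = [r for r in rows if r[1] == product_id]
    # Python's sort is stable: sort by the secondary key (purchased_at DESC)
    # first, then by the primary key (address_id ASC).
    ordered = sorted(matching, key=itemgetter(2), reverse=True)
    ordered = sorted(ordered, key=itemgetter(3))
    # DISTINCT ON keeps the first row of each address_id group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(3))]


def latest_per_address_by_time(rows, product_id=2):
    """Outer query: re-sort the surviving rows by purchased_at DESC."""
    return sorted(distinct_on_address(rows, product_id), key=itemgetter(2), reverse=True)


if __name__ == "__main__":
    print(distinct_on_address(ROWS))
    print(latest_per_address_by_time(ROWS))
```

The inner step alone returns ids 2 and 4, ordered by address_id exactly as in the question; only the outer re-sort yields ids 4 and 2.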

Answer 2 (score: 0)

This query is trickier to write correctly than it looks.

The currently accepted, join-based answer does not correctly handle the case where two candidate rows have the same purchased_at value: it returns both rows.
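One way to see the problem is to feed the LEFT JOIN anti-join a tie. The rows below are hypothetical (the question's data has no ties): ids 1 and 2 share the same purchased_at at address 1. SQLite again stands in for PostgreSQL, and the helper name is made up for illustration.

```python
import sqlite3

# Hypothetical rows, not from the question: ids 1 and 2 tie on purchased_at.
TIE_ROWS = [
    (1, 2, "2012-03-20 21:33", 1),
    (2, 2, "2012-03-20 21:33", 1),
    (3, 2, "2012-03-20 21:48", 2),
]

# The LEFT JOIN anti-join from the join-based answer, unchanged.
ANTI_JOIN = """
SELECT t1.* FROM purchases t1
LEFT JOIN purchases t2
  ON t1.address_id = t2.address_id AND t1.purchased_at < t2.purchased_at
WHERE t2.purchased_at IS NULL
ORDER BY t1.purchased_at DESC
"""


def run(rows, query):
    """Load rows into an in-memory SQLite table and run the given query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (id INTEGER PRIMARY KEY, product_id INTEGER,"
            " purchased_at TEXT, address_id INTEGER)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Address 1 appears twice: neither tied row is strictly newer than the other.
    print(run(TIE_ROWS, ANTI_JOIN))
```

The GROUP BY variant has the same weakness, since both tied rows equal the per-address maximum.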

You can get the correct behavior this way:

SELECT * FROM purchases AS given
WHERE product_id = 2
AND NOT EXISTS (
    SELECT NULL FROM purchases AS other
    WHERE given.address_id = other.address_id
    AND given.product_id = other.product_id
    AND (given.purchased_at < other.purchased_at
         OR (given.purchased_at = other.purchased_at AND given.id < other.id))
)
ORDER BY purchased_at DESC

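Note how it falls back to comparing id values to disambiguate the case where the purchased_at values match. This ensures that the condition can hold for only a single row among the rows with the same address_id value.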
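The sketch below runs the NOT EXISTS approach in SQLite (standing in for PostgreSQL) on hypothetical rows that combine a timestamp tie at address 1, a larger id with an older timestamp at address 3, and a newer purchase of a different product at address 2. It spells out the tie-break as purchased_at equality plus id comparison and limits the inner subquery to the same product_id; an unguarded `OR given.id < other.id` would also discard the latest row whenever a row with a larger id at the same address has an earlier timestamp.

```python
import sqlite3

# Hypothetical rows, not from the question: (id, product_id, purchased_at, address_id).
ROWS = [
    (1, 2, "2012-03-20 21:33", 1),  # ties with id 2
    (2, 2, "2012-03-20 21:33", 1),
    (3, 2, "2012-03-20 21:48", 2),
    (4, 2, "2012-03-20 22:00", 3),  # latest at address 3 despite the smaller id
    (5, 2, "2012-03-20 21:00", 3),
    (6, 7, "2012-03-20 23:00", 2),  # newer, but a different product
]

NOT_EXISTS = """
SELECT * FROM purchases AS given
WHERE product_id = 2
AND NOT EXISTS (
    SELECT NULL FROM purchases AS other
    WHERE given.address_id = other.address_id
    AND given.product_id = other.product_id
    AND (given.purchased_at < other.purchased_at
         OR (given.purchased_at = other.purchased_at AND given.id < other.id))
)
ORDER BY purchased_at DESC
"""


def latest_not_exists(rows):
    """Load rows into an in-memory SQLite table and run the NOT EXISTS query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (id INTEGER PRIMARY KEY, product_id INTEGER,"
            " purchased_at TEXT, address_id INTEGER)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
        return conn.execute(NOT_EXISTS).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_not_exists(ROWS):
        print(row)
```

Exactly one row per address_id survives (ids 4, 3 and 2, newest first); of the tied pair, the larger id wins.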

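The original query using DISTINCT ON handles this case automatically: it always returns exactly one row per address_id. Which of the tied rows it returns is unpredictable, though, unless you add a tie-breaker such as id DESC to the ORDER BY.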
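To illustrate, this Python sketch emulates DISTINCT ON (address_id) with id DESC appended to the ORDER BY as a tie-breaker (that extra sort key is a suggestion, not part of the original query) on hypothetical tied rows. The function name is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, not from the question: ids 1 and 2 tie on purchased_at.
ROWS = [
    (1, 2, "2012-03-20 21:33", 1),
    (2, 2, "2012-03-20 21:33", 1),
    (3, 2, "2012-03-20 21:48", 2),
]


def distinct_on_with_tiebreak(rows):
    """Emulate DISTINCT ON (address_id)
    ORDER BY address_id ASC, purchased_at DESC, id DESC."""
    # Sort descending by (purchased_at, id), then stably by address_id ascending.
    ordered = sorted(rows, key=itemgetter(2, 0), reverse=True)
    ordered = sorted(ordered, key=itemgetter(3))
    # One row per address_id: the first one in the sorted order.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(3))]


if __name__ == "__main__":
    print(distinct_on_with_tiebreak(ROWS))
```

Without the id key, PostgreSQL may return either tied row; with it, id 2 is always chosen for address 1.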

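Also note that you have to encode the fact that you want "the latest per address_id" twice, once in the given.purchased_at < other.purchased_at condition and once in the ORDER BY purchased_at DESC clause, and you have to make sure the two match. I had to spend a few extra minutes convincing myself that this query really is correct.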

It is much easier to write this query correctly and understandably using DISTINCT ON with an outer subquery, as dbenhur suggested.