Multiple DISTINCT ON clauses in PostgreSQL

Date: 2017-02-28 09:58:41

Tags: sql postgresql duplicates distinct-on

Is it possible to select rows that are DISTINCT ON several separate, independent sets of columns?

Suppose I want all rows that satisfy both of the following:

  • (name, birth) is distinct
  • (name, height) is distinct

So in the following table, the rows marked with ❌ would not be distinct (the failing clause is shown in parentheses):

name      birth    height
--------------------------
William    1976      1.82
James      1981      1.68
Mike       1976      1.68
Tom        1967      1.79
William    1976      1.74   ❌ (name, birth)
William    1981      1.82   ❌ (name, height)
Tom        1978      1.92
Mike       1963      1.68   ❌ (name, height)
Tom        1971      1.86
James      1981      1.77   ❌ (name, birth)
Tom        1971      1.89   ❌ (name, birth)
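
The ❌ marks can be read as a first-come filter: scanning the table top to bottom, a row is dropped as soon as its (name, birth) or its (name, height) pair is already used by an earlier row that was kept. The question does not state this rule, so treat it as an assumption. A minimal Python sketch of that reading (first_come_filter is a hypothetical helper, not a PostgreSQL feature) flags exactly the five marked rows:

```python
# Rows of the question's table, in the order shown: (name, birth, height).
ROWS = [
    ("William", 1976, 1.82),
    ("James", 1981, 1.68),
    ("Mike", 1976, 1.68),
    ("Tom", 1967, 1.79),
    ("William", 1976, 1.74),
    ("William", 1981, 1.82),
    ("Tom", 1978, 1.92),
    ("Mike", 1963, 1.68),
    ("Tom", 1971, 1.86),
    ("James", 1981, 1.77),
    ("Tom", 1971, 1.89),
]


def first_come_filter(rows):
    """Hypothetical rule: keep a row only if neither its (name, birth) nor
    its (name, height) pair was already used by a previously kept row."""
    seen_birth = set()   # (name, birth) pairs of kept rows
    seen_height = set()  # (name, height) pairs of kept rows
    kept, rejected = [], []
    for pos, (name, birth, height) in enumerate(rows, start=1):
        if (name, birth) in seen_birth:
            rejected.append((pos, "(name, birth)"))
        elif (name, height) in seen_height:
            rejected.append((pos, "(name, height)"))
        else:
            seen_birth.add((name, birth))
            seen_height.add((name, height))
            kept.append((name, birth, height))
    return kept, rejected


kept, rejected = first_come_filter(ROWS)
for pos, clause in rejected:
    print(f"row {pos}: not distinct on {clause}")
print(len(kept), "rows kept")
```

Scanning in a different order (for example bottom to top) would flag different rows, so the result depends on which duplicate is considered "first".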

In the example above, if the clause were simply DISTINCT ON (name, birth, height), all rows would be considered distinct.
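
A quick Python check of that claim on the question's rows: the full (name, birth, height) triples are all unique, while each pair on its own collapses some rows. This sketch only counts distinct values; it does not model which row PostgreSQL would keep from each group.

```python
# Rows of the question's table: (name, birth, height).
rows = [
    ("William", 1976, 1.82),
    ("James", 1981, 1.68),
    ("Mike", 1976, 1.68),
    ("Tom", 1967, 1.79),
    ("William", 1976, 1.74),
    ("William", 1981, 1.82),
    ("Tom", 1978, 1.92),
    ("Mike", 1963, 1.68),
    ("Tom", 1971, 1.86),
    ("James", 1981, 1.77),
    ("Tom", 1971, 1.89),
]

triples = {(n, b, h) for n, b, h in rows}    # key of DISTINCT ON (name, birth, height)
name_birth = {(n, b) for n, b, _ in rows}    # key of DISTINCT ON (name, birth)
name_height = {(n, h) for n, _, h in rows}   # key of DISTINCT ON (name, height)

print(len(rows), len(triples))  # 11 11: the triple removes nothing
print(len(name_birth))          # 8 distinct (name, birth) pairs
print(len(name_height))         # 9 distinct (name, height) pairs
```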

What I tried, none of which works:

  • SELECT DISTINCT ON (name, birth) (name, height) ...
  • SELECT DISTINCT ON (name, birth), (name, height) ...
  • SELECT DISTINCT ON ((name, birth), (name, height)) ...
  • SELECT DISTINCT ON (name, birth) AND (name, height) ...
  • SELECT DISTINCT ON (name, birth) AND ON (name, height) ...
  • SELECT DISTINCT ON (name, birth) DISTINCT ON (name, height) ...
  • SELECT DISTINCT ON (name, birth), DISTINCT ON (name, height) ...

2 answers:

Answer 0 (score: 3)

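As I commented, the solution to your problem is not well defined. The number of result rows can vary from call to call. If you are fine with an arbitrary result, @klin's solution is good enough.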

Otherwise, you need to define your conditions more precisely, for example:

  • (name, birth) distinct: pick the smallest height first, then the smallest ID as tiebreaker
  • (name, height) distinct: pick the earliest birth first, then the smallest ID as tiebreaker

Your table should have a primary key (or some other way to uniquely identify rows):

CREATE TEMP TABLE tbl (tbl_id serial PRIMARY KEY, name text, birth int, height numeric);
INSERT INTO tbl (name, birth, height)
VALUES
  ('William', 1976, 1.82)
, ('James',   1981, 1.68)
, ('Mike',    1976, 1.68)
, ('Tom',     1967, 1.79)
, ('William', 1976, 1.74)
, ('William', 1981, 1.82)
, ('Tom',     1978, 1.92)
, ('Mike',    1963, 1.68)
, ('Tom',     1971, 1.86)
, ('James',   1981, 1.77)
, ('Tom',     1971, 1.89);

Query:

SELECT DISTINCT ON (name, height) *
FROM  (
   SELECT DISTINCT ON (name, birth) *
   FROM   tbl
   ORDER  BY name, birth, height, tbl_id  -- pick smallest height, ID as tiebreaker
   ) s
ORDER  BY name, height, birth, tbl_id;    -- pick earliest birth, ID as tiebreaker
 tbl_id |  name   | birth | height
--------+---------+-------+--------
      2 | James   |  1981 |   1.68
      8 | Mike    |  1963 |   1.68
      4 | Tom     |  1967 |   1.79
      9 | Tom     |  1971 |   1.86
      7 | Tom     |  1978 |   1.92
      5 | William |  1976 |   1.74
      6 | William |  1981 |   1.82
(7 rows)    -- !!!
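
The two nested DISTINCT ON passes can be modeled in Python to check the 7-row result. The distinct_on helper is a hypothetical stand-in for PostgreSQL's behavior, assuming (as PostgreSQL requires) that the DISTINCT ON expressions are the leading ORDER BY expressions: sort by the ORDER BY list, then keep the first row of each group of equal keys. The tbl_id values assume the serial column numbered the inserted rows 1 to 11 in insert order.

```python
from itertools import groupby

ID, NAME, BIRTH, HEIGHT = range(4)  # column positions in each tuple

# (tbl_id, name, birth, height), assuming serial ids 1..11 in insert order.
TBL = [
    (1, "William", 1976, 1.82),
    (2, "James", 1981, 1.68),
    (3, "Mike", 1976, 1.68),
    (4, "Tom", 1967, 1.79),
    (5, "William", 1976, 1.74),
    (6, "William", 1981, 1.82),
    (7, "Tom", 1978, 1.92),
    (8, "Mike", 1963, 1.68),
    (9, "Tom", 1971, 1.86),
    (10, "James", 1981, 1.77),
    (11, "Tom", 1971, 1.89),
]


def distinct_on(rows, key, order):
    """Model of SELECT DISTINCT ON (key) ... ORDER BY order.

    Assumes key(r) is a prefix of order(r), so rows with equal keys are
    adjacent after sorting; the first row of each such run is kept."""
    ordered = sorted(rows, key=order)
    return [next(group) for _, group in groupby(ordered, key=key)]


# Inner query: DISTINCT ON (name, birth) ORDER BY name, birth, height, tbl_id
inner = distinct_on(
    TBL,
    key=lambda r: (r[NAME], r[BIRTH]),
    order=lambda r: (r[NAME], r[BIRTH], r[HEIGHT], r[ID]),
)

# Outer query: DISTINCT ON (name, height) ORDER BY name, height, birth, tbl_id
outer = distinct_on(
    inner,
    key=lambda r: (r[NAME], r[HEIGHT]),
    order=lambda r: (r[NAME], r[HEIGHT], r[BIRTH], r[ID]),
)

for row in outer:
    print(row)
print(len(outer), "rows")
```

The printed tuples carry the same tbl_id values, in the same order, as the psql output.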

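The root of the problem is that a DISTINCT ON query without a deterministic ORDER BY can return an arbitrary row from each set of duplicates. Applied once, you still get a deterministic number of rows (with arbitrary picks). Applied repeatedly, the number of rows becomes arbitrary as well.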
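
A Python model makes the point about row counts concrete. The sketch assumes a hypothetical distinct_on helper (sort by the ORDER BY list, keep the first row per key) and the same sample rows with tbl_id 1 to 11. It keeps the outer pass fixed and only changes the inner tiebreaker from the smallest to the largest height; both picks are deterministic, yet they yield different numbers of rows.

```python
from itertools import groupby

ID, NAME, BIRTH, HEIGHT = range(4)  # column positions in each tuple

# (tbl_id, name, birth, height), assuming serial ids 1..11 in insert order.
TBL = [
    (1, "William", 1976, 1.82),
    (2, "James", 1981, 1.68),
    (3, "Mike", 1976, 1.68),
    (4, "Tom", 1967, 1.79),
    (5, "William", 1976, 1.74),
    (6, "William", 1981, 1.82),
    (7, "Tom", 1978, 1.92),
    (8, "Mike", 1963, 1.68),
    (9, "Tom", 1971, 1.86),
    (10, "James", 1981, 1.77),
    (11, "Tom", 1971, 1.89),
]


def distinct_on(rows, key, order):
    """Model of DISTINCT ON (key) ... ORDER BY order; key must prefix order."""
    ordered = sorted(rows, key=order)
    return [next(group) for _, group in groupby(ordered, key=key)]


def two_stage(rows, inner_order):
    """DISTINCT ON (name, birth) with the given inner ordering, then
    DISTINCT ON (name, height) ORDER BY name, height, birth, tbl_id."""
    inner = distinct_on(rows, key=lambda r: (r[NAME], r[BIRTH]), order=inner_order)
    return distinct_on(
        inner,
        key=lambda r: (r[NAME], r[HEIGHT]),
        order=lambda r: (r[NAME], r[HEIGHT], r[BIRTH], r[ID]),
    )


# Inner pass keeps the smallest height per (name, birth) ...
shortest_first = two_stage(TBL, lambda r: (r[NAME], r[BIRTH], r[HEIGHT], r[ID]))
# ... or the largest height per (name, birth).
tallest_first = two_stage(TBL, lambda r: (r[NAME], r[BIRTH], -r[HEIGHT], r[ID]))

print(len(shortest_first), len(tallest_first))  # 7 6
```

Both policies satisfy "(name, birth) distinct and (name, height) distinct", which is why the question needs an explicit tiebreak rule before it has a single answer.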

Answer 1 (score: 1)

Use a derived table:

with my_table(name, birth, height) as (
values
('William',    1976,      1.82),
('James',      1981,      1.68),
('Mike',       1976,      1.68),
('Tom',        1967,      1.79),
('William',    1976,      1.74),  -- ❌ (name, birth)
('William',    1981,      1.82),  -- ❌ (name, height)
('Tom',        1978,      1.92),
('Mike',       1963,      1.68),  -- ❌ (name, height)
('Tom',        1971,      1.86),
('James',      1981,      1.77),  -- ❌ (name, birth)
('Tom',        1971,      1.89)   -- ❌ (name, birth)
)
select distinct on (name, height) *
from (
    select distinct on (name, birth) *
    from my_table
    ) s;

  name   | birth | height 
---------+-------+--------
 James   |  1981 |   1.68
 Mike    |  1963 |   1.68
 Tom     |  1967 |   1.79
 Tom     |  1971 |   1.89
 Tom     |  1978 |   1.92
 William |  1976 |   1.82
(6 rows)