Distinct values across multiple columns after GROUP BY has been applied

Time: 2015-04-17 15:47:02

Tags: sql postgresql

Basically, I have the following query (it is actually more complex, but I think this simplification is fine):

SELECT a, b, x
FROM table

output:

 a | b | x
-----------
 1 | 2 | 34
 1 | 3 | 35
 1 | 3 | 36
 1 | 4 | 37
 2 | 3 | 38
 2 | 3 | 39
 2 | 4 | 40
 3 | 4 | 41
 3 | 5 | 42

To count the occurrences of each pair of a and b, I use GROUP BY:

SELECT a, b, COUNT(x) AS count
FROM table
GROUP BY a, b
ORDER BY count

output:

 a | b | count
--------------
 1 | 2 | 1
 1 | 4 | 1
 2 | 4 | 1
 3 | 4 | 1
 3 | 5 | 1
 1 | 3 | 2
 2 | 3 | 2
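
The grouping step can be mirrored outside the database to make the row order explicit. The following is a minimal Python sketch, not part of the original query: the `rows` list copies the sample table above, and `pair_counts` is a hypothetical name. One assumption to note is the tie-break. `ORDER BY count` alone leaves the order of rows with equal counts unspecified, so the sketch sorts by `(count, a, b)`, which happens to match the output shown.

```python
from collections import Counter

# Sample rows (a, b, x) copied from the table in the question.
rows = [
    (1, 2, 34), (1, 3, 35), (1, 3, 36), (1, 4, 37),
    (2, 3, 38), (2, 3, 39), (2, 4, 40),
    (3, 4, 41), (3, 5, 42),
]

# Equivalent of GROUP BY a, b with COUNT(x); x is never NULL in this sample.
counts = Counter((a, b) for a, b, _x in rows)

# Assumed tie-break (count, a, b) so that the order is deterministic.
pair_counts = sorted(
    ((a, b, n) for (a, b), n in counts.items()),
    key=lambda r: (r[2], r[0], r[1]),
)

for a, b, n in pair_counts:
    print(a, b, n)
```

Because the filtering asked for below depends on which row comes first, a deterministic order matters more than it would for a plain report.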

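What bothers me is the multiple occurrences of a and b. I want to keep the count, but remove every row whose a or b already appeared in a previous row. It would be even better if a row were also removed when its a value appeared in a previous row as b, and vice versa.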

Preferred expected output:

 a | b | count
--------------
 1 | 2 | 1
 1 | 4 | 1    <- should not be in output since we had a=1
 2 | 4 | 1    <- should not be in output since we had b=
 3 | 4 | 1    
 3 | 5 | 1    <- should not be in output since we had a=3
 1 | 3 | 2    <- should not be in output since we had a=1 / a=3
 2 | 3 | 2    <- should not be in output since we had b=2 / a=3

So, this:

 a | b | count
--------------
 1 | 2 | 1
 3 | 4 | 1    

Alternative expected output, if the above is too complicated:

 a | b | count
--------------
 1 | 2 | 1
 1 | 4 | 1    <- should not be in output since we had a=1
 2 | 4 | 1    
 3 | 4 | 1    <- should not be in output since we had b=4
 3 | 5 | 1    
 1 | 3 | 2    <- should not be in output since we had a=1
 2 | 3 | 2    <- should not be in output since we had a=2

So, this:

 a | b | count
--------------
 1 | 2 | 1
 2 | 4 | 1    
 3 | 5 | 1    
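
Both rules can be read as one greedy pass over the ordered rows. The sketch below is a Python illustration under two assumptions inferred from the expected outputs rather than stated in the question: rows are visited in `(count, a, b)` order, and a row is compared only against rows that were kept (for instance, 3 | 4 survives in the preferred output even though b=4 appeared in the dropped row 1 | 4). The names `filter_pairs` and `cross_column` are hypothetical.

```python
def filter_pairs(pairs, cross_column=False):
    """Greedily keep rows whose values were not used by an earlier kept row.

    pairs: iterable of (a, b, count), already in the desired order.
    cross_column=False: a is checked against kept a values, b against kept b values.
    cross_column=True: a and b are checked against kept values of either column.
    """
    seen_a, seen_b = set(), set()
    kept = []
    for a, b, count in pairs:
        if cross_column:
            clash = bool({a, b} & (seen_a | seen_b))
        else:
            clash = a in seen_a or b in seen_b
        if clash:
            continue  # a value was already taken by a kept row
        kept.append((a, b, count))
        seen_a.add(a)
        seen_b.add(b)
    return kept


# Grouped rows from the question, ordered by count, then a, then b.
pair_counts = [
    (1, 2, 1), (1, 4, 1), (2, 4, 1), (3, 4, 1), (3, 5, 1),
    (1, 3, 2), (2, 3, 2),
]

preferred = filter_pairs(pair_counts, cross_column=True)
alternative = filter_pairs(pair_counts)
print(preferred)    # [(1, 2, 1), (3, 4, 1)]
print(alternative)  # [(1, 2, 1), (2, 4, 1), (3, 5, 1)]
```

Each decision depends on which earlier rows survived, so the rule is sequential by nature. That is why a single GROUP BY or a single window function does not express it directly.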

2 answers:

Answer 0: (score: 2)

This is a messy problem, but here is something to consider:

SELECT a, b, count
FROM (
    SELECT a, b, count,
          rank() over (partition by b order by count, a) as b_rank
    FROM (
        SELECT a, b, count,
          rank() over (partition by a order by count, b) as a_rank
        FROM (
            SELECT a, b, COUNT(*) AS count
            FROM t
            GROUP BY a, b
            ORDER BY count
          ) pc
      ) pc2
    WHERE a_rank < 3
  ) pc3
WHERE b_rank = 1

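Each a value will appear at most twice in the result, but the b values will be unique. Some b values that appear only in low-count pairs may not be reflected in the result. There is a trade-off between how often an a value may repeat and how many b values may be missed entirely: allowing more repeats of a (by changing the condition to WHERE a_rank < 4) reduces the number of b values that may be missed.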
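
To see what the two ranking steps do to the sample data, they can be mimicked in Python. This is only a sketch of the query's logic, not a replacement for it: `rank_filter` and `max_a_rank` are hypothetical names, and `max_a_rank=2` corresponds to `WHERE a_rank < 3`. Since each (a, b) pair is unique after grouping, `rank()` produces no ties here, so taking the first rows of each sorted group is equivalent.

```python
from collections import defaultdict


def rank_filter(pairs, max_a_rank=2):
    """Keep the first max_a_rank rows per a, then the first row per b."""
    # Step 1: within each a, order by (count, b) and keep a_rank <= max_a_rank.
    by_a = defaultdict(list)
    for a, b, n in pairs:
        by_a[a].append((a, b, n))
    step1 = []
    for group in by_a.values():
        group.sort(key=lambda r: (r[2], r[1]))
        step1.extend(group[:max_a_rank])

    # Step 2: within each b, order by (count, a) and keep only b_rank = 1.
    by_b = defaultdict(list)
    for row in step1:
        by_b[row[1]].append(row)
    result = [min(group, key=lambda r: (r[2], r[0])) for group in by_b.values()]

    # The SQL query has no final ORDER BY; sorting here is only for display.
    return sorted(result, key=lambda r: (r[2], r[0], r[1]))


# Grouped rows from the question.
pair_counts = [
    (1, 2, 1), (1, 4, 1), (2, 4, 1), (3, 4, 1), (3, 5, 1),
    (1, 3, 2), (2, 3, 2),
]

print(rank_filter(pair_counts))                # [(1, 2, 1), (1, 4, 1), (3, 5, 1), (2, 3, 2)]
print(rank_filter(pair_counts, max_a_rank=1))  # [(1, 2, 1), (2, 4, 1)]
```

On this sample, a limit of 2 covers every b value but repeats a=1, while a limit of 1 makes a unique and misses b=3 and b=5. Neither result equals the outputs requested in the question, which fits the answer's framing as something to consider rather than an exact solution.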

Answer 1: (score: 0)

This query will give you the desired output.

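-- T-SQL (SQL Server) syntax.
-- @id is the row currently being examined; @a and @b hold its values.
DECLARE @id INT = 1,
        @a INT,
        @b INT,
        @count INT

DECLARE @tbl TABLE
(
    id INT IDENTITY(1,1),
    a INT,
    b INT,
    count INT
)

-- Load the grouped pairs in a deterministic order; id records that order.
INSERT INTO @tbl
SELECT a, b, COUNT(1) AS COUNT FROM dbo.myTable
GROUP BY a, b
ORDER BY COUNT,a,b

SELECT @count = COUNT(1) FROM @tbl

-- Walk the rows in id order.
WHILE @id <= @count
BEGIN
    SELECT TOP 1 @a = a,@b = b FROM @tbl WHERE id = @id

    -- Delete the row if an earlier surviving row has the same a or the same b.
    IF EXISTS(SELECT 1 FROM @tbl WHERE id < @id AND (a = @a OR b = @b))
        DELETE @tbl WHERE id = @id

    SET @id += 1
END

-- ORDER BY id returns the survivors in the order they were examined.
SELECT a,b,count FROM @tbl ORDER BY id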

Check it on SQLFiddle