Basically, I have the following query (it is actually more complex, but I think this simplification is fine):
SELECT a, b, x
FROM table
output:
a | b | x
-----------
1 | 2 | 34
1 | 3 | 35
1 | 3 | 36
1 | 4 | 37
2 | 3 | 38
2 | 3 | 39
2 | 4 | 40
3 | 4 | 41
3 | 5 | 42
To count how many times each (a, b) pair occurs, I use GROUP BY:
SELECT a, b, COUNT(x) AS count
FROM table
GROUP BY a, b
ORDER BY count
output:
a | b | count
--------------
1 | 2 | 1
1 | 4 | 1
2 | 4 | 1
3 | 4 | 1
3 | 5 | 1
1 | 3 | 2
2 | 3 | 2
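What bothers me is that a and b values occur multiple times. I would like to keep the count, but remove every row whose a or b has already appeared in an earlier row. It would be even better if a row were also removed when its a value appeared in an earlier row as b, or vice versa.
Preferred expected output: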
a | b | count
--------------
1 | 2 | 1
1 | 4 | 1 <- should not be in output since we had a=1
2 | 4 | 1 <- should not be in output since we had b=
3 | 4 | 1
3 | 5 | 1 <- should not be in output since we had a=3
1 | 3 | 2 <- should not be in output since we had a=1 / a=3
2 | 3 | 2 <- should not be in output since we had b=2 / a=3
So, this:
a | b | count
--------------
1 | 2 | 1
3 | 4 | 1
Alternative expected output, if the above is too complicated:
a | b | count
--------------
1 | 2 | 1
1 | 4 | 1 <- should not be in output since we had a=1
2 | 4 | 1
3 | 4 | 1 <- should not be in output since we had b=4
3 | 5 | 1
1 | 3 | 2 <- should not be in output since we had a=1
2 | 3 | 2 <- should not be in output since we had a=2
So, this:
a | b | count
--------------
1 | 2 | 1
2 | 4 | 1
3 | 5 | 1
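Both expected outputs can be read as a single greedy pass over the count-ordered rows, where a row is compared only against rows that were kept. The following is a minimal Python sketch of that reading, not a SQL solution. The function name `filter_pairs` and the `cross_columns` flag are hypothetical, introduced only for illustration.

```python
def filter_pairs(rows, cross_columns=False):
    """Keep a row only if it shares no value with a row kept earlier.

    rows: iterable of (a, b, count) tuples, already sorted by count.
    cross_columns=False compares a with earlier a and b with earlier b.
    cross_columns=True also compares a with earlier b and b with earlier a.
    """
    kept = []
    seen_a, seen_b = set(), set()
    for a, b, count in rows:
        if cross_columns:
            seen = seen_a | seen_b
            clash = a in seen or b in seen
        else:
            clash = a in seen_a or b in seen_b
        if clash:
            continue  # dropped rows do not block later rows
        kept.append((a, b, count))
        seen_a.add(a)
        seen_b.add(b)
    return kept


# Grouped rows from the question, ordered by count
rows = [(1, 2, 1), (1, 4, 1), (2, 4, 1), (3, 4, 1),
        (3, 5, 1), (1, 3, 2), (2, 3, 2)]

print(filter_pairs(rows, cross_columns=True))  # preferred output
print(filter_pairs(rows))                      # alternative output
```

With the sample rows, the first call keeps (1, 2) and (3, 4), and the second keeps (1, 2), (2, 4) and (3, 5), which matches the two tables above.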
Answer 0 (score: 2)
This is a confusing question, but here is something to consider:
SELECT a, b, count
FROM (
    SELECT a, b, count,
           rank() over (partition by b order by count, a) as b_rank
    FROM (
        SELECT a, b, count,
               rank() over (partition by a order by count, b) as a_rank
        FROM (
            SELECT a, b, COUNT(*) AS count
            FROM t
            GROUP BY a, b
            ORDER BY count
        ) pc
    ) pc2
    WHERE a_rank < 3
) pc3
WHERE b_rank = 1
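Each a value appears at most twice in the result, but the b values will be unique. Some b values that appear in low-count pairs may not be reflected in the result. There is a trade-off between how often an a may repeat and how many b values may be missed entirely: allowing more repeats of a (by changing the filter to WHERE a_rank < 4) reduces the number of b values that may be missed.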
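To see what the two rank filters do to the question's sample data, here is a rough Python simulation of the query. It is only a sketch: the helper names `rank_within` and `two_stage_rank_filter` and the `max_a_rank` parameter are hypothetical, and `rank_within` mimics SQL `rank()` by counting rows in the same partition with a smaller sort key.

```python
from collections import Counter


def rank_within(rows, part, key):
    """SQL-style rank(): 1 + number of rows in the same partition with a smaller key."""
    return [
        1 + sum(1 for other in rows
                if part(other) == part(row) and key(other) < key(row))
        for row in rows
    ]


def two_stage_rank_filter(raw_pairs, max_a_rank=3):
    """Mimic the nested query: GROUP BY, keep a_rank < max_a_rank, then b_rank = 1."""
    # Innermost subquery: count each (a, b) pair
    pairs = [(a, b, n) for (a, b), n in Counter(raw_pairs).items()]
    # rank() over (partition by a order by count, b)
    a_ranks = rank_within(pairs, part=lambda r: r[0], key=lambda r: (r[2], r[1]))
    pc2 = [r for r, k in zip(pairs, a_ranks) if k < max_a_rank]
    # rank() over (partition by b order by count, a), computed after the a filter
    b_ranks = rank_within(pc2, part=lambda r: r[1], key=lambda r: (r[2], r[0]))
    return sorted(r for r, k in zip(pc2, b_ranks) if k == 1)


# (a, b) columns of the question's sample table
raw = [(1, 2), (1, 3), (1, 3), (1, 4), (2, 3), (2, 3), (2, 4), (3, 4), (3, 5)]
print(two_stage_rank_filter(raw))
```

On this sample the simulation returns four rows, (1, 2, 1), (1, 4, 1), (2, 3, 2) and (3, 5, 1): a = 1 appears twice while every b is unique, so the result is close to, but not the same as, either expected output in the question.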
Answer 1 (score: 0)
This query will give you the desired output.
DECLARE @id INT = 1,
        @a INT,
        @b INT,
        @count INT

-- Staging table: the identity column records the processing order
DECLARE @tbl TABLE
(
    id INT IDENTITY(1,1),
    a INT,
    b INT,
    count INT
)

-- Load the grouped pairs, lowest count first
INSERT INTO @tbl
SELECT a, b, COUNT(1) AS COUNT FROM dbo.myTable
GROUP BY a, b
ORDER BY COUNT, a, b

SELECT @count = COUNT(1) FROM @tbl

-- Visit rows in id order; delete a row if a surviving earlier row shares its a or b
WHILE @id <= @count
BEGIN
    SELECT TOP 1 @a = a, @b = b FROM @tbl WHERE id = @id
    IF EXISTS(SELECT 1 FROM @tbl WHERE id < @id AND (a = @a OR b = @b))
        DELETE @tbl WHERE id = @id
    SET @id += 1
END

SELECT a, b, count FROM @tbl