I'm struggling with a tricky SQL query that I'm trying to write. Consider the following table:
+---+---+
| A | B |
+---+---+
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 2 |
| 3 | 3 |
| 4 | 2 |
| 4 | 3 |
| 4 | 4 |
+---+---+
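From this table, I basically want a list of all As that have exactly the same set of Bs, with an incrementing ID assigned to each set.
So the output for the table above would be: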
+---+----+
| A | ID |
+---+----+
| 1 | 1  |
| 3 | 1  |
| 2 | 2  |
| 4 | 2  |
+---+----+
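Thanks.
Edit: If it helps, I have a list of all the distinct values of B that can occur in another table.
Edit: Thank you very much for all the innovative answers. I was able to learn a lot.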
Answer 0 (score: 5)
Here is a mathematical trick to solve your tricky select:
with pow as(select *, b * power(10, row_number()
over(partition by a order by b)) as rn from t)
select a, dense_rank() over( order by sum(rn)) as rn
from pow
group by a
order by rn, a
Fiddle: http://sqlfiddle.com/#!3/6b98d/11
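To see what the query computes, the arithmetic can be replayed outside the database. The following is a minimal Python sketch, not part of the original answer; the names `positional_sum` and `dense_rank_by_key` are made up for illustration. It mirrors `b * power(10, row_number())` summed per `a`, and it also shows that two different B sets can produce the same sum once a B value has more than one digit (here 10), so the trick is approximate in that case even before any overflow.

```python
from collections import defaultdict

# Sample rows (A, B) from the question.
ROWS = [(1, 2), (1, 3), (2, 2), (2, 3), (2, 4),
        (3, 2), (3, 3), (4, 2), (4, 3), (4, 4)]


def positional_sum(b_values):
    """Mirror sum(b * power(10, row_number() over (order by b)))."""
    return sum(b * 10 ** pos
               for pos, b in enumerate(sorted(b_values), start=1))


def dense_rank_by_key(rows, key_fn):
    """Group B values per A, compute one key per group, dense-rank the keys."""
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)
    keys = {a: key_fn(bs) for a, bs in groups.items()}
    rank_of = {k: i for i, k in enumerate(sorted(set(keys.values())), start=1)}
    # Same ordering as "order by rn, a" in the query: (rank, A) pairs.
    return sorted((rank_of[key], a) for a, key in keys.items())


print(dense_rank_by_key(ROWS, positional_sum))  # [(1, 1), (1, 3), (2, 2), (2, 4)]
# Two different B sets, {10} and {0, 1}, collide on the same sum.
print(positional_sum([10]), positional_sum([0, 1]))  # 100 100
```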
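The arithmetic version of course only works for a limited number of distinct values, because the sum will eventually overflow. Here is a more general solution that uses strings: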
select a,
dense_rank() over(order by (select '.' + cast(b as varchar(max))
from t t2 where t1.a = t2.a
order by b
for xml path(''))) rn
from t t1
group by a
order by rn, a
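The leading '.' in the concatenation matters. As a rough illustration (a Python sketch with a hypothetical helper `signature`, not code from the answer), joining the sorted B values without a separator lets different sets collapse into the same string, while a separator keeps them apart:

```python
def signature(b_values, sep):
    """Concatenate sorted B values, each prefixed by sep (like '.' + cast(b as varchar))."""
    return "".join(sep + str(b) for b in sorted(b_values))


# Without a separator, three different B sets produce the same string.
print(signature([1, 2, 3], ""), signature([1, 23], ""), signature([123], ""))
# 123 123 123

# With the '.' prefix each set keeps a distinct signature.
print(signature([1, 2, 3], "."), signature([1, 23], "."), signature([123], "."))
# .1.2.3 .1.23 .123
```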
Answer 1 (score: 3)
Something like this:
select a, dense_rank() over (order by g) as id_b
from (
select a,
(select b from MyTable s where s.a=a.a order by b FOR XML PATH('')) g
from MyTable a
group by a
) a
order by id_b,a
Or possibly use a CTE (I avoid them whenever I can).
As a side note, this is the output of the inner query using the sample data from the question:
a g
1 <b>2</b><b>3</b>
2 <b>2</b><b>3</b><b>4</b>
3 <b>2</b><b>3</b>
4 <b>2</b><b>3</b><b>4</b>
Answer 2 (score: 2)
Edit: I have changed the code, although it is now larger. I took help from "Concatenate many rows into a single text string?" for merging the strings.
Select [A],
Left(M.[C],Len(M.[C])-1) As [D] into #tempSomeTable
From
(
Select distinct T2.[A],
(
Select Cast(T1.[B] as VARCHAR) + ',' AS [text()]
From sometable T1
Where T1.[A] = T2.[A]
ORDER BY T1.[B]
For XML PATH ('')
) [C]
From sometable T2
)M
SELECT t.A, DENSE_RANK() OVER(ORDER BY t.[D]) [ID] FROM
#tempSomeTable t
inner join
(SELECT [D] FROM(
SELECT [D], COUNT([A]) [D_A] from
#tempSomeTable t
GROUP BY [D] )P where [D_A]>1)t1 on t1.[D]=t.[D]
Answer 3 (score: 2)
Here is a long-winded approach: find sets with the same elements (using EXCEPT in both directions to eliminate mismatches, over just a half-diagonal Cartesian product), pair up the equal sets, stamp each pair with a ROW_NUMBER(), and then unpivot the pairs of A's into the final output, where equivalent sets are projected as rows that share the same id.
WITH joinedSets AS
(
    SELECT t1.A as t1A, t2.A AS t2A
    FROM MyTable t1
    INNER JOIN MyTable t2
    ON t1.B = t2.B
    AND t1.A < t2.A
),
equalSets AS
(
    SELECT js.t1A, js.t2A, ROW_NUMBER() OVER (ORDER BY js.t1A) AS Id
    FROM joinedSets js
    GROUP BY js.t1A, js.t2A
    HAVING NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A)
        EXCEPT (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A))
    AND NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A)
        EXCEPT (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A))
)
SELECT A, Id
FROM equalSets
UNPIVOT
(
    A
    FOR ACol in (t1A, t2A)
) unp;
At present, this solution only works for pairs of sets, not triples and so on. A more general solution may well be possible (but is beyond my brain at present).
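The pair limitation is easy to reproduce. The sketch below is a Python approximation of the query's logic, written for illustration only; the helper name `pair_ids` and the extra rows for A = 5 are assumptions, not part of the answer. It numbers each pair of A's with equal B sets and then unpivots the pairs. With three equal sets there are three pairs, so every A shows up under two different ids.

```python
from collections import defaultdict
from itertools import combinations


def pair_ids(rows):
    """Number each pair (a1 < a2) with equal B sets, then unpivot to (A, Id) rows."""
    sets = defaultdict(set)
    for a, b in rows:
        sets[a].add(b)
    equal_pairs = [(a1, a2) for a1, a2 in combinations(sorted(sets), 2)
                   if sets[a1] == sets[a2]]
    out = []
    for pair_id, (a1, a2) in enumerate(equal_pairs, start=1):
        out += [(a1, pair_id), (a2, pair_id)]
    return out


rows = [(1, 2), (1, 3), (3, 2), (3, 3)]
# Two equal sets: one pair, one id, as intended.
print(pair_ids(rows))  # [(1, 1), (3, 1)]
# Three equal sets (hypothetical A = 5 added): three pairs, three ids.
print(pair_ids(rows + [(5, 2), (5, 3)]))
# [(1, 1), (3, 1), (1, 2), (5, 2), (3, 3), (5, 3)]
```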
Answer 4 (score: 2)
Here is a very simple, fast, but approximate solution: CHECKSUM_AGG may return the same checksum for different sets of B.
DECLARE @T TABLE (A int, B int);
INSERT INTO @T VALUES
(1, 2),(1, 3),(2, 2),(2, 3),(2, 4),(3, 2),(3, 3),(4, 2),(4, 3),(4, 4);
SELECT
A
,CHECKSUM_AGG(B) AS CheckSumB
,ROW_NUMBER() OVER (PARTITION BY CHECKSUM_AGG(B) ORDER BY A) AS GroupNumber
FROM @T
GROUP BY A
ORDER BY A, GroupNumber;
Result set
A    CheckSumB    GroupNumber
-----------------------------
1    1            1
2    5            1
3    1            2
4    5            2
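For an exact solution, group by A and concatenate all B values into one long (binary) string using FOR XML, CLR, or a T-SQL function. You can then partition ROW_NUMBER by that concatenated string to assign numbers to the groups, as shown in the other answers.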
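To make the collision risk concrete: the checksums in the result set (1 for {2, 3} and 5 for {2, 3, 4}) are what a bitwise XOR of the B values gives. Assuming, for illustration only, that the aggregate behaves like XOR on integers (an assumption of this sketch, not a documented guarantee of CHECKSUM_AGG), the Python snippet below shows two different B sets with the same value. The name `xor_agg` is hypothetical.

```python
from functools import reduce
from operator import xor


def xor_agg(b_values):
    """XOR all values together; the order of the values does not matter."""
    return reduce(xor, b_values, 0)


# Matches the CheckSumB column for the sample data.
print(xor_agg([2, 3]), xor_agg([2, 3, 4]))  # 1 5
# Two different B sets, {1, 2} and {3}, share the same aggregate.
print(xor_agg([1, 2]), xor_agg([3]))  # 3 3
```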
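Answer 5 (score: 0)
Here is an exact rather than approximate solution. It uses nothing more advanced than INNER JOIN and GROUP BY (and, of course, DENSE_RANK() to get the ID you want).
It is also general, in that it allows B values to be repeated within an A group.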
SELECT A,
DENSE_RANK() OVER (ORDER BY MIN_EQUIVALENT_A) AS ID
FROM (
SELECT MATCHES.A1 AS A,
MIN(MATCHES.A2) AS MIN_EQUIVALENT_A
FROM (
SELECT T1.A AS A1,
T2.A AS A2,
COUNT(*) AS NUM_B_VALS_MATCHED
FROM (
SELECT A,
B,
COUNT(*) AS B_VAL_FREQ
FROM MyTable
GROUP BY A,
B
) AS T1
INNER JOIN
(
SELECT A,
B,
COUNT(*) AS B_VAL_FREQ
FROM MyTable
GROUP BY A,
B
) AS T2
ON T1.B = T2.B
AND T1.B_VAL_FREQ = T2.B_VAL_FREQ
GROUP BY T1.A,
T2.A
) AS MATCHES
INNER JOIN
(
SELECT A,
COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
FROM MyTable
GROUP BY A
) AS CHECK_TOTALS_A1
ON MATCHES.A1 = CHECK_TOTALS_A1.A
AND MATCHES.NUM_B_VALS_MATCHED
= CHECK_TOTALS_A1.NUM_B_VALS_TOTAL
INNER JOIN
(
SELECT A,
COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
FROM MyTable
GROUP BY A
) AS CHECK_TOTALS_A2
ON MATCHES.A2 = CHECK_TOTALS_A2.A
AND MATCHES.NUM_B_VALS_MATCHED
= CHECK_TOTALS_A2.NUM_B_VALS_TOTAL
GROUP BY MATCHES.A1
) AS EQUIVALENCE_TABLE
ORDER BY 2,1
;
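The idea behind the query can be easier to follow in procedural form. The following Python sketch is an illustration rather than a line-by-line translation of the SQL, and the function name `equivalence_ids` is made up. Two A's are treated as equivalent when the number of (B, frequency) pairs they share equals the number of distinct B values of each of them, and each A is then labelled with the dense rank of its smallest equivalent A.

```python
from collections import Counter, defaultdict


def equivalence_ids(rows):
    """Return sorted (ID, A) pairs; A's with identical B multisets share an ID."""
    freq = defaultdict(Counter)  # A -> {B: how often B occurs for that A}
    for a, b in rows:
        freq[a][b] += 1
    min_equiv = {}
    for a1 in freq:
        matches = []
        for a2 in freq:
            # Count the (B, frequency) pairs common to a1 and a2.
            matched = len(set(freq[a1].items()) & set(freq[a2].items()))
            if matched == len(freq[a1]) == len(freq[a2]):
                matches.append(a2)
        min_equiv[a1] = min(matches)  # a1 always matches itself
    rank_of = {m: i for i, m in enumerate(sorted(set(min_equiv.values())), start=1)}
    return sorted((rank_of[m], a) for a, m in min_equiv.items())


rows = [(1, 2), (1, 3), (2, 2), (2, 3), (2, 4),
        (3, 2), (3, 3), (4, 2), (4, 3), (4, 4)]
print(equivalence_ids(rows))  # [(1, 1), (1, 3), (2, 2), (2, 4)]
# Repeated B values count: A = 1 has B = 2 twice, A = 2 only once.
print(equivalence_ids([(1, 2), (1, 2), (2, 2)]))  # [(1, 1), (2, 2)]
```

Unlike the question's expected output, this labels every A, including those whose B set is unique.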