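SQL: find unique combinations and count their occurrences

Time: 2013-11-27 10:50:47

Tags: sql sql-server

I have an MS SQL database table, and I want to find every unique combination of categories (CatId) that subscribers (SubId) have signed up for.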

SubId CatId Cat 
4     39    Google Play
4     40    Kobo
4     43    Other
5     39    Google Play
5     43    Other
7     49    Amazon
7     39    Google Play
7     40    Kobo
6     39    Google Play
6     40    Kobo
6     43    Other
8     49    Amazon
8     39    Google Play
8     40    Kobo
9     38    Barnes & Noble
9     41    Smashwords
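
To reproduce the problem, the sample rows above can be loaded with a script along these lines. This is only a sketch: the table name T (borrowed from the answers below) and the column types are assumptions, since the original schema is not shown.

```sql
-- Assumed schema: the real table name and column types are not given in the question.
CREATE TABLE T
(
    SubId INT          NOT NULL,
    CatId INT          NOT NULL,
    Cat   NVARCHAR(50) NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO T (SubId, CatId, Cat)
VALUES (4, 39, N'Google Play'),
       (4, 40, N'Kobo'),
       (4, 43, N'Other'),
       (5, 39, N'Google Play'),
       (5, 43, N'Other'),
       (7, 49, N'Amazon'),
       (7, 39, N'Google Play'),
       (7, 40, N'Kobo'),
       (6, 39, N'Google Play'),
       (6, 40, N'Kobo'),
       (6, 43, N'Other'),
       (8, 49, N'Amazon'),
       (8, 39, N'Google Play'),
       (8, 40, N'Kobo'),
       (9, 38, N'Barnes & Noble'),
       (9, 41, N'Smashwords');
```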

The desired output would look like this (where GroupId is a counter for the combination):

GroupId CatId   Cat         Occurances
1        39     Google Play     2
1        40     Kobo            2
1        43     Other           2
2        39     Google Play     1
2        43     Other           1
3        49     Amazon          2
3        39     Google Play     2
3        40     Kobo            2
4        38     Barnes & Noble  1
4        41     Smashwords      1

Any help would be greatly appreciated.

2 answers:

Answer 0: (score: 2)

The key is to first get one row per SubID that holds all of its categories, by concatenating the rows into a single column using the SQL Server XML extensions:

SELECT  T.SubID,
        Combinations = STUFF((  SELECT  ',' + t2.Cat
                                FROM    T t2
                                WHERE   t.SubID = t2.SubID
                                ORDER BY t2.Cat
                                FOR XML PATH(''), TYPE
                            ).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM    T
GROUP BY T.SubID;

This gives:

SUBID   COMBINATIONS
------+-------------------------
4     | Google Play,Kobo,Other
5     | Google Play,Other
6     | Google Play,Kobo,Other
7     | Amazon,Google Play,Kobo
8     | Amazon,Google Play,Kobo
9     | Barnes & Noble,Smashwords
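
On SQL Server 2017 and later, the same per-subscriber list can likely be built more directly with the built-in STRING_AGG aggregate, which was not available when this answer was written. A minimal sketch, assuming the same table T and columns as above:

```sql
-- Alternative for SQL Server 2017+; assumes table T with columns SubID and Cat.
-- The CAST to NVARCHAR(MAX) avoids the length limit STRING_AGG applies
-- when its input is a non-MAX string type.
SELECT  T.SubID,
        Combinations = STRING_AGG(CAST(T.Cat AS NVARCHAR(MAX)), ',')
                           WITHIN GROUP (ORDER BY T.Cat)
FROM    T
GROUP BY T.SubID;
```

Because the categories are ordered by Cat inside each list, two subscribers with the same set of categories should end up with identical strings, which is what the later grouping relies on.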

You can then do a simple count over this result set:

WITH Combinations AS
(   SELECT  T.SubID,
            Combinations = STUFF((  SELECT  ',' + t2.Cat
                                    FROM    T t2
                                    WHERE   t.SubID = t2.SubID
                                    ORDER BY t2.Cat
                                    FOR XML PATH(''), TYPE
                                ).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM    T
    GROUP BY T.SubID
)
SELECT  Combinations, Occurances = COUNT(*)
FROM    Combinations
GROUP BY Combinations;

Which would give:

COMBINATIONS              | OCCURANCES
--------------------------+------------
Amazon,Google Play,Kobo   |     2
Barnes & Noble,Smashwords |     1
Google Play,Kobo,Other    |     2
Google Play,Other         |     1

Or, to get the output you showed, join this back to the main table and group by the Combinations column from above:

WITH Combinations AS
(   SELECT  T.SubID,
            Combinations = STUFF((  SELECT  ',' + t2.Cat
                                    FROM    T t2
                                    WHERE   t.SubID = t2.SubID
                                    ORDER BY t2.Cat
                                    FOR XML PATH(''), TYPE
                                ).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM    T
    GROUP BY T.SubID
)
SELECT  GroupID = DENSE_RANK() OVER(ORDER BY c.Combinations),
        T.CatID,
        T.Cat,
        Occurances = COUNT(DISTINCT T.SubID)
FROM    T
        INNER JOIN Combinations c
            ON c.SubID = T.SubID
GROUP BY T.CatID, T.Cat, c.Combinations;

Example on SQL Fiddle
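
One caveat: because the key is built from category names, two different combinations could in principle collide if a name itself contains a comma, and GroupID follows the alphabetical order of the combined string rather than the numbering shown in the question. A possible variant, assuming CatID uniquely identifies a category, builds the key from CatID instead:

```sql
-- Sketch only: same approach as above, but the combination key is built from
-- CatID values. Assumes CatID is an integer that uniquely identifies a category.
WITH Combinations AS
(   SELECT  T.SubID,
            Combinations = STUFF((  SELECT  ',' + CAST(t2.CatID AS VARCHAR(12))
                                    FROM    T t2
                                    WHERE   t2.SubID = T.SubID
                                    ORDER BY t2.CatID
                                    FOR XML PATH(''), TYPE
                                ).value('.', 'VARCHAR(MAX)'), 1, 1, '')
    FROM    T
    GROUP BY T.SubID
)
SELECT  GroupID = DENSE_RANK() OVER(ORDER BY c.Combinations),
        T.CatID,
        T.Cat,
        Occurances = COUNT(DISTINCT T.SubID)
FROM    T
        INNER JOIN Combinations c
            ON c.SubID = T.SubID
GROUP BY T.CatID, T.Cat, c.Combinations
ORDER BY GroupID, T.CatID;
```

The GroupID values are still arbitrary labels ordered by the key string, so they may not match the exact numbering in the question, only the grouping.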

Answer 1: (score: 0)

How about:

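-- Note: ranking by SubID yields one GroupID per subscriber, so Occurances is
-- counted per subscriber rather than per shared combination.
select
dense_rank() over (order by SubID) as GroupID
,CatID
,Cat
,COUNT(*) as Occurances
from T  -- T stands in for the actual table name
group by SubID, CatID, Cat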