Select the N most frequent values of column B for each value of column A

Time: 2019-04-05 15:28:38

Tags: mysql select mariadb groupwise-maximum

Given a MySQL table like this:

id | colA | colB
...| 1    | 13
...| 1    | 13
...| 1    | 12
...| 1    | 12
...| 1    | 11
...| 2    | 78
...| 2    | 78
...| 2    | 78
...| 2    | 13
...| 2    | 13
...| 2    | 9

For each value in colA, I want to find the N most frequent values in colB.

Example result for N = 2:

colA | colB
1    | 13
1    | 12
2    | 78
2    | 13

I was able to get all unique combinations of colA and colB, together with their frequencies, using:

SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB ORDER BY freq DESC;

Example result:

colA | colB | freq
1    | 13   | 2
1    | 12   | 2
1    | 11   | 1
2    | 78   | 3
2    | 13   | 2
2    | 9    | 1
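To make the example reproducible, here is a minimal sketch that builds the sample table and runs the frequency query. The table name `t` comes from the query above; the column types and the auto-increment `id` are assumptions, since the question does not show the schema. Note that `ORDER BY freq DESC` alone does not group the rows by colA, so this sketch orders by colA first to match the listing above.

```sql
-- Sample data from the question (column types are an assumption).
CREATE TABLE t (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  colA INT NOT NULL,
  colB INT NOT NULL
);

INSERT INTO t (colA, colB) VALUES
  (1, 13), (1, 13), (1, 12), (1, 12), (1, 11),
  (2, 78), (2, 78), (2, 78), (2, 13), (2, 13), (2, 9);

-- Frequency of every (colA, colB) pair, listed per colA with the
-- most frequent pair first; colB DESC breaks ties deterministically.
SELECT colA, colB, COUNT(*) AS freq
FROM t
GROUP BY colA, colB
ORDER BY colA, freq DESC, colB DESC;

-- Result:
-- colA | colB | freq
-- 1    | 13   | 2
-- 1    | 12   | 2
-- 1    | 11   | 1
-- 2    | 78   | 3
-- 2    | 13   | 2
-- 2    | 9    | 1
```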

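But I am struggling to apply a LIMIT to each value of colA rather than to the whole table.

This is basically like "How to select most frequent value in a column per each id group?", only for MySQL instead of PostgreSQL.

I am currently using MariaDB 10.1.

2 Answers:

Answer 0: (score: 1)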

Use window functions, if you can:

SELECT colA, colB, freq
FROM (SELECT colA, colB, COUNT(*) AS freq,
             DENSE_RANK() OVER (PARTITION BY colA ORDER BY COUNT(*) DESC) AS seqnum
      FROM t
      GROUP BY colA, colB
     ) ab
WHERE seqnum <= 2;

Note that, depending on how you treat ties, you may want DENSE_RANK(), RANK(), or ROW_NUMBER(). If there are 5 colB values with the two highest ranks, DENSE_RANK() will return all five.

If you need exactly two values, use ROW_NUMBER().
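The difference between the three ranking functions is easiest to see side by side. The sketch below computes all three on the question's sample data; the column types, the aliases `rn`, `rnk`, and `drnk`, and the `colB DESC` tie-breaker for ROW_NUMBER() are my own assumptions for illustration. It needs a server with window functions (MariaDB 10.2+ or MySQL 8.0+, as far as I know).

```sql
-- Sample data from the question (column types are an assumption).
CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, colA INT, colB INT);
INSERT INTO t (colA, colB) VALUES
  (1, 13), (1, 13), (1, 12), (1, 12), (1, 11),
  (2, 78), (2, 78), (2, 78), (2, 13), (2, 13), (2, 9);

SELECT colA, colB, COUNT(*) AS freq,
       -- unique sequence per colA; colB DESC makes the tie order deterministic
       ROW_NUMBER() OVER (PARTITION BY colA ORDER BY COUNT(*) DESC, colB DESC) AS rn,
       -- ties share a rank, and the following rank is skipped
       RANK()       OVER (PARTITION BY colA ORDER BY COUNT(*) DESC) AS rnk,
       -- ties share a rank, and no rank is skipped
       DENSE_RANK() OVER (PARTITION BY colA ORDER BY COUNT(*) DESC) AS drnk
FROM t
GROUP BY colA, colB
ORDER BY colA, rn;

-- Result:
-- colA | colB | freq | rn | rnk | drnk
-- 1    | 13   | 2    | 1  | 1   | 1
-- 1    | 12   | 2    | 2  | 1   | 1
-- 1    | 11   | 1    | 3  | 3   | 2
-- 2    | 78   | 3    | 1  | 1   | 1
-- 2    | 13   | 2    | 2  | 2   | 2
-- 2    | 9    | 1    | 3  | 3   | 3
```

On this data, filtering on `drnk <= 2` keeps all three rows for colA = 1, because 13 and 12 tie for first place and 11 then takes dense rank 2. Filtering on `rn <= 2` keeps exactly the two rows per colA shown in the question's expected result.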

Answer 1: (score: 0)

You can use a couple of CTEs for this, for example:

WITH counts AS (
   -- frequency of each (colA, colB) pair
   SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB
), most_freq AS (
   -- highest frequency per colA; the alias is required by the join below
   SELECT colA, MAX(freq) AS freq FROM counts GROUP BY colA
)
   SELECT counts.*
     FROM counts
     JOIN most_freq ON (counts.colA = most_freq.colA
                        AND counts.freq = most_freq.freq);
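The CTE query keeps only the rows tied for the highest frequency per colA, and both answers rely on syntax that, to my knowledge, first appeared in MariaDB 10.2 and MySQL 8.0. For the MariaDB 10.1 mentioned in the question, one possible fallback is a correlated subquery that counts how many pairs in the same colA group outrank the current one. This is a sketch under assumptions, not part of either answer: the aliases `c` and `d`, the column types, and the choice to break ties by the larger colB are mine.

```sql
-- Sample data from the question (column types are an assumption).
CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, colA INT, colB INT);
INSERT INTO t (colA, colB) VALUES
  (1, 13), (1, 13), (1, 12), (1, 12), (1, 11),
  (2, 78), (2, 78), (2, 78), (2, 13), (2, 13), (2, 9);

-- Keep a (colA, colB) pair when fewer than N = 2 pairs of the same colA
-- outrank it. "Outrank" means a higher frequency, or the same frequency
-- with a larger colB (an arbitrary but deterministic tie-breaker).
SELECT c.colA, c.colB, c.freq
FROM (SELECT colA, colB, COUNT(*) AS freq
      FROM t
      GROUP BY colA, colB) AS c
WHERE (SELECT COUNT(*)
       FROM (SELECT colA, colB, COUNT(*) AS freq
             FROM t
             GROUP BY colA, colB) AS d
       WHERE d.colA = c.colA
         AND (d.freq > c.freq
              OR (d.freq = c.freq AND d.colB > c.colB))
      ) < 2
ORDER BY c.colA, c.freq DESC, c.colB DESC;

-- Result:
-- colA | colB | freq
-- 1    | 13   | 2
-- 1    | 12   | 2
-- 2    | 78   | 3
-- 2    | 13   | 2
```

The grouped derived table is written out twice because a version without CTEs has no way to name it once and reuse it within a single statement. The subquery is re-evaluated per outer row, so this is likely to be slower than the window-function form on large tables; changing the `< 2` to `< N` gives other values of N.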