Select the most frequently occurring record using two or more grouping columns

Asked: 2009-07-30 16:31:54

Tags: sql

I have a table with no primary key, and I cannot add one. The relevant columns are:

Department   | Category  | 
-------------+-----------+
0001         | A         |
0002         | D         |
0003         | A         | 
0003         | A         |
0003         | C         |
0004         | B         |

I want to retrieve one row per Department, giving the department code and the Category that occurs most often for it in the table, i.e.

Department   | Category  | 
-------------+-----------+
0001         | A         |
0002         | D         |
0003         | A         | 
0004         | B         |

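What is the best way to achieve this? My current attempt uses Count(Category) in a subquery and then takes Max(CountOfCategory) from it, but including the Category field at that stage returns too many rows (because the GROUP BY then applies at the Category level as well as the Department level). In the case of a tie, I would just arbitrarily pick the min/max of the category. Ideally this should be database-agnostic, but it will most likely run on Oracle or MySQL.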
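The count-then-max attempt described above can be made to return one row per department by joining the per-category counts back to the per-department maximum and collapsing ties with min(Category). The following is only a minimal sketch under assumptions: the table name T is invented for illustration, and Python's sqlite3 module stands in for Oracle or MySQL so the query can be run anywhere. It uses no window functions and no LIMIT.

```python
import sqlite3

# Assumption: the table is called T; SQLite stands in for Oracle/MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table T (Department varchar(10), Category varchar(10))")
conn.executemany(
    "insert into T values (?, ?)",
    [("0001", "A"), ("0002", "D"), ("0003", "A"),
     ("0003", "A"), ("0003", "C"), ("0004", "B")],
)

query = """
select c.Department, min(c.Category) as Category   -- min() resolves ties
from (select Department, Category, count(*) as Cnt
      from T
      group by Department, Category) c
join (select Department, max(Cnt) as MaxCnt         -- highest count per department
      from (select Department, Category, count(*) as Cnt
            from T
            group by Department, Category) x
      group by Department) m
  on m.Department = c.Department and m.MaxCnt = c.Cnt
group by c.Department
order by c.Department
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The outer group by is what removes the extra rows: the join keeps every category that reaches the maximum count, and min() then picks one of them per department.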

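4 Answers:

Answer 0: (score: 3)

This works in both Oracle and SQL Server, and I believe it is all standard SQL from the later revisions of the standard (window functions date from SQL:2003):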

with T_with_RN as
    (select Department
        , Category
        , row_number() over (partition by Department order by count(*) Desc) as RN
    from T
    group by Department, Category)
select Department, Category
from T_with_RN
where RN = 1

Edit: I'm not sure why I used WITH; the solution is probably easier to read with an inline view:

select Department, Category
from (select Department
    , Category
    , row_number() over (partition by Department order by count(*) Desc) as RN
    from T
    group by Department, Category) T_with_RN
where RN = 1

End edit

Test cases:

create table T (
    Department varchar(10) null,
    Category varchar(10) null
);

-- Original test case
insert into T values ('0001', 'A');
insert into T values ('0002', 'D');
insert into T values ('0003', 'A');
insert into T values ('0003', 'A');
insert into T values ('0003', 'C');
insert into T values ('0004', 'B');
-- Null Test cases:
insert into T values (null, 'A');
insert into T values (null, 'B');
insert into T values (null, 'B');
insert into T values ('0005', null);
insert into T values ('0005', null);
insert into T values ('0005', 'X');
-- Tie Test case
insert into T values ('0006', 'O');
insert into T values ('0006', 'P');
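
The tie case above ('0006') has no deterministic winner when the window is ordered by count(*) alone. As a rough sketch, not the answer's exact query, the variant below computes the counts in a CTE first and adds Category as a second sort key so that ties resolve to the lowest category. It is run through Python's sqlite3 module purely as a convenient stand-in (an assumption; window functions need SQLite 3.25 or later), using the same test rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T (Department varchar(10) null, Category varchar(10) null)")
conn.executemany("insert into T values (?, ?)", [
    ("0001", "A"), ("0002", "D"), ("0003", "A"), ("0003", "A"),
    ("0003", "C"), ("0004", "B"),
    (None, "A"), (None, "B"), (None, "B"),         # null department
    ("0005", None), ("0005", None), ("0005", "X"),  # null category
    ("0006", "O"), ("0006", "P"),                   # tie
])

query = """
with Counts as
    (select Department, Category, count(*) as Cnt
    from T
    group by Department, Category)
select Department, Category
from (select Department
    , Category
    -- Category is the tie-breaker: equal counts resolve to the lowest value
    , row_number() over (partition by Department order by Cnt desc, Category) as RN
    from Counts) T_with_RN
where RN = 1
"""

# Map each department to its chosen category; None represents SQL NULL.
result = {dept: cat for dept, cat in conn.execute(query)}
for dept, cat in result.items():
    print(dept, cat)
```

With this data the NULL department gets 'B', department '0005' gets NULL (count(*) counts NULL categories too), and '0006' gets 'O'. Where the NULL department sorts relative to the others varies by database, so the sketch does not rely on row order.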

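Answer 1: (score: 1)

You could also try the following. The window here orders the categories within each department by descending frequency, and FIRST_VALUE() picks the first one.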

SELECT DISTINCT (department), 
  FIRST_VALUE(category) OVER
    (PARTITION BY department ORDER BY count(*) DESC ROWS UNBOUNDED PRECEDING)
FROM T
GROUP BY department, category;

Answer 2: (score: 0)

Someone better at subqueries than I am may want to clean this up, but in my tests it produced the results you want:

SELECT
  main.Department as Department,
  (SELECT 
     Category
   FROM yourtable
   WHERE Department=main.Department
   GROUP BY Category
   ORDER BY COUNT(Category) DESC
   LIMIT 1) AS Category
FROM yourtable main
GROUP BY main.Department

The trick is simply to have the subquery return only the row you want, the one with the highest count, by combining ORDER BY with LIMIT 1.
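
For anyone who wants to try the ORDER BY plus LIMIT 1 idea, here is a small sketch that runs essentially the same query through Python's sqlite3 module. SQLite is only an assumed stand-in that happens to accept the MySQL-style LIMIT; Oracle spells row limiting differently. The extra Category sort key is an addition for illustration, so that ties resolve to the lowest category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (Department varchar(10), Category varchar(10))")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)", [
    ("0001", "A"), ("0002", "D"), ("0003", "A"), ("0003", "A"),
    ("0003", "C"), ("0004", "B"),
    ("0006", "O"), ("0006", "P"),  # tie between O and P
])

query = """
SELECT
  main.Department AS Department,
  (SELECT Category
   FROM yourtable
   WHERE Department = main.Department        -- correlate on the outer row
   GROUP BY Category
   ORDER BY COUNT(Category) DESC, Category   -- second key breaks ties
   LIMIT 1) AS Category
FROM yourtable main
GROUP BY main.Department
ORDER BY main.Department
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that COUNT(Category) skips NULL categories, whereas count(*) would count them, so the two forms can disagree on data that contains NULLs.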

Answer 3: (score: 0)

There is a simpler way:

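select department, stats_mode(category) from T group by department;

This works well when you only need the most common value (STATS_MODE is an Oracle aggregate function). When you need the second, third, ... most common value, you have to do the counting as in the answers above.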
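
To illustrate the "second, third, ..." case, here is a hedged sketch that ranks the categories within each department by count and keeps the top N. It is run through Python's sqlite3 module as an assumed stand-in for the real database (window functions need SQLite 3.25 or later). The names Counts, Ranked, Cnt and RN are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T (Department varchar(10), Category varchar(10))")
conn.executemany("insert into T values (?, ?)", [
    ("0001", "A"), ("0002", "D"), ("0003", "A"), ("0003", "A"),
    ("0003", "C"), ("0004", "B"), ("0006", "O"), ("0006", "P"),
])

query = """
with Counts as
    (select Department, Category, count(*) as Cnt
    from T
    group by Department, Category),
Ranked as
    (select Department, Category
        -- rank 1 = most common; Category breaks ties between equal counts
        , row_number() over (partition by Department order by Cnt desc, Category) as RN
    from Counts)
select Department, Category, RN
from Ranked
where RN <= ?          -- keep the N most common categories per department
order by Department, RN
"""

top_n = 2
rows = conn.execute(query, (top_n,)).fetchall()
for row in rows:
    print(row)
```

With top_n = 2, departments that have only one category return a single row, while '0003' and '0006' return two rows each.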