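There are plenty of answers on finding the most common value in a column, but what I want is to identify the values in column 1 for which a given value is the most common one in column 2, rather than just finding the most common value itself: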
EmployeeID | SicknessReason
---------------------------
1 | Cough
1 | Cough
1 | Cold
2 | Flu
2 | Flu
2 | Cough
3 | Cough
3 | Cough
3 | Cough
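I want to find every EmployeeID whose most common SicknessReason is 'Cough', so in this example I want to return EmployeeIDs 1 and 3.
Edit: in the real world there are more columns that need the same approach, i.e. most common SicknessReason = 'Cough', most common ReportingMethod = 'SMS', and so on.
Answer 0: (score: 3)
This is just a small variation on calculating the most common reason (statistically, the "mode"):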
select employeeId
from (select employeeId, sicknessreason, count(*) as cnt,
dense_rank() over (partition by employeeId order by count(*) desc) as seqnum
from t
group by employeeId, sicknessreason
) es
where seqnum = 1 and sicknessreason = 'Cough';
Note that the filter on the reason is applied in the outer query, so it does not affect dense_rank().
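For the multi-column case mentioned in the edit, one possible approach is to repeat the ranked subquery once per column and combine the results with INTERSECT. The sketch below is only an illustration under assumptions: it runs the SQL through Python's sqlite3 module (this needs an SQLite build with window functions, 3.25 or later), the table name t is taken from the query above, and the ReportingMethod values are invented sample data, not from the question.

```python
import sqlite3

# EmployeeID and SicknessReason come from the question;
# the ReportingMethod values are made up for this illustration.
ROWS = [
    (1, "Cough", "SMS"), (1, "Cough", "SMS"), (1, "Cold", "Email"),
    (2, "Flu", "Email"), (2, "Flu", "Email"), (2, "Cough", "SMS"),
    (3, "Cough", "Email"), (3, "Cough", "Email"), (3, "Cough", "SMS"),
]

def mode_filter(column, alias):
    """Build the ranked subquery for one column (hypothetical helper).

    `column` and `alias` are fixed strings from this script, never user
    input; the value to match is passed as a bound parameter (?).
    """
    return f"""
        select EmployeeID
        from (select EmployeeID, {column},
                     dense_rank() over (partition by EmployeeID
                                        order by count(*) desc) as seqnum
              from t
              group by EmployeeID, {column}
             ) {alias}
        where seqnum = 1 and {column} = ?
    """

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (EmployeeID integer, SicknessReason text, ReportingMethod text)"
)
conn.executemany("insert into t values (?, ?, ?)", ROWS)

# Single column: employees whose most common SicknessReason is 'Cough'.
cough_only = sorted(
    row[0] for row in conn.execute(mode_filter("SicknessReason", "es"), ("Cough",))
)

# Two columns: one ranked subquery per column, combined with INTERSECT.
both_sql = (
    mode_filter("SicknessReason", "es")
    + " intersect "
    + mode_filter("ReportingMethod", "em")
)
both = sorted(row[0] for row in conn.execute(both_sql, ("Cough", "SMS")))

print(cough_only)
print(both)
```

Because dense_rank() gives every tied value the rank 1, an employee whose top reason is a tie between 'Cough' and something else is still returned. If ties should be excluded, the ranking would need an extra condition.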
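Answer 1: (score: 0)
Your example is missing something that makes each row unique. I took your example and loaded it into a table with an auto-number column (not shown) so that every entry is unique.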
SELECT EmployeeID, Reason, Occurrence = Count(*)
FROM Test
GROUP BY Reason, EmployeeID
ORDER BY Count(*) DESC
Result:
EmployeeID Reason Occurrence
3 Cough 3
1 Cough 2
2 Flu 2
1 Cold 1
2 Cough 1