Get most common value for each value of another column in SQL
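I'm asking this question after searching this site and others extensively without finding anything that does what I intend.
I have a person table (recordid, personid, transactionid) and a transaction table (transactionid, rating). I need a SQL statement that returns the most common rating for each person.
I currently have this SQL statement, which returns the most common rating for a specified person ID. It works, and perhaps it will help someone else.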
SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
However, I need a statement that does what the statement above does for every personid in personTable.
My attempt is below; however, it times out my MySQL server.
SELECT personid AS pid,
(SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = pid
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1)
FROM persontable
GROUP BY personid
Any help you can give me would be much appreciated. Thanks.
PERSONTABLE:
RecordID, PersonID, TransactionID
1, Adam, 1
2, Adam, 2
3, Adam, 3
4, Ben, 1
5, Ben, 3
6, Ben, 4
7, Caitlin, 4
8, Caitlin, 5
9, Caitlin, 1
TRANSACTIONTABLE:
TransactionID, Rating
1, Good
2, Bad
3, Good
4, Average
5, Average
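For anyone who wants to try the queries in this thread, the two sample tables above could be recreated roughly as follows. This is only a sketch: the column types and keys are assumptions (the question does not state them), and the table names are spelled PersonTable and TransactionTable here, although the thread uses several casings (MySQL table names can be case-sensitive depending on the platform).

```sql
-- Assumed schema: types, lengths, and keys are guesses, not taken from the question.
CREATE TABLE TransactionTable (
    TransactionID INT         NOT NULL PRIMARY KEY,
    Rating        VARCHAR(20) NOT NULL
);

CREATE TABLE PersonTable (
    RecordID      INT         NOT NULL PRIMARY KEY,
    PersonID      VARCHAR(20) NOT NULL,   -- the sample data uses names, not numbers
    TransactionID INT         NOT NULL
);

-- Sample rows exactly as listed in the question.
INSERT INTO TransactionTable (TransactionID, Rating) VALUES
    (1, 'Good'), (2, 'Bad'), (3, 'Good'), (4, 'Average'), (5, 'Average');

INSERT INTO PersonTable (RecordID, PersonID, TransactionID) VALUES
    (1, 'Adam', 1),    (2, 'Adam', 2),    (3, 'Adam', 3),
    (4, 'Ben', 1),     (5, 'Ben', 3),     (6, 'Ben', 4),
    (7, 'Caitlin', 4), (8, 'Caitlin', 5), (9, 'Caitlin', 1);
```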
The output of the SQL statement I'm looking for is:
PersonID, MostCommonRating
Adam, Good
Ben, Good
Caitlin, Average
Answer 0 (score: 23)
Please learn to use the explicit JOIN notation rather than the old (pre-1992) implicit join notation.
Old style:
SELECT transactionTable.rating as MostCommonRating
FROM personTable, transactionTable
WHERE personTable.transactionid = transactionTable.transactionid
AND personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
Preferred style:
SELECT transactionTable.rating AS MostCommonRating
FROM personTable
JOIN transactionTable
ON personTable.transactionid = transactionTable.transactionid
WHERE personTable.personid = 1
GROUP BY transactionTable.rating
ORDER BY COUNT(transactionTable.rating) desc
LIMIT 1
You need an ON condition after each JOIN.
Also, the personID values in the data are strings, not numbers, so you would need to write
WHERE personTable.personid = 'Ben'
for example, to get the query to work on the tables shown.
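You are looking for an aggregate of an aggregate: in this case, the maximum of a count. So any general solution is going to involve both MAX and COUNT. You cannot apply MAX directly to COUNT, but you can apply MAX to a column from a sub-query where the column happens to be a COUNT.
Build the query up using Test-Driven Query Design (TDQD). Start by selecting each person together with their transaction ratings: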
SELECT p.PersonID, t.Rating, t.TransactionID
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
Then group by person and rating, and count the occurrences:
SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
FROM PersonTable AS p
JOIN TransactionTable AS t
ON p.TransactionID = t.TransactionID
GROUP BY p.PersonID, t.Rating
This result will become a sub-query. Next, find the maximum count for each person:
SELECT s.PersonID, MAX(s.RatingCount)
  FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
          FROM PersonTable AS p
          JOIN TransactionTable AS t
            ON p.TransactionID = t.TransactionID
         GROUP BY p.PersonID, t.Rating
       ) AS s
 GROUP BY s.PersonID
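Now we know what the maximum count is for each person.
To get the result, we need to select the rows from the sub-query that have the maximum count. Note that if someone has 2 Good and 2 Bad ratings (and 2 is the maximum number of ratings of the same type for that person), then two rows will be shown for that person.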
SELECT s.PersonID, s.Rating
  FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
          FROM PersonTable AS p
          JOIN TransactionTable AS t
            ON p.TransactionID = t.TransactionID
         GROUP BY p.PersonID, t.Rating
       ) AS s
  JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
          FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
                  FROM PersonTable AS p
                  JOIN TransactionTable AS t
                    ON p.TransactionID = t.TransactionID
                 GROUP BY p.PersonID, t.Rating
               ) AS s
         GROUP BY s.PersonID
       ) AS m
    ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
If you want the actual rating count too, that is easily selected.
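As a sketch of that variation, the same query can also return the count by adding s.RatingCount to the outer select list. The inner alias c and the final ORDER BY are additions made here for readability; they are not part of the query above.

```sql
-- Same join of "counts per person and rating" against "max count per person",
-- with the winning count included in the output.
SELECT s.PersonID, s.Rating, s.RatingCount
  FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
          FROM PersonTable AS p
          JOIN TransactionTable AS t
            ON p.TransactionID = t.TransactionID
         GROUP BY p.PersonID, t.Rating
       ) AS s
  JOIN (SELECT c.PersonID, MAX(c.RatingCount) AS MaxRatingCount
          FROM (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
                  FROM PersonTable AS p
                  JOIN TransactionTable AS t
                    ON p.TransactionID = t.TransactionID
                 GROUP BY p.PersonID, t.Rating
               ) AS c
         GROUP BY c.PersonID
       ) AS m
    ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
 ORDER BY s.PersonID;
```

With the sample data from the question, each person's most common rating occurs twice, so RatingCount is 2 in all three rows.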
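That is a fairly complex piece of SQL. I would not want to try writing it from scratch. Indeed, I probably wouldn't bother; I would develop it step by step, more or less as shown. But because we have debugged the sub-queries before using them in bigger expressions, we can be confident of the answer.
Note that standard SQL provides a WITH clause that prefixes a SELECT statement and names a sub-query. (It can also be used for recursive queries, but we don't need that here.)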
WITH RatingList AS
     (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
        FROM PersonTable AS p
        JOIN TransactionTable AS t
          ON p.TransactionID = t.TransactionID
       GROUP BY p.PersonID, t.Rating
     )
SELECT s.PersonID, s.Rating
  FROM RatingList AS s
  JOIN (SELECT s.PersonID, MAX(s.RatingCount) AS MaxRatingCount
          FROM RatingList AS s
         GROUP BY s.PersonID
       ) AS m
    ON s.PersonID = m.PersonID AND s.RatingCount = m.MaxRatingCount
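This is simpler to write. Unfortunately, MySQL did not support the WITH clause when this answer was written; it is available from MySQL 8.0 onwards.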
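The SQL above has now been tested against IBM Informix Dynamic Server 11.70.FC2 running on Mac OS X 10.7.4. That test exposed the problem diagnosed in the preliminary comment. The SQL for the main answer worked correctly without needing to be changed.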
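If exactly one row per person is needed even when two ratings tie for the maximum count, one possible approach is a window function rather than the MAX join used above. The sketch below assumes a DBMS with both WITH and ROW_NUMBER() (for example MySQL 8.0 or later); the Ranked name, the rn column, and the alphabetical tie-break are illustrative choices, not part of the tested SQL.

```sql
WITH RatingList AS
     (SELECT p.PersonID, t.Rating, COUNT(*) AS RatingCount
        FROM PersonTable AS p
        JOIN TransactionTable AS t
          ON p.TransactionID = t.TransactionID
       GROUP BY p.PersonID, t.Rating
     ),
     Ranked AS
     (-- Number each person's ratings from most to least frequent;
      -- ties on the count are broken alphabetically by Rating (an arbitrary rule).
      SELECT PersonID, Rating, RatingCount,
             ROW_NUMBER() OVER (PARTITION BY PersonID
                                ORDER BY RatingCount DESC, Rating) AS rn
        FROM RatingList
     )
SELECT PersonID, Rating AS MostCommonRating
  FROM Ranked
 WHERE rn = 1;
```

Any other deterministic tie-break can be substituted by changing the ORDER BY inside the OVER clause.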
Answer 1 (score: 0)
For anyone using Microsoft SQL Server: you can create a custom aggregate function to get the most common value. Example 2 of this blog post by Ahmed Tarek Hasan describes how to do it:
http://developmentsimplyput.blogspot.nl/2013/03/creating-sql-custom-user-defined.html
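Answer 2 (score: 0)
This is an abuse of the fact that the MySQL max aggregate function sorts varchars lexically (as well as sorting integers/floats numerically, as expected):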
SELECT
PersonID,
substring(max(concat(lpad(c, 20, '0'), Rating)), 21) AS MostFrequentRating
FROM (
SELECT PersonID, Rating, count(*) c
FROM PERSONTABLE INNER JOIN TRANSACTIONTABLE USING(TransactionID)
GROUP BY PersonID, Rating
) AS grouped_ratings
GROUP BY PersonID;
Which gives the desired result:
+----------+--------------------+
| PersonID | MostFrequentRating |
+----------+--------------------+
| Adam     | Good               |
| Ben      | Good               |
| Caitlin  | Average            |
+----------+--------------------+
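(Note that if a person has more than one mode, this picks the one that sorts highest alphabetically, so, pretty much arbitrarily, Good over Bad and Bad over Average.)
You should be able to see what max is operating on by examining the following: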
SELECT PersonID, Rating, count(*) c, concat(lpad(count(*), 20, '0'), Rating) as LexicalMaxMe
FROM PERSONTABLE INNER JOIN TRANSACTIONTABLE USING(TransactionID)
GROUP BY PersonID, Rating
ORDER BY PersonID, c DESC;
Which outputs:
+----------+---------+---+-----------------------------+
| PersonID | Rating  | c | LexicalMaxMe                |
+----------+---------+---+-----------------------------+
| Adam     | Good    | 2 | 00000000000000000002Good    |
| Adam     | Bad     | 1 | 00000000000000000001Bad     |
| Ben      | Good    | 2 | 00000000000000000002Good    |
| Ben      | Average | 1 | 00000000000000000001Average |
| Caitlin  | Average | 2 | 00000000000000000002Average |
| Caitlin  | Good    | 1 | 00000000000000000001Good    |
+----------+---------+---+-----------------------------+
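The zero-padding is what makes the lexical comparison agree with the numeric one. A small sketch of why it matters, using MySQL's GREATEST scalar function in place of the max aggregate and made-up counts of 9 and 10 (neither appears in the sample data):

```sql
-- Without padding, strings are compared character by character,
-- so '9Good' sorts above '10Bad' even though 10 > 9.
SELECT GREATEST(CONCAT(9, 'Good'), CONCAT(10, 'Bad')) AS unpadded_max;
-- returns '9Good'

-- With both counts padded to the same width, lexical order matches numeric order.
SELECT GREATEST(CONCAT(LPAD(9, 20, '0'), 'Good'),
                CONCAT(LPAD(10, 20, '0'), 'Bad')) AS padded_max;
-- returns '00000000000000000010Bad'
```

The width of 20 is simply wide enough for any count that a BIGINT can hold; the substring(..., 21) in the main query then strips those 20 characters off again.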