mysql - Group together rows whose COUNT() is below a percentage of the total

Time: 2017-08-16 08:33:50

Tags: mysql

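I am building a database that tracks student records for an after-school education company, including class enrollment and student information.

What I want is a query that returns the number of students we have enrolled from each school, but that also groups together the schools whose count falls below a certain percentage of the total. (I want to show this in a chart, but we have many schools with only one student each, and I don't want the chart to have 50 bars or pie slices.)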

So instead of

+-------------+------------+
| School Name | # Students |
+-------------+------------+
| School A    |         52 |
| School B    |         27 |
| School C    |         15 |
| School D    |          2 |
| School E    |          1 |
| School F    |          1 |
+-------------+------------+

I would like

+---------------+------------+
|  School Name  | # Students |
+---------------+------------+
| School A      |         52 |
| School B      |         27 |
| School C      |         15 |
| Other Schools |          4 |
+---------------+------------+

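Below is a simplified form of the query I am using now. It works, but it is somewhat redundant, because several SELECTs query the same information. Is there any way to reduce the redundancy?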

SELECT @enrollmentSum := COUNT(StudentEnrollmentID) FROM StudentEnrollment;
SELECT SchoolName, COUNT(StudentEnrollmentID) ECount FROM Student
JOIN StudentEnrollment ON StudentEnrollment.StudentID = Student.StudentID
JOIN School ON Student.SchoolID = School.SchoolID
GROUP BY SchoolName
HAVING Ecount >= .025 * @enrollmentSum
UNION ALL
SELECT "Other Schools" as SchoolName, SUM(Ecount) as ECount FROM
(
    SELECT SchoolName, COUNT(StudentEnrollmentID) ECount FROM Student
    JOIN StudentEnrollment ON StudentEnrollment.StudentID = Student.StudentID
    JOIN School ON Student.SchoolID = School.SchoolID
    GROUP BY SchoolName
    HAVING Ecount < .025 * @enrollmentSum
) t2
ORDER BY Ecount DESC

The basic structure of the relevant tables, if needed:

Student

+-----------+-------------+----------+
| StudentID | StudentName | SchoolID |
+-----------+-------------+----------+

School

+----------+------------+
| SchoolID | SchoolName |
+----------+------------+

StudentEnrollment

+---------------------+-----------+---------+
| StudentEnrollmentID | StudentID | ClassID |
+---------------------+-----------+---------+

Thanks for your help.

1 answer:

Answer 0: (score: 0)

Hints:

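  • count(x) returns the number of rows where "x IS NOT NULL", so count(primary key) = count(*), and the latter is easier to read

  • "JOIN School ON Student.SchoolID = School.SchoolID" can be rewritten as "JOIN School USING (SchoolID)", which is more readable and gives you only one "SchoolID" column in the result set if you use "SELECT *"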
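Both hints can be checked on a tiny data set. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the three sample students (one with a NULL SchoolID) are made up for illustration.

```python
import sqlite3

# Hypothetical miniature data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE School (SchoolID INTEGER PRIMARY KEY, SchoolName TEXT);
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY,
                          StudentName TEXT, SchoolID INTEGER);
    INSERT INTO School VALUES (1, 'School A'), (2, 'School B');
    INSERT INTO Student VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', NULL);
""")

# COUNT(*) counts rows; COUNT(col) counts only rows where col IS NOT NULL.
count_all, count_pk, count_nullable = conn.execute(
    "SELECT COUNT(*), COUNT(StudentID), COUNT(SchoolID) FROM Student"
).fetchone()
print(count_all, count_pk, count_nullable)

# With USING, SELECT * exposes a single SchoolID column instead of two.
cur = conn.execute("SELECT * FROM Student JOIN School USING (SchoolID)")
using_columns = [d[0] for d in cur.description]
print(using_columns)
```

The primary key can never be NULL, so its count matches COUNT(*), while the nullable SchoolID column counts one row fewer in this sample.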

Now, the query...

SELECT SchoolName, sum(cnt) ECount FROM 
(SELECT IF(count(*)>=.025*@enrollmentSum, SchoolName, 'Others') AS SchoolName,
 COUNT(*) cnt FROM Student
 JOIN StudentEnrollment USING (StudentID)
 JOIN School USING (SchoolID)
 GROUP BY SchoolName) subq
GROUP BY SchoolName
ORDER BY Ecount DESC

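Using IF() replaces the school name with 'Others' for every school below the threshold. Note that this expression is evaluated after the GROUP BY, so you can actually use count(*) in the selected expression. A second GROUP BY then groups the 'Others' rows together.

Edit

This is quite a hack, but it seems to do what you want...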
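The relabel-then-regroup idea can be tried on the numbers from the question's example table. This is a sketch under stated assumptions rather than the MySQL query itself: it runs on SQLite through Python's sqlite3 module, uses the standard CASE expression in place of MySQL's IF(), and passes the total as a bound parameter (:total) where the MySQL version reads @enrollmentSum. Student names and the ClassID value are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE School (SchoolID INTEGER PRIMARY KEY, SchoolName TEXT);
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY,
                          StudentName TEXT, SchoolID INTEGER);
    CREATE TABLE StudentEnrollment (StudentEnrollmentID INTEGER PRIMARY KEY,
                                    StudentID INTEGER, ClassID INTEGER);
""")

# Enrollment counts per school, taken from the question's example table.
counts = {"School A": 52, "School B": 27, "School C": 15,
          "School D": 2, "School E": 1, "School F": 1}
student_id = 0
for school_id, (name, n) in enumerate(counts.items(), start=1):
    conn.execute("INSERT INTO School VALUES (?, ?)", (school_id, name))
    for _ in range(n):
        student_id += 1
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)",
                     (student_id, f"student{student_id}", school_id))
        # One enrollment per student; ClassID 1 is a placeholder.
        conn.execute("INSERT INTO StudentEnrollment (StudentID, ClassID) "
                     "VALUES (?, 1)", (student_id,))

# Step 1: the overall total, playing the role of @enrollmentSum.
(total,) = conn.execute("SELECT COUNT(*) FROM StudentEnrollment").fetchone()

# Step 2: relabel small schools in the inner query, then regroup by label.
rows = conn.execute("""
    SELECT label, SUM(cnt) AS ECount FROM (
        SELECT CASE WHEN COUNT(*) >= 0.025 * :total
                    THEN School.SchoolName ELSE 'Others' END AS label,
               COUNT(*) AS cnt
        FROM Student
        JOIN StudentEnrollment USING (StudentID)
        JOIN School USING (SchoolID)
        GROUP BY School.SchoolName
    ) AS subq
    GROUP BY label
    ORDER BY ECount DESC
""", {"total": total}).fetchall()

for label, ecount in rows:
    print(label, ecount)
```

With 98 enrollments the 2.5% threshold is 2.45, so Schools D, E and F collapse into a single 'Others' row of 4, matching the table the question asks for.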

SET @total=0; 
SELECT IF(cnt/@total>=0.2, SchoolName, 'Others') SN, sum(cnt) FROM (
    SELECT SchoolName, cnt, @total:=@total+cnt FROM (
        SELECT SchoolName, count(*) cnt FROM st GROUP BY SchoolName
    ) AS foo -- ORDER BY cnt DESC
) AS bar
GROUP BY SN ORDER BY sum(cnt) DESC;

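This is perverse. MySQL seems to always materialize the subquery "foo" first and store its result in a buffer before processing the subquery "bar". I thought the "ORDER BY cnt DESC" would be necessary, but it seems to work with it commented out too.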

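Running the subquery "bar" over those buffered rows has the side effect of setting @total to the value we want!

So by the time the outer query runs, the total is available.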

The problem with this approach is that it may stop working without warning, because it is a hack.
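One way to avoid depending on evaluation order is to compute the total in a one-row derived table and cross join it, so the whole thing is a single statement with no user variables. This is a different technique from the hack above, not a drop-in equivalent, and the sketch below makes several assumptions: it runs on SQLite via Python's sqlite3 module, it reuses the table name st from the hack with one row per enrollment, it takes the sample counts and the 2.5% threshold from the question, and the aliases per_school and totals are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape as the "st" table in the hack: one row per enrollment.
conn.execute("CREATE TABLE st (SchoolName TEXT)")
counts = {"School A": 52, "School B": 27, "School C": 15,
          "School D": 2, "School E": 1, "School F": 1}
conn.executemany("INSERT INTO st VALUES (?)",
                 [(name,) for name, n in counts.items() for _ in range(n)])

# The one-row "totals" table is joined to every per-school row, so the
# threshold comparison needs neither a user variable nor a second statement.
rows = conn.execute("""
    SELECT CASE WHEN per_school.cnt >= 0.025 * totals.total
                THEN per_school.SchoolName ELSE 'Others' END AS SN,
           SUM(per_school.cnt) AS ECount
    FROM (SELECT SchoolName, COUNT(*) AS cnt
          FROM st GROUP BY SchoolName) AS per_school
    CROSS JOIN (SELECT COUNT(*) AS total FROM st) AS totals
    GROUP BY SN
    ORDER BY ECount DESC
""").fetchall()

for name, ecount in rows:
    print(name, ecount)
```

The SQL uses only derived tables, CASE and CROSS JOIN, so it should carry over to MySQL with little or no change, though that has not been verified here.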

PostgreSQL version, if you're curious...