Question

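I am building a database to track student records for an after-school education company, including class enrollment and student information.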

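What I'm trying to do is write a query that returns the number of students we have enrolled from each school, but that also groups together the schools whose totals fall below a certain percentage. (I want to show this in a chart, but we have a lot of schools with only one student, and I don't want the chart to have 50 bars or pie slices.)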

So instead of

+-------------+------------+
| School Name | # Students |
+-------------+------------+
| School A    |         52 |
| School B    |         27 |
| School C    |         15 |
| School D    |          2 |
| School E    |          1 |
| School F    |          1 |
+-------------+------------+

I would like

+---------------+------------+
|  School Name  | # Students |
+---------------+------------+
| School A      |         52 |
| School B      |         27 |
| School C      |         15 |
| Other Schools |          4 |
+---------------+------------+

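Below is a simplified form of the query I am using now. It works, but it is somewhat redundant, since it uses multiple SELECTs over the same information. Is there any way to reduce the redundancy?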

SELECT @enrollmentSum := COUNT(StudentEnrollmentID) FROM StudentEnrollment;
SELECT SchoolName, COUNT(StudentEnrollmentID) ECount FROM Student
JOIN StudentEnrollment ON StudentEnrollment.StudentID = Student.StudentID
JOIN School ON Student.SchoolID = School.SchoolID
GROUP BY SchoolName
HAVING Ecount >= .025 * @enrollmentSum
UNION ALL
SELECT "Other Schools" as SchoolName, SUM(Ecount) as ECount FROM
(
    SELECT SchoolName, COUNT(StudentEnrollmentID) ECount FROM Student
    JOIN StudentEnrollment ON StudentEnrollment.StudentID = Student.StudentID
    JOIN School ON Student.SchoolID = School.SchoolID
    GROUP BY SchoolName
    HAVING Ecount < .025 * @enrollmentSum
) t2
ORDER BY Ecount DESC

Basic structure of the relevant tables, if needed:

Student

+-----------+-------------+----------+
| StudentID | StudentName | SchoolID |
+-----------+-------------+----------+

School

+----------+------------+
| SchoolID | SchoolName |
+----------+------------+

StudentEnrollment

+---------------------+-----------+---------+
| StudentEnrollmentID | StudentID | ClassID |
+---------------------+-----------+---------+

Thanks for your help.

Answer 1

Hints:

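count(x) returns the number of rows where "x IS NOT NULL", so count(primary key) = count(*), and the latter is easier to read.
"JOIN School ON Student.SchoolID = School.SchoolID" can be rewritten as "JOIN School USING (SchoolID)", which is more readable and gives you only one "SchoolID" column in the result set if you use "SELECT *".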
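
As a minimal sketch of these two hints against the question's tables (this assumes StudentEnrollmentID is the primary key of StudentEnrollment; the aliases by_key and by_star are only illustrative):

```sql
-- COUNT(col) counts the rows where col IS NOT NULL. A primary key is never
-- NULL, so both columns below return the same number.
SELECT COUNT(StudentEnrollmentID) AS by_key,
       COUNT(*)                   AS by_star
FROM StudentEnrollment;

-- USING (SchoolID) is shorthand for ON Student.SchoolID = School.SchoolID.
-- With SELECT *, the result then has a single SchoolID column instead of two.
SELECT *
FROM Student
JOIN School USING (SchoolID);
```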

Now, the query...

SELECT SchoolName, sum(cnt) ECount FROM 
(SELECT IF(count(*)>=.025*@enrollmentSum, SchoolName, 'Others') AS SchoolName,
 COUNT(*) cnt FROM Student
 JOIN StudentEnrollment USING (StudentID)
 JOIN School USING (SchoolID)
 GROUP BY SchoolName) subq
GROUP BY SchoolName
ORDER BY Ecount DESC

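Using IF() replaces the school name with 'Others' for all schools below the threshold. Note that this is evaluated after the GROUP BY, so you can actually use count(*) in the select expression. Then a second GROUP BY groups the 'Others' rows together.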
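
The query above still depends on @enrollmentSum having been set by a separate SELECT, as in the question. One possible way to fold everything into a single statement, sketched here on the assumption that the total should be the row count of StudentEnrollment (as in the question), is to inline the total as an uncorrelated scalar subquery. The names Label and per_school are illustrative and not part of the original answer.

```sql
SELECT Label AS SchoolName, SUM(cnt) AS ECount
FROM (
    -- One row per school; schools under 2.5% of all enrollments are
    -- relabelled so that the outer GROUP BY merges them into one row.
    SELECT IF(COUNT(*) >= 0.025 * (SELECT COUNT(*) FROM StudentEnrollment),
              SchoolName, 'Other Schools') AS Label,
           COUNT(*) AS cnt
    FROM Student
    JOIN StudentEnrollment USING (StudentID)
    JOIN School USING (SchoolID)
    GROUP BY SchoolName
) AS per_school
GROUP BY Label
ORDER BY ECount DESC;
```

With the sample counts from the question (98 enrollments in total), the threshold is 0.025 × 98 = 2.45, so Schools D, E, and F fall below it and are reported as a single 'Other Schools' row with a count of 4.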

Edit

This is quite a hack, but it seems to do what you want...

SET @total=0;
SELECT IF(cnt/@total>=0.2, SchoolName, 'Others') SN, sum(cnt)
FROM (
    SELECT SchoolName, cnt, @total:=@total+cnt
    FROM (
        SELECT SchoolName, count(*) cnt
        FROM st
        GROUP BY SchoolName
    ) AS foo
    -- ORDER BY cnt DESC
) AS bar
GROUP BY SN
ORDER BY sum(cnt) DESC;

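This is perverse. MySQL always seems to materialize the subquery "foo" first, storing the result in a buffer, before it processes the subquery "bar". I thought the "ORDER BY cnt DESC" would be necessary, but it seems to work even when it is commented out.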

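Materializing the subquery "bar" (which scans "foo") has the side effect of setting @total to the value we want!

So, when the outer query runs, the total is available.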

The problem with this approach is that it could stop working without warning, since it is a hack.
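
If relying on the evaluation order of user variables is a concern, one alternative (not from the original answer) is to let a window function compute the total. The sketch below assumes MySQL 8.0 or later, uses the question's tables and 2.5% threshold rather than the "st" table above, and uses the illustrative names Label and per_school. Note that the total here is the number of joined rows, which can differ from the row count of StudentEnrollment if some enrollments have no matching student or school.

```sql
-- Requires window function support (MySQL 8.0 or later).
SELECT Label AS SchoolName, SUM(cnt) AS ECount
FROM (
    -- SUM(COUNT(*)) OVER () adds up the per-school counts across all groups,
    -- so every grouped row can be compared against the overall total.
    SELECT CASE
               WHEN COUNT(*) >= 0.025 * SUM(COUNT(*)) OVER ()
               THEN SchoolName
               ELSE 'Other Schools'
           END AS Label,
           COUNT(*) AS cnt
    FROM Student
    JOIN StudentEnrollment USING (StudentID)
    JOIN School USING (SchoolID)
    GROUP BY SchoolName
) AS per_school
GROUP BY Label
ORDER BY ECount DESC;
```

Because the total is computed inside the same statement, no user variable and no separate SELECT are needed.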

PostgreSQL version, if you're curious...
