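I am trying to optimize a query that has several counts over objects in a subordinate table (using aliases in SQLAlchemy). In Witch Academia terms, it looks something like this: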
SELECT
    exam.id AS exam_id,
    exam.name AS exam_name,
    count(tried_witch.id) AS tried,
    count(passed_witch.id) AS passed,
    count(failed_witch.id) AS failed
FROM exam
LEFT OUTER JOIN witch AS tried_witch
    ON tried_witch.exam_id = exam.id AND
       tried_witch.is_failed = 0 AND
       tried_witch.status != "passed"
LEFT OUTER JOIN witch AS passed_witch
    ON passed_witch.exam_id = exam.id AND
       passed_witch.is_failed = 0 AND
       passed_witch.status = "passed"
LEFT OUTER JOIN witch AS failed_witch
    ON failed_witch.exam_id = exam.id AND
       failed_witch.is_failed = 1
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
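The number of witches can be large (hundreds of thousands) while the number of exams is small (hundreds), so the query above is quite slow. In many similar questions I found answers that propose the approach above, but I feel a completely different approach is needed here. I am stuck on coming up with an alternative. Note that the results must be sorted by the computed counts, and getting zero as a count is of course important too. (Never mind the funny model: witches can easily clone themselves to sit multiple exams, hence the identity per exam.)
There is also an EXISTS subquery that is not shown above and does not affect the result. The query statistics are: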
# Query_time: 1.135747 Lock_time: 0.000209 Rows_sent: 20 Rows_examined: 98174
# Rows_affected: 0
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: Yes
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
Updated the query; this is still slow:
SELECT
    exam.id AS exam_id,
    exam.name AS exam_name,
    count(CASE WHEN (witch.status != "passed" AND witch.is_failed = 0)
          THEN witch.id
          ELSE NULL END) AS tried,
    count(CASE WHEN (witch.status = "passed" AND witch.is_failed = 0)
          THEN witch.id
          ELSE NULL END) AS passed,
    count(CASE WHEN (witch.is_failed = 1)
          THEN witch.id
          ELSE NULL END) AS failed
FROM exam
LEFT OUTER JOIN witch ON witch.exam_id = exam.id
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
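One thing worth checking when comparing the two query shapes is whether they return the same counts at all. The sketch below uses Python's built-in sqlite3 as a stand-in for MariaDB, with made-up sample rows; the aliases `t`, `p`, `f` and `w` are shortened for illustration. Three joins against the same table on the same key multiply rows per exam, so each `count()` sees the product of the three groups, whereas a single join with conditional counts sees each witch once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE witch (
    id INTEGER PRIMARY KEY,
    exam_id INTEGER REFERENCES exam(id),
    is_failed INTEGER,
    status TEXT
);
INSERT INTO exam VALUES (1, 'Broom Flight'), (2, 'Potions');
-- Hypothetical data: exam 1 has 2 tried, 3 passed, 1 failed; exam 2 has nobody.
INSERT INTO witch (exam_id, is_failed, status) VALUES
    (1, 0, 'pending'), (1, 0, 'pending'),
    (1, 0, 'passed'), (1, 0, 'passed'), (1, 0, 'passed'),
    (1, 1, 'pending');
""")

# Shape of the first query: one LEFT JOIN per category.
fanout = conn.execute("""
    SELECT exam.id, count(t.id), count(p.id), count(f.id)
    FROM exam
    LEFT JOIN witch AS t
        ON t.exam_id = exam.id AND t.is_failed = 0 AND t.status != 'passed'
    LEFT JOIN witch AS p
        ON p.exam_id = exam.id AND p.is_failed = 0 AND p.status = 'passed'
    LEFT JOIN witch AS f
        ON f.exam_id = exam.id AND f.is_failed = 1
    GROUP BY exam.id
    ORDER BY exam.id
""").fetchall()

# Shape of the updated query: one LEFT JOIN, conditional counts.
conditional = conn.execute("""
    SELECT exam.id,
           count(CASE WHEN w.is_failed = 0 AND w.status != 'passed' THEN w.id END),
           count(CASE WHEN w.is_failed = 0 AND w.status = 'passed' THEN w.id END),
           count(CASE WHEN w.is_failed = 1 THEN w.id END)
    FROM exam
    LEFT JOIN witch AS w ON w.exam_id = exam.id
    GROUP BY exam.id
    ORDER BY exam.id
""").fetchall()

print(fanout)       # exam 1 is counted over 2 * 3 * 1 joined rows
print(conditional)  # exam 1 is counted over its 6 witch rows, once each
```

On this sample the three-join shape reports 6 for every category of exam 1, while the single-join shape reports 2, 3 and 1. Both keep exam 2 with zero counts.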
Answer 0 (score: 0)
Indexes are the key to query performance
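I do not know MariaDB at all, so I am not sure what the possibilities are. But if it is anything like Microsoft SQL Server, this is what I would try:
Create a composite index that covers all the required columns: witch_id, status and is_failed. If the query uses that index, that should be it. The order of the columns in the index may matter a great deal. Then profile the query to see whether the index is actually used. See the Optimization and Indexes documentation page.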
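A minimal sketch of the "create the index, then check that it is used" loop, again with Python's built-in sqlite3 standing in for MariaDB (where `EXPLAIN` plays the role of SQLite's `EXPLAIN QUERY PLAN`). The index name `ix_witch_exam_flags` is made up, and the column order is an assumption: the answer names `witch_id`, but this sketch leads with the join column `exam_id` because that is what the query joins on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE witch (
    id INTEGER PRIMARY KEY,
    exam_id INTEGER,
    is_failed INTEGER,
    status TEXT
);
-- Assumed column order: join column first, then the two filter columns.
CREATE INDEX ix_witch_exam_flags ON witch (exam_id, is_failed, status);
""")

query = """
    SELECT exam.id, exam.name,
           count(CASE WHEN witch.is_failed = 0 AND witch.status != 'passed'
                      THEN witch.id END) AS tried,
           count(CASE WHEN witch.is_failed = 0 AND witch.status = 'passed'
                      THEN witch.id END) AS passed,
           count(CASE WHEN witch.is_failed = 1 THEN witch.id END) AS failed
    FROM exam
    LEFT OUTER JOIN witch ON witch.exam_id = exam.id
    GROUP BY exam.id, exam.name
    ORDER BY tried ASC
    LIMIT 20
"""

# The last column of each plan row is the human-readable step description.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
uses_index = any("ix_witch_exam_flags" in step for step in plan_steps)

for step in plan_steps:
    print(step)
print("index used:", uses_index)
```

The exact plan text differs between engines and versions, so the check only looks for the index name. On MariaDB the equivalent check would be reading the `key` column of the `EXPLAIN` output.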
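Consider Generated (Virtual and Persistent) Columns
It looks like all the information needed to classify a witch as tried, passed or failed is contained in the witch row itself. So you can create virtual columns directly on the table and use the PERSISTENT option, which allows indexes to be created on them. You can then create an index specifically for this query, containing witch_id and the three virtual columns: tried, passed and failed. Make sure the query uses it, and performance should be pretty good. The query then looks very simple: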
SELECT exam.id,
       exam.name,
       sum(witch.tried) AS tried,
       sum(witch.passed) AS passed,
       sum(witch.failed) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
         exam.name
ORDER BY sum(witch.tried)
LIMIT 20
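Although the original query only does simple comparisons with AND/OR clauses, you are essentially offloading the computation of the three statuses to the database at INSERT/UPDATE time. Your SELECT should then be faster.
Your example does not specify any result filtering (a WHERE clause), but if you have one, it may affect how the index should be optimized for query performance.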
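The generated-column idea can be sketched with Python's built-in sqlite3, which supports stored generated columns from SQLite 3.31 onwards. MariaDB spells the option `PERSISTENT` where SQLite uses `STORED`, so treat the DDL below as an analogue rather than MariaDB syntax. The index name, the `*_count` aliases and the sample rows are made up, and the index leads with `exam_id` as an assumption about the join column. The sketch also uses a LEFT JOIN with COALESCE so that exams without witches keep a zero count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE witch (
    id INTEGER PRIMARY KEY,
    exam_id INTEGER,
    is_failed INTEGER NOT NULL,
    status TEXT NOT NULL,
    -- Computed once at INSERT/UPDATE time and stored with the row.
    tried  INTEGER GENERATED ALWAYS AS (is_failed = 0 AND status != 'passed') STORED,
    passed INTEGER GENERATED ALWAYS AS (is_failed = 0 AND status = 'passed') STORED,
    failed INTEGER GENERATED ALWAYS AS (is_failed = 1) STORED
);
CREATE INDEX ix_witch_counts ON witch (exam_id, tried, passed, failed);

INSERT INTO exam VALUES (1, 'Broom Flight'), (2, 'Potions');
-- Hypothetical data: exam 1 has 2 tried, 3 passed, 1 failed; exam 2 has nobody.
INSERT INTO witch (exam_id, is_failed, status) VALUES
    (1, 0, 'pending'), (1, 0, 'pending'),
    (1, 0, 'passed'), (1, 0, 'passed'), (1, 0, 'passed'),
    (1, 1, 'pending');
""")

rows = conn.execute("""
    SELECT exam.id, exam.name,
           COALESCE(sum(witch.tried), 0)  AS tried_count,
           COALESCE(sum(witch.passed), 0) AS passed_count,
           COALESCE(sum(witch.failed), 0) AS failed_count
    FROM exam
    LEFT JOIN witch ON witch.exam_id = exam.id
    GROUP BY exam.id, exam.name
    ORDER BY tried_count ASC, exam.id ASC
    LIMIT 20
""").fetchall()

for row in rows:
    print(row)
```

The SELECT no longer evaluates any CASE expressions; it only sums precomputed 0/1 flags, which is the offloading the answer describes.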
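Original answer: below is the query change originally suggested. Here I assume the index-optimization part has already been done.
Could you try using SUM instead of COUNT?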
SELECT exam.id,
       exam.name,
       sum(CASE
               WHEN (witch.is_failed = 0
                     AND witch.status != 'passed') THEN 1
               ELSE 0
           END) AS tried,
       sum(CASE
               WHEN (witch.is_failed = 0
                     AND witch.status = 'passed') THEN 1
               ELSE 0
           END) AS passed,
       sum(CASE
               WHEN (witch.is_failed = 1) THEN 1
               ELSE 0
           END) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
         exam.name
ORDER BY sum(CASE
                 WHEN (witch.is_failed = 0
                       AND witch.status != 'passed') THEN 1
                 ELSE 0
             END)
LIMIT 20
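Since the question asks for zero counts to be kept, the join type in the SUM query matters. The sketch below (Python's built-in sqlite3 as a stand-in for MariaDB, made-up sample rows, a hypothetical helper named `counts`) runs the same SUM(CASE ...) aggregation with an INNER JOIN and with a LEFT OUTER JOIN: the inner join drops exams that have no witch rows, while the outer join keeps them with zeros, because the all-NULL joined row falls through to `ELSE 0`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE witch (
    id INTEGER PRIMARY KEY,
    exam_id INTEGER,
    is_failed INTEGER,
    status TEXT
);
INSERT INTO exam VALUES (1, 'Broom Flight'), (2, 'Potions');
-- Hypothetical data: exam 1 has 2 tried, 3 passed, 1 failed; exam 2 has nobody.
INSERT INTO witch (exam_id, is_failed, status) VALUES
    (1, 0, 'pending'), (1, 0, 'pending'),
    (1, 0, 'passed'), (1, 0, 'passed'), (1, 0, 'passed'),
    (1, 1, 'pending');
""")

SUM_CASE = """
    sum(CASE WHEN w.is_failed = 0 AND w.status != 'passed' THEN 1 ELSE 0 END) AS tried,
    sum(CASE WHEN w.is_failed = 0 AND w.status = 'passed' THEN 1 ELSE 0 END) AS passed,
    sum(CASE WHEN w.is_failed = 1 THEN 1 ELSE 0 END) AS failed
"""


def counts(join_kind):
    """Run the SUM(CASE ...) query with the given join keyword(s)."""
    sql = f"""
        SELECT exam.id, {SUM_CASE}
        FROM exam
        {join_kind} JOIN witch AS w ON w.exam_id = exam.id
        GROUP BY exam.id, exam.name
        ORDER BY tried ASC, exam.id ASC
        LIMIT 20
    """
    return conn.execute(sql).fetchall()


inner_rows = counts("INNER")
left_rows = counts("LEFT OUTER")

print(inner_rows)  # exam 2 is missing
print(left_rows)   # exam 2 is present with zero counts
```

If exams with no witches must appear in the result, the LEFT OUTER JOIN from the question is the one to keep when switching from COUNT to SUM.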
The rest:
Given that you mentioned sqlalchemy in your question, here is the sqlalchemy code I used to model the tables and generate the query:
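# model
from sqlalchemy import Column, ForeignKey, Integer, String, and_, case
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import relationship

Base = declarative_base()


class Exam(Base):
    __tablename__ = 'exam'  # required by declarative; matches ForeignKey('exam.id')
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Witch(Base):
    __tablename__ = 'witch'
    id = Column(Integer, primary_key=True)
    exam_id = Column(Integer, ForeignKey('exam.id'))
    is_failed = Column(Integer)
    status = Column(String)
    exam = relationship(Exam, backref='witches')

    # computed fields (evaluated in Python on a loaded instance)
    @hybrid_property
    def tried(self):
        return self.is_failed == 0 and self.status != 'passed'

    @hybrid_property
    def passed(self):
        return self.is_failed == 0 and self.status == 'passed'

    @hybrid_property
    def failed(self):
        return self.is_failed == 1

    # computed fields: expression (rendered as SQL CASE when used in a query)
    # Note: this is the list form of case() from older SQLAlchemy releases;
    # newer releases expect the (condition, value) tuples as positional arguments.
    @tried.expression
    def _tried_expression(cls):
        return case([(and_(
            cls.is_failed == 0,
            cls.status != 'passed',
        ), 1)], else_=0)

    @passed.expression
    def _passed_expression(cls):
        return case([(and_(
            cls.status == 'passed',
            cls.is_failed == 0,
        ), 1)], else_=0)

    @failed.expression
    def _failed_expression(cls):
        return case([(cls.is_failed == 1, 1)], else_=0)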
and
# query
from sqlalchemy import func

# assumes an existing Session instance named `session`
q = (
    session.query(
        Exam.id, Exam.name,
        func.sum(Witch.tried).label("tried"),
        func.sum(Witch.passed).label("passed"),
        func.sum(Witch.failed).label("failed"),
    )
    .join(Witch)
    .group_by(Exam.id, Exam.name)
    .order_by(func.sum(Witch.tried))
    .limit(20)
)