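For the author overview, we are looking for a query that shows all authors together with their best book. The problem with this query is that it is slow. There are only about 1500 authors, yet the query that generates the overview currently takes 20 seconds.
The main problem seems to be computing the average rating of all books per person. With the following query, it is still reasonably fast: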
select
person.id as pers_id,
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate) as year,
thriller.id as thrill_id,
count(user_rating.id) as nr,
AVG(user_rating.rating) as avgrating
from
thriller
inner join
thriller_form
on thriller_form.thriller_id = thriller.id
inner join
thriller_person
on thriller_person.thriller_id = thriller.id
and thriller_person.person_type_id = 1
inner join
person
on person.id = thriller_person.person_id
left outer join
user_rating
on user_rating.thriller_id = thriller.id
and user_rating.rating_type_id = 1
where thriller.id in
(select top 1 B.id from thriller as B
inner join thriller_person as C on B.id=C.thriller_id
and person.id=C.person_id)
group by
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate),
thriller.id,
person.id
order by
person.lastname
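However, if we make the subquery more complex by selecting the book by its average rating, generating the result set takes a full 20 seconds. The query then looks like this: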
select
person.id as pers_id,
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate) as year,
thriller.id as thrill_id,
count(user_rating.id) as nr,
AVG(user_rating.rating) as avgrating
from
thriller
inner join
thriller_form
on thriller_form.thriller_id = thriller.id
inner join
thriller_person
on thriller_person.thriller_id = thriller.id
and thriller_person.person_type_id = 1
inner join
person
on person.id = thriller_person.person_id
left outer join
user_rating
on user_rating.thriller_id = thriller.id
and user_rating.rating_type_id = 1
where thriller.id in
(select top 1 B.id from thriller as B
inner join thriller_person as C on B.id=C.thriller_id
and person.id=C.person_id
inner join user_rating as D on B.id=D.thriller_id
group by B.id
order by AVG(D.rating))
group by
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate),
thriller.id,
person.id
order by
person.lastname
Does anyone have a good suggestion for speeding this up?
Answer 0 (score: 2)
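Computing an average requires a table scan, because you have to sum the values and then divide by the number of (relevant) rows. That in turn means you are doing a lot of rescanning, which is slow. Can you compute the averages once and store them? Your query could then use those precomputed values. (Yes, it denormalizes the data, but denormalizing for performance is often necessary; there is a trade-off between performance and minimal data.)
It may be appropriate to use a temporary table as the store for the averages.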
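As a rough sketch of the temporary-table idea, the averages could be computed once per book and the per-author pick made from the stored values. Table and column names come from the question. The temp table #thriller_avg, the index name, the cast to decimal, and the use of row_number() (which assumes SQL Server 2005 or later) are assumptions, not part of the original queries. The sketch also assumes "best" means the highest average, so it sorts descending, whereas the subquery in the question sorts ascending. It leaves out the thriller_form join and restricts ratings to rating_type_id = 1, so results may differ slightly from the original query.

```sql
-- Sketch only: #thriller_avg and ix_thriller_avg are hypothetical names.

-- 1. Compute each book's rating count and average once.
--    The cast avoids integer averaging in case rating is an int column
--    (its type is not shown in the question).
select
    user_rating.thriller_id,
    count(user_rating.id) as nr,
    avg(cast(user_rating.rating as decimal(9, 4))) as avgrating
into #thriller_avg
from user_rating
where user_rating.rating_type_id = 1
group by user_rating.thriller_id;

create unique clustered index ix_thriller_avg
    on #thriller_avg (thriller_id);

-- 2. Rank each author's books by the stored average and keep the top one.
--    Books without ratings drop out here, as they do in the slow query.
with ranked as (
    select
        person.id as pers_id,
        person.firstname,
        person.suffix,
        person.lastname,
        thriller.title,
        year(thriller.orig_pubdate) as [year],
        thriller.id as thrill_id,
        a.nr,
        a.avgrating,
        row_number() over (
            partition by person.id
            order by a.avgrating desc, thriller.id  -- thriller.id breaks ties
        ) as rn
    from person
    inner join thriller_person
        on thriller_person.person_id = person.id
        and thriller_person.person_type_id = 1
    inner join thriller
        on thriller.id = thriller_person.thriller_id
    inner join #thriller_avg as a
        on a.thriller_id = thriller.id
)
select pers_id, firstname, suffix, lastname, title, [year],
       thrill_id, nr, avgrating
from ranked
where rn = 1
order by lastname;

drop table #thriller_avg;
```

The point of this shape is that user_rating is aggregated once, rather than once per outer row as in the correlated subquery. Whether that removes the full 20 seconds depends on data volumes and indexes the question does not show.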