I have the following table:
id | query | update_date | website_id | device | page | impressions | clicks | position | is_brand
---+---------+-------------+------------+---------+---------+-------------+--------+----------+---------
1 | kitchen | 2018-05-01 | 2 | desktop | http... | 11000 | 50 | 3 | 1
2 | table | 2018-05-01 | 2 | desktop | http... | 7000 | 40 | 3 | 0
3 | kitchen | 2018-05-02 | 2 | desktop | http... | 11500 | 55 | 3 | 1
4 | table | 2018-05-02 | 2 | desktop | http... | 7100 | 35 | 3 | 0
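From this table I need a procedure that returns the best-performing row, by clicks, for each unique query within a given time period. This led to the following procedure: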
create or alter procedure get_best_website_querys
    @from as date,
    @to as date,
    @website_id as int
as
begin
    WITH cte
    AS (SELECT *
             , ROW_NUMBER() OVER (PARTITION BY query ORDER BY clicks DESC) RN
        FROM search_console_query
        where
            update_date >= @from and
            update_date <= @to and
            website_id = @website_id
       )
    SELECT cte.id
         , cte.query
         , cte.update_date
         , cte.website_id
         , cte.device
         , cte.page
         , cte.impressions
         , cte.clicks
         , cte.POSITION
         , cte.is_brand
    FROM cte
    WHERE RN = 1
end;
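To check the ROW_NUMBER logic on the sample rows without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It assumes SQLite 3.25 or later (needed for window functions) and stores dates as ISO `YYYY-MM-DD` text so that string comparison matches date order. It selects only `id`, `query` and `clicks` for brevity; `make_db` and `get_best_website_queries` are hypothetical helper names, not part of the original procedure.

```python
import sqlite3

# Rows from the question's sample table ("page" kept as the elided "http...").
ROWS = [
    (1, "kitchen", "2018-05-01", 2, "desktop", "http...", 11000, 50, 3, 1),
    (2, "table", "2018-05-01", 2, "desktop", "http...", 7000, 40, 3, 0),
    (3, "kitchen", "2018-05-02", 2, "desktop", "http...", 11500, 55, 3, 1),
    (4, "table", "2018-05-02", 2, "desktop", "http...", 7100, 35, 3, 0),
]


def make_db(rows):
    """Build an in-memory SQLite copy of search_console_query (assumed schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE search_console_query (
               id INTEGER PRIMARY KEY, query TEXT, update_date TEXT,
               website_id INTEGER, device TEXT, page TEXT, impressions INTEGER,
               clicks INTEGER, position INTEGER, is_brand INTEGER)"""
    )
    con.executemany(
        "INSERT INTO search_console_query VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return con


# Same logic as the procedure: number the rows of each query by clicks
# (highest first) inside the date/website filter, then keep row number 1.
BEST_PER_QUERY = """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY query ORDER BY clicks DESC) AS rn
        FROM search_console_query
        WHERE update_date >= :from_date AND update_date <= :to_date
          AND website_id = :website_id)
    SELECT id, query, clicks FROM cte WHERE rn = 1 ORDER BY query"""


def get_best_website_queries(con, from_date, to_date, website_id):
    params = {"from_date": from_date, "to_date": to_date, "website_id": website_id}
    return con.execute(BEST_PER_QUERY, params).fetchall()


con = make_db(ROWS)
print(get_best_website_queries(con, "2018-05-01", "2018-05-31", 2))
# [(3, 'kitchen', 55), (2, 'table', 40)]
```

This only confirms which rows the query logic picks; it says nothing about SQL Server's execution plan or timing.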
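This works and gives me the correct results. My problem is that the table is going to get very large, and the query is already very slow (more than 3 minutes for a one-year range). The query produces the following execution plan: [execution plan image not included]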
On the table I have a nonclustered index on clicks and a clustered index on (website_id, update_date).
I'd like some opinions on the best way to make this perform better. Any input would be greatly appreciated.
Answer 0 (score: 2)
First, try adding an index on search_console_query(website_id, update_date, query, clicks).
Then I'd suggest trying this version:
select scq.*
from search_console_query scq
where scq.update_date >= @from and
      scq.update_date <= @to and
      scq.website_id = @website_id and
      scq.clicks = (select max(scq2.clicks)
                    from search_console_query scq2
                    where scq2.website_id = scq.website_id and
                          scq2.query = scq.query and
                          scq2.update_date >= @from and
                          scq2.update_date <= @to
                   );
This version can make use of two indexes: search_console_query(website_id, query, update_date, clicks) and search_console_query(website_id, update_date, query, clicks).
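This is slightly different, because when there are ties it returns multiple rows for a query. If performance improves significantly and the ties are a problem, that can be fixed.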
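The tie behaviour can be seen in a small SQLite sketch (a stand-in for SQL Server; the syntax differs in places). It runs the correlated `max()` query on the question's rows, then again after adding a hypothetical row 5 whose clicks tie with row 3. The index name and helper names are illustrative assumptions.

```python
import sqlite3

# The question's rows plus a hypothetical row 5 that ties row 3 on clicks.
ROWS = [
    (1, "kitchen", "2018-05-01", 2, "desktop", "http...", 11000, 50, 3, 1),
    (2, "table", "2018-05-01", 2, "desktop", "http...", 7000, 40, 3, 0),
    (3, "kitchen", "2018-05-02", 2, "desktop", "http...", 11500, 55, 3, 1),
    (4, "table", "2018-05-02", 2, "desktop", "http...", 7100, 35, 3, 0),
    (5, "kitchen", "2018-05-03", 2, "desktop", "http...", 11600, 55, 3, 1),
]


def make_db(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE search_console_query (
               id INTEGER PRIMARY KEY, query TEXT, update_date TEXT,
               website_id INTEGER, device TEXT, page TEXT, impressions INTEGER,
               clicks INTEGER, position INTEGER, is_brand INTEGER)"""
    )
    con.executemany(
        "INSERT INTO search_console_query VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    # SQLite counterpart of the suggested index on (website_id, query, update_date, clicks).
    con.execute(
        "CREATE INDEX ix_site_query_date_clicks ON search_console_query "
        "(website_id, query, update_date, clicks)"
    )
    return con


# Keep a row when its clicks equal the maximum clicks for the same website
# and query within the date range; every tied row survives this test.
MAX_CLICKS = """
    SELECT scq.id, scq.query, scq.clicks
    FROM search_console_query scq
    WHERE scq.update_date >= :from_date AND scq.update_date <= :to_date
      AND scq.website_id = :website_id
      AND scq.clicks = (SELECT MAX(scq2.clicks)
                        FROM search_console_query scq2
                        WHERE scq2.website_id = scq.website_id
                          AND scq2.query = scq.query
                          AND scq2.update_date >= :from_date
                          AND scq2.update_date <= :to_date)
    ORDER BY scq.query, scq.id"""


def best_rows_max(con, from_date, to_date, website_id):
    params = {"from_date": from_date, "to_date": to_date, "website_id": website_id}
    return con.execute(MAX_CLICKS, params).fetchall()


print(best_rows_max(make_db(ROWS[:4]), "2018-05-01", "2018-05-31", 2))
# [(3, 'kitchen', 55), (2, 'table', 40)]
print(best_rows_max(make_db(ROWS), "2018-05-01", "2018-05-31", 2))
# [(3, 'kitchen', 55), (5, 'kitchen', 55), (2, 'table', 40)]
```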
Edit:
The simplest way to remove the duplicates from the second version is to assume that the table has a unique id column:
select scq.*
from search_console_query scq
where scq.update_date >= @from and
      scq.update_date <= @to and
      scq.website_id = @website_id and
      scq.id = (select top (1) scq2.id
                from search_console_query scq2
                where scq2.website_id = scq.website_id and
                      scq2.query = scq.query and
                      scq2.update_date >= @from and
                      scq2.update_date <= @to
                order by scq2.clicks desc);
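SQLite has no `TOP (1)`, so this sketch of the de-duplicated version uses `LIMIT 1` instead. It also adds `scq2.id` as a secondary sort key, which is an assumption not present in the answer: with only `clicks DESC`, the order among tied rows is unspecified, so which tied row gets returned is not defined. The data again includes a hypothetical row 5 that ties row 3.

```python
import sqlite3

# The question's rows plus a hypothetical row 5 that ties row 3 on clicks.
ROWS = [
    (1, "kitchen", "2018-05-01", 2, "desktop", "http...", 11000, 50, 3, 1),
    (2, "table", "2018-05-01", 2, "desktop", "http...", 7000, 40, 3, 0),
    (3, "kitchen", "2018-05-02", 2, "desktop", "http...", 11500, 55, 3, 1),
    (4, "table", "2018-05-02", 2, "desktop", "http...", 7100, 35, 3, 0),
    (5, "kitchen", "2018-05-03", 2, "desktop", "http...", 11600, 55, 3, 1),
]


def make_db(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE search_console_query (
               id INTEGER PRIMARY KEY, query TEXT, update_date TEXT,
               website_id INTEGER, device TEXT, page TEXT, impressions INTEGER,
               clicks INTEGER, position INTEGER, is_brand INTEGER)"""
    )
    con.executemany(
        "INSERT INTO search_console_query VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return con


# Keep a row only if its id is the single top row (by clicks, then id)
# for its website and query within the date range.
TOP1 = """
    SELECT scq.id, scq.query, scq.clicks
    FROM search_console_query scq
    WHERE scq.update_date >= :from_date AND scq.update_date <= :to_date
      AND scq.website_id = :website_id
      AND scq.id = (SELECT scq2.id
                    FROM search_console_query scq2
                    WHERE scq2.website_id = scq.website_id
                      AND scq2.query = scq.query
                      AND scq2.update_date >= :from_date
                      AND scq2.update_date <= :to_date
                    ORDER BY scq2.clicks DESC, scq2.id  -- id breaks ties
                    LIMIT 1)
    ORDER BY scq.query"""


def best_rows_top1(con, from_date, to_date, website_id):
    params = {"from_date": from_date, "to_date": to_date, "website_id": website_id}
    return con.execute(TOP1, params).fetchall()


print(best_rows_top1(make_db(ROWS), "2018-05-01", "2018-05-31", 2))
# [(3, 'kitchen', 55), (2, 'table', 40)]
```

In T-SQL the same tie-breaker would be written as `order by scq2.clicks desc, scq2.id` inside the `top (1)` subquery.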
Answer 1 (score: 2)
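I'd recommend the indexes suggested above. Second, parameter sniffing could also be happening here. I suggest re-declaring the parameters as local variables inside the stored procedure, as shown below, to avoid parameter sniffing: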
create or alter procedure get_best_website_querys
    @from as date,
    @to as date,
    @website_id as int
as
begin
    -- Copy the parameters into local variables: the optimizer cannot sniff
    -- local variable values when it compiles the plan
    DECLARE @StartDate AS DATE = @from
          , @EndDate AS DATE = @to
          , @WebsiteID AS INT = @website_id;  -- this ; is required before a CTE (WITH)

    WITH cte
    AS (SELECT *
             , ROW_NUMBER() OVER (PARTITION BY query ORDER BY clicks DESC) RN
        FROM search_console_query
        where
            update_date >= @StartDate and
            update_date <= @EndDate and
            website_id = @WebsiteID
       )
    SELECT cte.id
         , cte.query
         , cte.update_date
         , cte.website_id
         , cte.device
         , cte.page
         , cte.impressions
         , cte.clicks
         , cte.POSITION
         , cte.is_brand
    FROM cte
    WHERE RN = 1
end;
Answer 2 (score: 1)
It looks like all the columns in the select clause are indexable, so you could try creating a large covering index with included columns:
CREATE INDEX TEST_0001 ON search_console_query (
    website_id,
    update_date,
    query,
    clicks
) INCLUDE (
    id,
    device,
    page,
    impressions,
    position,
    is_brand
);
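While you're at it, try a few more variations of this index (for example, swapping the order of website_id, update_date and the order of query, clicks), see which one SQL Server picks, and then drop the ones it does not use.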
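SQLite has no `INCLUDE` clause, so a rough stand-in for the covering index above appends the non-key columns to the index key. `EXPLAIN QUERY PLAN` then shows whether the filtered query can be answered from the index alone. This is a sketch of the covering-index idea in SQLite, not a prediction of what SQL Server's optimizer will choose; the lowercase name `test_0001` is only illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE search_console_query (
           id INTEGER PRIMARY KEY, query TEXT, update_date TEXT,
           website_id INTEGER, device TEXT, page TEXT, impressions INTEGER,
           clicks INTEGER, position INTEGER, is_brand INTEGER)"""
)
# Key columns first, then the "included" columns appended to the key.
# id is the INTEGER PRIMARY KEY (the rowid), which SQLite stores in every index.
con.execute(
    """CREATE INDEX test_0001 ON search_console_query
           (website_id, update_date, query, clicks,
            device, page, impressions, position, is_brand)"""
)

SQL = """SELECT id, query, update_date, website_id, device, page,
                impressions, clicks, position, is_brand
         FROM search_console_query
         WHERE website_id = ? AND update_date >= ? AND update_date <= ?"""

# Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail text.
plan = [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + SQL, (2, "2018-05-01", "2018-05-31"))]
print(plan)
```

A detail line containing `USING COVERING INDEX test_0001` means every selected column came from the index, with no lookup into the table itself.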
Answer 3 (score: 0)
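An alternative approach, though I'm not sure about its performance. This pattern is typically used to fetch the latest record when the select list contains columns that are not grouped.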
select * from (
    select * from Table t order by timestamp desc
) a
group by user_id, device;
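The pattern above relies on the database taking non-grouped columns from the first row of the ordered derived table. SQL Server rejects it (select-list columns must be aggregated or appear in GROUP BY, and ORDER BY is not allowed in a derived table without TOP), and MySQL does not guarantee which row's values come back. SQLite, used here only for illustration, documents a narrow, well-defined variant: when max() is the only aggregate, bare columns take their values from the row that holds the maximum. This behaviour is SQLite-specific and will not carry over to SQL Server.

```python
import sqlite3

ROWS = [
    (1, "kitchen", "2018-05-01", 2, "desktop", "http...", 11000, 50, 3, 1),
    (2, "table", "2018-05-01", 2, "desktop", "http...", 7000, 40, 3, 0),
    (3, "kitchen", "2018-05-02", 2, "desktop", "http...", 11500, 55, 3, 1),
    (4, "table", "2018-05-02", 2, "desktop", "http...", 7100, 35, 3, 0),
]

con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE search_console_query (
           id INTEGER PRIMARY KEY, query TEXT, update_date TEXT,
           website_id INTEGER, device TEXT, page TEXT, impressions INTEGER,
           clicks INTEGER, position INTEGER, is_brand INTEGER)"""
)
con.executemany(
    "INSERT INTO search_console_query VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", ROWS
)

# SQLite-only: with max() as the single aggregate, the bare column `id`
# comes from the row that has the maximum clicks within each query group.
rows = con.execute(
    """SELECT query, MAX(clicks) AS clicks, id
       FROM search_console_query
       WHERE website_id = 2 AND update_date BETWEEN '2018-05-01' AND '2018-05-31'
       GROUP BY query
       ORDER BY query"""
).fetchall()
print(rows)
# [('kitchen', 55, 3), ('table', 40, 2)]
```

If two rows tie on the maximum, SQLite picks one of them arbitrarily, so this shares the tie caveat discussed in Answer 0.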