Optimizing a slow greatest-n-per-group query

Date: 2018-07-04 11:12:05

Tags: sql sql-server database tsql greatest-n-per-group

I have the following table:

id | query   | update_date | website_id | device  | page    | impressions | clicks | position | is_brand
---+---------+-------------+------------+---------+---------+-------------+--------+----------+---------
1  | kitchen | 2018-05-01  | 2          | desktop | http... | 11000       | 50     | 3        | 1
2  | table   | 2018-05-01  | 2          | desktop | http... | 7000        | 40     | 3        | 0
3  | kitchen | 2018-05-02  | 2          | desktop | http... | 11500       | 55     | 3        | 1
4  | table   | 2018-05-02  | 2          | desktop | http... | 7100        | 35     | 3        | 0

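From this table I need a procedure that returns, for each unique query, the best-performing row by clicks within a given period. That led to the following procedure: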

create or alter procedure get_best_website_querys
    @from as date,
    @to as date,
    @website_id as int
as
begin
    WITH    cte
          AS (SELECT    *
              ,         ROW_NUMBER() OVER (PARTITION BY query ORDER BY clicks DESC) RN
              FROM      search_console_query
              where 
                update_date >= @from and 
                update_date <= @to and 
                website_id = @website_id 
             )
    SELECT  cte.id
     ,      cte.query
     ,      cte.update_date
     ,      cte.website_id
     ,      cte.device
     ,      cte.page
     ,      cte.impressions
     ,      cte.clicks
     ,      cte.POSITION
     ,      cte.is_brand
    FROM    cte
    WHERE   RN = 1
end;

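Now, this works fine and gives me the correct results. My problem is that the table will grow large, and this query is very slow (more than 3 minutes for one year of data). The query produces the following execution plan: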

[image: execution plan]

On the table, I have a nonclustered index on clicks and a clustered index on (website_id, update_date).

I'd like some input on the best way to make this perform better. Any input would be appreciated.

4 Answers:

Answer 0 (score: 2)

First, try adding an index on search_console_query(website_id, update_date, query, clicks).
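As a minimal sketch, the index suggested above could be created like this. The index name is made up for illustration, and the statement assumes that query has a bounded type (for example nvarchar(400)) so that it is allowed as an index key column:

```sql
-- Hypothetical index name; key columns are the ones suggested in this answer.
-- Assumes "query" is not a (max) type, since those cannot be index key columns.
CREATE NONCLUSTERED INDEX ix_scq_website_date_query_clicks
    ON search_console_query (website_id, update_date, query, clicks);
```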

Then I would suggest trying this version:

select scq.*
from search_console_query scq
where scq.update_date >= @from and 
      scq.update_date <= @to and 
      scq.website_id = @website_id and
      scq.clicks = (select max(scq2.clicks)
                    from search_console_query scq2
                    where scq2.website_id = scq.website_id and
                          scq2.query = scq.query and
                          scq2.update_date >= @from and
                          scq2.update_date <= @to
                    );

This version can take advantage of two indexes: search_console_query(website_id, query, update_date, clicks) and search_console_query(website_id, update_date, query, clicks).
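The first of those two indexes is the one the correlated subquery would most likely seek on, because it leads with the equality columns website_id and query. A sketch of its definition follows; the index name is hypothetical and query is again assumed to have a bounded type:

```sql
-- Hypothetical index name. Leads with the two equality columns used by the
-- correlated subquery (website_id, query), followed by the range column and clicks.
CREATE NONCLUSTERED INDEX ix_scq_website_query_date_clicks
    ON search_console_query (website_id, query, update_date, clicks);
```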

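This is slightly different, because in the event of ties it will return multiple rows for a query. If performance improves significantly, and the ties are a problem, that can be fixed.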

EDIT:

The simplest way to remove the duplicates in the second version assumes that the table has a unique id column:

select scq.*
from search_console_query scq
where scq.update_date >= @from and 
      scq.update_date <= @to and 
      scq.website_id = @website_id and
      scq.id = (select top (1) scq2.id
                    from search_console_query scq2
                    where scq2.website_id = scq.website_id and
                          scq2.query = scq.query and
                          scq2.update_date >= @from and
                          scq2.update_date <= @to
                    order by scq2.clicks desc);
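One thing to be aware of: with order by clicks desc alone, which of the tied rows top (1) picks is not guaranteed. A possible variation, shown here only as a sketch, adds id as a tie-breaker so the result is repeatable. The variable values are taken from the sample rows in the question and are only there to make the batch runnable on its own:

```sql
-- Illustrative parameter values taken from the sample data in the question.
declare @from date = '2018-05-01',
        @to date = '2018-05-02',
        @website_id int = 2;

select scq.*
from search_console_query scq
where scq.update_date >= @from and
      scq.update_date <= @to and
      scq.website_id = @website_id and
      scq.id = (select top (1) scq2.id
                from search_console_query scq2
                where scq2.website_id = scq.website_id and
                      scq2.query = scq.query and
                      scq2.update_date >= @from and
                      scq2.update_date <= @to
                -- id breaks ties between rows with the same click count
                order by scq2.clicks desc, scq2.id asc);
```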

Answer 1 (score: 2)

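I would go with the indexes suggested above. Second, parameter sniffing may also be happening here. I suggest re-declaring the parameters as local variables inside the stored procedure, as shown below, to avoid parameter sniffing: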

create or alter procedure get_best_website_querys    
    @from as date,
    @to as date,
    @website_id as int
as
begin
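-- Copy the parameters into local variables so the optimizer does not sniff them.
-- The DECLARE must end with a semicolon because the next statement starts with WITH.
DECLARE @StartDate AS DATE = @from
       ,@EndDate AS DATE = @to
       ,@WebsiteID AS INT = @website_id;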

      WITH    cte
      AS (SELECT    *
          ,         ROW_NUMBER() OVER (PARTITION BY query ORDER BY clicks DESC) RN
          FROM      search_console_query
          where 
            update_date >= @StartDate and 
            update_date <= @EndDate and 
            website_id = @WebsiteID
         )
SELECT  cte.id
 ,      cte.query
 ,      cte.update_date
 ,      cte.website_id
 ,      cte.device
 ,      cte.page
 ,      cte.impressions
 ,      cte.clicks
 ,      cte.POSITION
 ,      cte.is_brand
FROM    cte
WHERE   RN = 1
end;
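If parameter sniffing does turn out to be the issue, another option (not the local-variable approach shown above) is the OPTION (RECOMPILE) query hint, which compiles a fresh plan for the actual parameter values on every call at the price of some compile time per execution. A sketch under that assumption, reusing the procedure from the question:

```sql
create or alter procedure get_best_website_querys
    @from as date,
    @to as date,
    @website_id as int
as
begin
    with cte as (
        select *,
               row_number() over (partition by query order by clicks desc) as rn
        from search_console_query
        where update_date >= @from
          and update_date <= @to
          and website_id = @website_id
    )
    select cte.id, cte.query, cte.update_date, cte.website_id, cte.device,
           cte.page, cte.impressions, cte.clicks, cte.position, cte.is_brand
    from cte
    where rn = 1
    -- Build a new plan for the current parameter values on each execution.
    option (recompile);
end;
```

Whether this beats the local-variable version depends on how often the procedure is called and how much the date ranges vary, so it is worth measuring both.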

Answer 2 (score: 1)

It looks like all the columns in the select clause are indexable; you could try creating a large covering index with included columns:

CREATE INDEX TEST_0001 ON search_console_query (
    website_id,
    update_date,
    query,
    clicks
) INCLUDE (
    id,
    device,
    page,
    impressions,
    position,
    is_brand
)

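While you are at it, try the following variations as well, see which one SQL Server chooses, and then drop the ones that are not needed:

  • Change the order of website_id, update_date
  • Move query, clicks into the included columns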
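The two variations could look roughly like the sketch below. The names TEST_0002 and TEST_0003 are made up, following the TEST_0001 naming above. The last query reads sys.dm_db_index_usage_stats, which is one way to see which of the indexes actually gets used after running the procedure a few times:

```sql
-- Variation 1: swap the order of the two leading key columns.
CREATE INDEX TEST_0002 ON search_console_query (
    update_date, website_id, query, clicks
) INCLUDE (id, device, page, impressions, position, is_brand);

-- Variation 2: keep only the filter columns as keys and move
-- query and clicks into the included columns.
CREATE INDEX TEST_0003 ON search_console_query (
    website_id, update_date
) INCLUDE (query, clicks, id, device, page, impressions, position, is_brand);

-- Seek/scan/lookup counts per index on the table since the last restart.
SELECT i.name, s.user_seeks, s.user_scans, s.user_lookups
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('search_console_query');
```

An index that stays unused can then be removed with, for example, DROP INDEX TEST_0003 ON search_console_query;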

Answer 3 (score: 0)

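An alternative approach. I am not sure about its performance, though; this pattern is usually used to find the latest record for columns in the select list that are not part of the grouping.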

select * from (
select * from Table t order by timestamp desc)a 
group by user_id, device ;