Question

I have the following table:

id | query   | update_date | website_id | device  | page    | impressions | clicks | position | is_brand
---+---------+-------------+------------+---------+---------+-------------+--------+----------+---------
1  | kitchen | 2018-05-01  | 2          | desktop | http... | 11000       | 50     | 3        | 1
2  | table   | 2018-05-01  | 2          | desktop | http... | 7000        | 40     | 3        | 0
3  | kitchen | 2018-05-02  | 2          | desktop | http... | 11500       | 55     | 3        | 1
4  | table   | 2018-05-02  | 2          | desktop | http... | 7100        | 35     | 3        | 0

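From this table I need a procedure that, for a given time period, returns the best-performing row (by clicks) for each unique query. This resulted in the following procedure: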

create or alter procedure get_best_website_querys
    @from as date,
    @to as date,
    @website_id as int
as
begin
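    -- Number each query's rows by clicks (highest first), restricted to the
    -- requested website and date range; RN = 1 marks the best row per query.
    WITH    cte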
          AS (SELECT    *
              ,         ROW_NUMBER() OVER (PARTITION BY query ORDER BY clicks DESC) RN
              FROM      search_console_query
              where 
                update_date >= @from and 
                update_date <= @to and 
                website_id = @website_id 
             )
    SELECT  cte.id
     ,      cte.query
     ,      cte.update_date
     ,      cte.website_id
     ,      cte.device
     ,      cte.page
     ,      cte.impressions
     ,      cte.clicks
     ,      cte.POSITION
     ,      cte.is_brand
    FROM    cte
    WHERE   RN = 1
end;
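The procedure's logic can be checked outside SQL Server with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in (SQLite 3.25 or later supports the same ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) syntax). It loads only the four sample rows from the question and uses a hypothetical helper, best_rows, with ? placeholders in place of @from, @to and @website_id. It shows what the query returns, not how fast SQL Server runs it.

```python
import sqlite3

# The four sample rows from the question; the page URLs are elided there ("http...").
ROWS = [
    (1, "kitchen", "2018-05-01", 2, "desktop", "http...", 11000, 50, 3, 1),
    (2, "table",   "2018-05-01", 2, "desktop", "http...", 7000,  40, 3, 0),
    (3, "kitchen", "2018-05-02", 2, "desktop", "http...", 11500, 55, 3, 1),
    (4, "table",   "2018-05-02", 2, "desktop", "http...", 7100,  35, 3, 0),
]

def make_db():
    """Build an in-memory copy of search_console_query holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE search_console_query (
               id INTEGER PRIMARY KEY, query TEXT, update_date TEXT,
               website_id INTEGER, device TEXT, page TEXT,
               impressions INTEGER, clicks INTEGER, position INTEGER, is_brand INTEGER)"""
    )
    conn.executemany("INSERT INTO search_console_query VALUES (?,?,?,?,?,?,?,?,?,?)", ROWS)
    return conn

def best_rows(conn, date_from, date_to, website_id):
    """Return (id, query, clicks) of the most-clicked row per query in the date range."""
    sql = """
        WITH cte AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY query ORDER BY clicks DESC) AS rn
            FROM search_console_query
            WHERE update_date >= ? AND update_date <= ? AND website_id = ?
        )
        SELECT id, query, clicks FROM cte WHERE rn = 1 ORDER BY query
    """
    return conn.execute(sql, (date_from, date_to, website_id)).fetchall()

print(best_rows(make_db(), "2018-05-01", "2018-05-02", 2))
# [(3, 'kitchen', 55), (2, 'table', 40)]
```

For the two sample days this picks row 3 for kitchen (55 clicks) and row 2 for table (40 clicks).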

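This works fine and gives me the correct results. My problem is that the table is going to grow quite large, and this query already runs very slowly (over 3 minutes for one year of data). The query produces the following execution plan: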

On the table I have a nonclustered index on clicks and a clustered index on (website_id, update_date).

I'd like some input on the best way to make this perform better. Any input would be greatly appreciated.

Answer 1

First, try adding an index on search_console_query(website_id, update_date, query, clicks).

Then I suggest you try this version:

select scq.*
from search_console_query scq
where scq.update_date >= @from and 
      scq.update_date <= @to and 
      scq.website_id = @website_id and
      scq.clicks = (select max(scq2.clicks)
                    from search_console_query scq2
                    where scq2.website_id = scq.website_id and
                          scq2.query = scq.query and
                          scq2.update_date >= @from and
                          scq2.update_date <= @to
                    );

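This version can take advantage of two indexes: search_console_query(website_id, query, update_date, clicks), which lets the correlated subquery seek on website_id and query and then scan the date range, and search_console_query(website_id, update_date, query, clicks), which serves the outer date-range filter.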

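This is slightly different, because in the event of ties it returns multiple rows for a query. If performance is significantly better (which is the actual problem), the ties can be dealt with.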
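A minimal sketch of this correlated-subquery version, again using Python's sqlite3 as a stand-in for SQL Server, makes the tie behavior visible. The table is trimmed to the columns the query needs, the index names are hypothetical, and row 5 is an invented row that ties row 2 on clicks.

```python
import sqlite3

# Trimmed version of the question's table; row 5 is a hypothetical tie with row 2.
ROWS = [
    (1, "kitchen", "2018-05-01", 2, 50),
    (2, "table",   "2018-05-01", 2, 40),
    (3, "kitchen", "2018-05-02", 2, 55),
    (4, "table",   "2018-05-02", 2, 35),
    (5, "table",   "2018-05-03", 2, 40),
]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search_console_query ("
                 "id INTEGER PRIMARY KEY, query TEXT, update_date TEXT, "
                 "website_id INTEGER, clicks INTEGER)")
    # The two indexes from the answer: one for the correlated subquery,
    # one for the outer date-range filter.
    conn.execute("CREATE INDEX ix_site_query ON search_console_query "
                 "(website_id, query, update_date, clicks)")
    conn.execute("CREATE INDEX ix_site_date ON search_console_query "
                 "(website_id, update_date, query, clicks)")
    conn.executemany("INSERT INTO search_console_query VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn

# Keep every row whose clicks equal the per-query maximum in the date range.
MAX_SQL = """
SELECT scq.id, scq.query, scq.clicks
FROM search_console_query scq
WHERE scq.update_date >= :d_from AND scq.update_date <= :d_to
  AND scq.website_id = :site
  AND scq.clicks = (SELECT MAX(scq2.clicks)
                    FROM search_console_query scq2
                    WHERE scq2.website_id = scq.website_id
                      AND scq2.query = scq.query
                      AND scq2.update_date >= :d_from
                      AND scq2.update_date <= :d_to)
ORDER BY scq.query, scq.id
"""

def best_rows_max(conn, d_from, d_to, site):
    params = {"d_from": d_from, "d_to": d_to, "site": site}
    return conn.execute(MAX_SQL, params).fetchall()

print(best_rows_max(make_db(), "2018-05-01", "2018-05-03", 2))
# [(3, 'kitchen', 55), (2, 'table', 40), (5, 'table', 40)]
```

Both row 2 and row 5 come back for table, which is the duplication the answer warns about.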

Edit:

The simplest way to remove the duplicates from the second version, assuming the table has a unique id column, is:

select scq.*
from search_console_query scq
where scq.update_date >= @from and 
      scq.update_date <= @to and 
      scq.website_id = @website_id and
      scq.id = (select top (1) scq2.id
                from search_console_query scq2
                where scq2.website_id = scq.website_id and
                      scq2.query = scq.query and
                      scq2.update_date >= @from and
                      scq2.update_date <= @to
                order by scq2.clicks desc);
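The deduplicated version can be sketched the same way with sqlite3. SQLite has no TOP, so ORDER BY ... LIMIT 1 stands in for TOP (1). The extra scq2.id sort key is an added assumption that makes the choice among tied rows deterministic (the answer's version leaves it unspecified). The data includes the same hypothetical tie as the previous sketch.

```python
import sqlite3

# Trimmed version of the question's table; row 5 is a hypothetical tie with row 2.
ROWS = [
    (1, "kitchen", "2018-05-01", 2, 50),
    (2, "table",   "2018-05-01", 2, 40),
    (3, "kitchen", "2018-05-02", 2, 55),
    (4, "table",   "2018-05-02", 2, 35),
    (5, "table",   "2018-05-03", 2, 40),
]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search_console_query ("
                 "id INTEGER PRIMARY KEY, query TEXT, update_date TEXT, "
                 "website_id INTEGER, clicks INTEGER)")
    conn.executemany("INSERT INTO search_console_query VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn

# LIMIT 1 plays the role of T-SQL's TOP (1); scq2.id is an assumed tie-breaker,
# so exactly one id is chosen per query.
TOP1_SQL = """
SELECT scq.id, scq.query, scq.clicks
FROM search_console_query scq
WHERE scq.update_date >= :d_from AND scq.update_date <= :d_to
  AND scq.website_id = :site
  AND scq.id = (SELECT scq2.id
                FROM search_console_query scq2
                WHERE scq2.website_id = scq.website_id
                  AND scq2.query = scq.query
                  AND scq2.update_date >= :d_from
                  AND scq2.update_date <= :d_to
                ORDER BY scq2.clicks DESC, scq2.id
                LIMIT 1)
ORDER BY scq.query
"""

def best_rows_top1(conn, d_from, d_to, site):
    params = {"d_from": d_from, "d_to": d_to, "site": site}
    return conn.execute(TOP1_SQL, params).fetchall()

print(best_rows_top1(make_db(), "2018-05-01", "2018-05-03", 2))
# [(3, 'kitchen', 55), (2, 'table', 40)]
```

With the tie broken by the lower id, table now yields only row 2.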

Answer 2

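I'd recommend using the index suggested above. Second, parameter sniffing may also be a factor here. I suggest re-declaring the parameters as local variables inside the stored procedure, as shown below, to avoid parameter sniffing: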

create or alter procedure get_best_website_querys
    @from as date,
    @to as date,
    @website_id as int
as
begin
    -- Copy the parameters into local variables so the optimizer does not build
    -- the plan around the specific values of the first call (parameter sniffing).
    DECLARE @StartDate AS DATE = @from
           ,@EndDate AS DATE = @to
           ,@WebsiteID AS INT = @website_id;  -- the statement before a CTE must end with ';'

    WITH    cte
          AS (SELECT    *
              ,         ROW_NUMBER() OVER (PARTITION BY query ORDER BY clicks DESC) RN
              FROM      search_console_query
              where
                update_date >= @StartDate and
                update_date <= @EndDate and
                website_id = @WebsiteID
             )
    SELECT  cte.id
     ,      cte.query
     ,      cte.update_date
     ,      cte.website_id
     ,      cte.device
     ,      cte.page
     ,      cte.impressions
     ,      cte.clicks
     ,      cte.POSITION
     ,      cte.is_brand
    FROM    cte
    WHERE   RN = 1
end;

Answer 3

It looks like all the columns in your select clause are indexable, so you could try creating a large covering index with included columns:

CREATE INDEX TEST_0001 ON search_console_query (
    website_id,
    update_date,
    query,
    clicks
) INCLUDE (
    id,
    device,
    page,
    impressions,
    position,
    is_brand
)

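While you're at it, also try the following variants, check which one SQL Server actually picks, and then drop the ones it doesn't need:

- Swap the order of website_id and update_date
- Move query and clicks into the included columns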
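Whether an index really covers a query can be checked from the plan. The sketch below uses Python's sqlite3 as a stand-in: SQLite has no INCLUDE clause, so the would-be included columns are appended to the index key instead (a substitution, not what SQL Server does), and EXPLAIN QUERY PLAN is asked whether the filtered query can be answered from the index alone. In SQL Server the rough equivalent is looking for an Index Seek with no Key Lookup in the actual execution plan. The helper name plan is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE search_console_query (
    id INTEGER PRIMARY KEY, query TEXT, update_date TEXT, website_id INTEGER,
    device TEXT, page TEXT, impressions INTEGER, clicks INTEGER,
    position INTEGER, is_brand INTEGER)""")
# No INCLUDE in SQLite: the "included" columns go into the key. The id column is
# the rowid, which every SQLite index entry already stores.
conn.execute("""CREATE INDEX test_0001 ON search_console_query (
    website_id, update_date, query, clicks,
    device, page, impressions, position, is_brand)""")

def plan(sql, params):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

details = plan(
    "SELECT id, query, update_date, website_id, device, page, impressions, "
    "clicks, position, is_brand FROM search_console_query "
    "WHERE website_id = ? AND update_date >= ? AND update_date <= ?",
    (2, "2018-05-01", "2018-05-02"),
)
print(details)  # the detail should mention USING COVERING INDEX test_0001
```

If a selected column is dropped from the index, the plan falls back to a plain index search plus table lookups, which is the SQLite counterpart of a Key Lookup.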

Answer 4

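An alternative approach, though I'm not sure about its performance. This pattern is commonly used to get the latest record when the select list contains columns that are not grouped. Note that it relies on MySQL's permissive GROUP BY: SQL Server rejects it (ORDER BY is not allowed in a derived table without TOP, OFFSET or FOR XML, and every selected column must be grouped or aggregated), and even in MySQL the row returned for each group is not guaranteed to be the latest one.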

select *
from (
    select * from Table t order by timestamp desc
) a
group by user_id, device;

Optimizing a slow-running greatest-n-per-group query
