Azure SQL Database - Indexing 10+ million rows

Date: 2016-01-21 02:22:53

Tags: sql-server azure indexing azure-sql-database

I have a database hosted on Azure SQL Database, and below is the schema of a single table:

CREATE TABLE [dbo].[Article](
    [ArticleHash] [bigint] NOT NULL,
    [FeedHash] [bigint] NOT NULL,
    [PublishedOn] [datetime] NOT NULL,
    [ExpiresOn] [datetime] NOT NULL,
    [DateCreated] [datetime] NOT NULL,
    [Url] [nvarchar](max) NULL,
    [Title] [nvarchar](max) NULL,
    [Summary] [nvarchar](max) NULL,
 CONSTRAINT [PK_dbo.Article] PRIMARY KEY CLUSTERED 
(
    [ArticleHash] ASC,
    [FeedHash] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
);

Some of the queries I run against it are very slow, because this table contains more than 10 million records:

SELECT * 
FROM (SELECT ROW_NUMBER() OVER (ORDER BY PublishedOn DESC) page_rn, *
      FROM Article
      WHERE (FeedHash = -8498408432858355421 AND ExpiresOn > '2016-01-18 14:18:04.970')
     ) paged 
WHERE page_rn>0 AND page_rn<=21 

And another one:

SELECT ArticleHash
FROM Article
WHERE (FeedHash = -8498408432858355421 
       AND ArticleHash IN (-1776401574438488264,996871668263687248,-5186412434178204433,6410875610077852481,-5428137965544411137,-5326808411357670185,2738089298373692963,9180394103094543689,8120572317154347382,-369910952783360989,1071631911959711259,1187953785740614613,6665010324256449533,3720795027036815325,-5458296665864077096,-5832860214011872788,-2941009192514997875,334202794706549486,-5579819992060984166,-696086851747657853,-7466754676679718482,-1461835507954240474,9021713212273098604,-6337379666850984216,5502287921912059432) 
       AND ExpiresOn >= '2016-01-18 14:28:25.883')

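What is the best way to index this table so that query execution time stays below 300 ms? Is that even possible on a table this large? The Azure SQL Database service tier is S3.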

Also, a lot of DELETE / INSERT operations are executed against this table, so any index must not hurt the performance of those operations...

2 Answers:

Answer 0 (score: 0)

The first query would benefit from the OFFSET ... FETCH clause:

SELECT * 
FROM Article
WHERE FeedHash = -8498408432858355421 AND ExpiresOn > '2016-01-18 14:18:04.970'
ORDER BY PublishedOn DESC
OFFSET 0 ROWS FETCH NEXT 21 ROWS ONLY; -- ROWS is required after OFFSET; 21 rows matches page_rn <= 21
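Generalized to arbitrary pages, the same pattern might look like the minimal sketch below. The variable names @FeedHash, @Now, @PageNumber and @PageSize are hypothetical, and adding ArticleHash as a tie-breaker in ORDER BY is an assumption rather than part of this answer: without a unique tie-breaker, rows that share a PublishedOn value can show up on two pages or on none.

```sql
-- Hypothetical paging parameters; pages are 1-based.
DECLARE @FeedHash   bigint   = -8498408432858355421;
DECLARE @Now        datetime = '2016-01-18T14:18:04.970'; -- ISO 8601 form is independent of DATEFORMAT
DECLARE @PageNumber int      = 1;
DECLARE @PageSize   int      = 21;

SELECT *
FROM dbo.Article
WHERE FeedHash = @FeedHash
  AND ExpiresOn > @Now
-- ArticleHash is an assumed tie-breaker so that rows with equal PublishedOn
-- keep a stable position between calls.
ORDER BY PublishedOn DESC, ArticleHash
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```

OFFSET still reads and discards every skipped row, so later pages get progressively more expensive than the first one.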

The second query might benefit from replacing the IN list with an INNER JOIN against a table variable:

DECLARE @ArticleHashList AS TABLE (ArticleHashWanted bigint PRIMARY KEY);
INSERT INTO @ArticleHashList (ArticleHashWanted) VALUES
    (-1776401574438488264),
    (  996871668263687248),
    (-5186412434178204433),
    ( 6410875610077852481),
    (-5428137965544411137),
    (-5326808411357670185),
    ( 2738089298373692963),
    ( 9180394103094543689),
    ( 8120572317154347382),
    ( -369910952783360989),
    ( 1071631911959711259),
    ( 1187953785740614613),
    ( 6665010324256449533),
    ( 3720795027036815325),
    (-5458296665864077096),
    (-5832860214011872788),
    (-2941009192514997875),
    (  334202794706549486),
    (-5579819992060984166),
    ( -696086851747657853),
    (-7466754676679718482),
    (-1461835507954240474),
    ( 9021713212273098604),
    (-6337379666850984216),
    ( 5502287921912059432);

SELECT ArticleHash
FROM Article
INNER JOIN @ArticleHashList On ArticleHash = ArticleHashWanted
WHERE FeedHash = -8498408432858355421 AND ExpiresOn >= '2016-01-18 14:28:25.883';
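If the hash list comes from application code, the same join can be driven by a table-valued parameter, so the query text stays constant and its plan can be reused for lists of any length, whereas each distinct literal IN list can end up compiled as a separate query. A sketch, assuming a hypothetical table type named dbo.ArticleHashList; GO is the SSMS/sqlcmd batch separator, needed because a variable of the new type cannot be declared in the same batch that creates the type:

```sql
-- Hypothetical table type; create it once per database.
CREATE TYPE dbo.ArticleHashList AS TABLE (ArticleHash bigint PRIMARY KEY);
GO

-- In application code this would be passed as a table-valued parameter;
-- here it is filled by hand with two of the hashes from the question.
DECLARE @Hashes dbo.ArticleHashList;
INSERT INTO @Hashes (ArticleHash) VALUES (-1776401574438488264), (996871668263687248);

EXEC sp_executesql
    N'SELECT a.ArticleHash
      FROM dbo.Article AS a
      INNER JOIN @ids AS h ON h.ArticleHash = a.ArticleHash
      WHERE a.FeedHash = @feed AND a.ExpiresOn >= @since;',
    N'@ids dbo.ArticleHashList READONLY, @feed bigint, @since datetime',
    @ids   = @Hashes,
    @feed  = -8498408432858355421,
    @since = '2016-01-18T14:28:25.883';
```

Table-valued parameters must be declared READONLY in the parameter list; the PRIMARY KEY on the type tells the optimizer the hashes are unique.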

Creating indexes on the date columns should help a lot:

CREATE INDEX idx_Article_PublishedOn ON Article (PublishedOn);
CREATE INDEX idx_Article_ExpiresOn ON Article (ExpiresOn);
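Whether an extra index pays for itself under the heavy DELETE/INSERT load mentioned in the question can be checked from the index usage counters. A sketch that lists, for every index on dbo.Article, how often it was read compared with how often it had to be maintained:

```sql
-- Reads (seeks/scans/lookups) versus maintenance (user_updates) per index.
-- The counters are cleared when the database restarts (for example after a failover),
-- so read them after a representative workload has run.
SELECT i.name AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.Article');
```

An index with a high user_updates count but near-zero seeks, scans and lookups adds write cost without helping any query.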

Answer 1 (score: 0)

For the first query I recommend this index:

create index ix_Article_FeedHash_ExpiresOn_withInclude on Article(FeedHash,ExpiresOn) include ( DateCreated, PublishedOn, Url, Title, Summary)
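To check whether this index actually brings the first query under the 300 ms target, one option is to run that query with I/O and timing statistics enabled, before and after creating the index, with "Include Actual Execution Plan" turned on in SSMS to see whether the plan seeks ix_Article_FeedHash_ExpiresOn_withInclude or still scans the clustered index. A sketch using the first query from the question unchanged:

```sql
-- Logical reads and CPU/elapsed times are reported in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY PublishedOn DESC) page_rn, *
      FROM Article
      WHERE (FeedHash = -8498408432858355421 AND ExpiresOn > '2016-01-18 14:18:04.970')
     ) paged
WHERE page_rn > 0 AND page_rn <= 21;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the logical reads on Article between the two runs is usually a steadier signal than elapsed time, which varies with caching and concurrent load.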

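The second query should use a clustered index seek; look at the actual execution plan to see what is really happening. Also, I think your clustered index is a poor choice: the key values do not look ever-increasing but random, so the index is probably heavily fragmented. You can check with this query: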

select * from sys.dm_db_index_physical_stats(db_id(), object_id('Article'), null, null, 'DETAILED');
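A narrower variant of that check, as a sketch: it returns one leaf-level row per index and labels it with the action suggested by the 5% and 30% thresholds in this answer. It uses 'LIMITED' mode, which reads far fewer pages than 'DETAILED' on a table of this size but still reports avg_fragmentation_in_percent and page_count for the leaf level.

```sql
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count,
       -- Thresholds taken from this answer: 5-30% reorganize, above 30% rebuild.
       CASE
           WHEN ps.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
           WHEN ps.avg_fragmentation_in_percent >= 5 THEN 'REORGANIZE'
           ELSE 'none'
       END AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Article'), NULL, NULL, 'LIMITED') AS ps
INNER JOIN sys.indexes AS i
        ON i.object_id = ps.object_id
       AND i.index_id  = ps.index_id
-- Only the in-row leaf level; the nvarchar(max) columns add LOB_DATA rows that are skipped here.
WHERE ps.alloc_unit_type_desc = 'IN_ROW_DATA'
  AND ps.index_level = 0;
```

For very small indexes (a few hundred pages or fewer) the percentage is not very meaningful, so such rows are often ignored.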

If avg_fragmentation_in_percent is between 5 and 30, you can fix it with:

alter index [PK_dbo.Article] on Article reorganize;

If avg_fragmentation_in_percent is above 30, you can fix it with:

alter index [PK_dbo.Article] on Article rebuild;

(If nothing changes after reorganizing, you can try a rebuild.)
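Because the question mentions heavy DELETE/INSERT traffic, an online rebuild may be preferable so the table stays available while the clustered index is rebuilt. A sketch using the primary key name from the question's schema, assuming the database's service tier supports online index operations:

```sql
-- ONLINE = ON takes only short-lived locks at the start and end of the rebuild,
-- so concurrent reads, inserts and deletes can continue while it runs.
ALTER INDEX [PK_dbo.Article] ON dbo.Article
REBUILD WITH (ONLINE = ON);
```

With random ArticleHash values as the leading clustered key, new rows keep landing in the middle of the index, so fragmentation is likely to build up again after any rebuild.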