Extremely slow SELECT statement with WHERE on a FK field

Time: 2016-02-03 04:19:02

Tags: sql sql-server tsql sql-server-2014-express

The query below is extremely slow. It takes almost 2 minutes to return 3,008 records from a table with 99 million records. The first query, which gets the "Article" data, is very fast: it takes less than 1 second and always returns 1 record. The second query is the problem. I don't really want to JOIN these queries. The first one is so quick, and (in my real query) I'm setting more than just @ArticleID for further use.

The query execution plan attributes 75% of the cost to a clustered key lookup on IX_Name, which didn't make sense to me because I'm not doing anything with the name fields here. Furthermore, Id and ArticleID are both indexed on ArticleAuthor, so I'm not sure what I'm doing wrong. I can't do much about IX_Name being the clustered index... my boss created this table and said to do that.

DECLARE @DOI VARCHAR(72) = '10.1140/EPJC/S10052-012-1993-2'

DECLARE @ArticleID VARCHAR(12) 

SELECT 
    @ArticleID = A.Id
FROM
    Article A 
LEFT JOIN 
    JournalName JN WITH (NOLOCK) ON JN.Id = A.JournalId 
WHERE 
    A.DOI = @DOI

PRINT 'GOT ARTICLE DATA ' + format(getdate(), 'yyyy-MM-dd HH:mm:ss.fff')

SELECT 
    AA.Id 
FROM 
    [ArticleWarehouseTemp]..ArticleAuthor AA WITH (NOLOCK)
WHERE 
    AA.ArticleID = @ArticleID 

PRINT 'GOT ARTICLEAUTHOR DATA ' + format(getdate(), 'yyyy-MM-dd HH:mm:ss.fff')

Please help! This is driving me insane. I've attached the table structure and indexes here too.

CREATE TABLE [dbo].[ArticleAuthor]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ArticleId] [int] NOT NULL,
[FullName] [nvarchar](128) NULL,
[LastName] [nvarchar](64) NULL,
[FirstName] [nvarchar](64) NULL,
[FirstInitial] [nvarchar](1) NULL,
[OrcId] [varchar](36) NULL,
[IsSequenceFirst] [bit] NULL,
[SequenceIndex] [smallint] NULL,
[CreatedDate] [smalldatetime] NULL CONSTRAINT [DF_ArticleAuthor_CreatedDate]  DEFAULT (getdate()),
[UpdatedDate] [smalldatetime] NULL,
[Affiliations] [varbinary](max) NULL
) ON [ArticleAuthorFileGroup] TEXTIMAGE_ON [ArticleAuthorFileGroup]
GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[ArticleAuthor] WITH CHECK 
   ADD CONSTRAINT [FK_ArticleId] 
   FOREIGN KEY([ArticleId]) REFERENCES [dbo].[Article] ([Id])
GO

ALTER TABLE [dbo].[ArticleAuthor] CHECK CONSTRAINT [FK_ArticleId]
GO

CREATE NONCLUSTERED INDEX [IX_ID] 
ON [dbo].[ArticleAuthor] ([Id] ASC)

CREATE NONCLUSTERED INDEX [IX_ArticleID] 
ON [dbo].[ArticleAuthor] ([ArticleId] ASC)

CREATE CLUSTERED INDEX [IX_Name] 
ON [dbo].[ArticleAuthor] ([LastName] ASC, [FirstName] ASC, [FirstInitial] ASC)

2 answers:

Answer 0 (score: 1)

You declare @ArticleID as VARCHAR(12), while the column is an int in your table: [dbo].[ArticleAuthor].[ArticleId] [int] NOT NULL.

Make them the same data type to ensure a faster response.

Answer 1 (score: 1)

If you must keep the current clustered index as it is, you can do the following:

1

Make sure you are using the correct type:

DECLARE @ArticleID VARCHAR(12) 

should be

DECLARE @ArticleID int;

to match the type of the ArticleId column in the ArticleAuthor table.
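As a minimal sketch, the two queries from the question might look like this with the variable typed to match the column. It assumes Article.Id is also an int, which the foreign key FK_ArticleId implies, and it leaves out the JournalName join and the other assignments the asker mentions from the real script.

```sql
-- Sketch only: the variable now has the same type as ArticleAuthor.ArticleId (int).
DECLARE @DOI VARCHAR(72) = '10.1140/EPJC/S10052-012-1993-2';
DECLARE @ArticleID INT;

-- First query: look up the article by DOI (simplified from the question).
SELECT @ArticleID = A.Id
FROM Article A
WHERE A.DOI = @DOI;

-- Second query: the int variable is compared to the int column directly,
-- with no implicit conversion from VARCHAR.
SELECT AA.Id
FROM [ArticleWarehouseTemp]..ArticleAuthor AA WITH (NOLOCK)
WHERE AA.ArticleId = @ArticleID;
```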

2

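To make sure the index IX_ArticleID is used efficiently, make it a covering index by adding Id as an INCLUDE column. The query selects Id, which IX_ArticleID does not contain, so each matching row currently needs a key lookup into the clustered index IX_Name.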

CREATE NONCLUSTERED INDEX [IX_ArticleID] 
ON [dbo].[ArticleAuthor] ([ArticleId] ASC)
INCLUDE(Id);
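Because an index named IX_ArticleID already exists on the table in the question, running the statement above as written would fail with a duplicate-name error. One possible way around that, sketched here, is to rebuild the index in place with DROP_EXISTING; dropping the old index first and then creating the new one would also work.

```sql
-- Sketch: replace the existing IX_ArticleID with a covering version.
-- DROP_EXISTING = ON rebuilds the index under the same name in one step.
CREATE NONCLUSTERED INDEX [IX_ArticleID]
ON [dbo].[ArticleAuthor] ([ArticleId] ASC)
INCLUDE ([Id])
WITH (DROP_EXISTING = ON);
```

On a 99-million-row table the rebuild can take a while, so it is probably best scheduled for a quiet period.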

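3

If your data distribution is heavily skewed, the number of rows per ArticleId can vary widely between articles. For example, if one article has 2 rows and another has a million, you are better off adding OPTION(RECOMPILE) to the query and making sure statistics and/or indexes are kept up to date.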
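A rough sketch of both suggestions follows. The value 12345 is a placeholder for illustration, not an ID from the question, and whether the statistics actually need refreshing depends on how the table is loaded.

```sql
-- Refresh the statistics that belong to the ArticleId index.
UPDATE STATISTICS [dbo].[ArticleAuthor] [IX_ArticleID];

DECLARE @ArticleID INT = 12345;  -- placeholder value (assumption)

-- OPTION (RECOMPILE) compiles a fresh plan on every execution,
-- using the current value of @ArticleID for the row estimate.
SELECT AA.Id
FROM [dbo].[ArticleAuthor] AA
WHERE AA.ArticleId = @ArticleID
OPTION (RECOMPILE);
```

The trade-off is a compile on each run, which is usually negligible for a query that otherwise takes minutes.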