How do joins affect what indexes are used?

Date: 2019-01-18 18:14:11

Tags: sql sql-server

Say I have two simple tables, Users and Posts, defined as follows:

CREATE TABLE [dbo].[Users](
    [UserID] [int] IDENTITY(1,1) NOT NULL,
    [Username] [varchar](255) NULL,
    [FirstName] [varchar](255) NULL,
    [LastName] [varchar](255) NULL,
    [DateOfBirth] [datetime] NULL,
    [Age] [int] NULL,
CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED 
(
    [UserID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[Posts](
    [PostID] [int] IDENTITY(1,1) NOT NULL,
    [Content] [varchar](max) NULL,
    [NumberOfLikes] [int] NULL,
    [UserID] [int] NULL,
    [CreateDateUTC] [datetime] NULL,
    [Tags] [varchar](max) NULL,
 CONSTRAINT [PK_Posts] PRIMARY KEY CLUSTERED 
(
    [PostID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

Obviously, once the database grows very large, performance tuning will become necessary. The UserID column in Posts is essential because most of my queries filter by it, so I figured I'll need to define an index on that column. As for the covering fields, let's assume, for the purpose of this question, that all of my queries look the same (except for the WHERE part):

SELECT
  Posts.Content
 ,Posts.NumberOfLikes
 ,Users.UserName
FROM
  Posts
INNER JOIN
  Users
    ON
    Posts.UserID = Users.UserID
WHERE
  Posts.UserID = @UserID;

My question is about the covering fields. I can easily define an index that covers Content and NumberOfLikes like this:

CREATE NONCLUSTERED INDEX [IX_Posts_UserID] ON [dbo].[Posts] (UserID) INCLUDE (Content, NumberOfLikes)

However, my query always joins with the Users table. So is my index still relevant (in terms of performance) even though my query selects more fields (from another table) than the index covers? I know that I cannot cover fields from another table, so how do I optimize the query in this case? When I looked at the execution plan, I saw that my index IX_Posts_UserID was in fact used (50%, with the other 50% on PK_Users), but I was puzzled as to how that happened since I'm selecting columns that are not covered by the index.

So the ultimate question here is: how do table joins factor in the decision of whether indexes are used by SQL Server? Or even simpler, how do joins affect indexes?

EDIT: Per Simonare's comment, I have added the execution plan. [Image: execution plan]

1 Answer:

Answer 0 (score: 1)

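In general, it is advisable to add nonclustered indexes on at least all foreign keys, because they are very likely to be used frequently in JOIN operations (and occasionally in WHERE predicates).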
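The table definitions in the question do not actually declare Posts.UserID as a foreign key. A minimal sketch of doing so, assuming UserID is meant to reference Users.UserID (the constraint name FK_Posts_Users is hypothetical):

```sql
-- Assumption: every non-NULL Posts.UserID should match a row in Users.
-- FK_Posts_Users is a hypothetical name chosen for this example.
ALTER TABLE dbo.Posts
    ADD CONSTRAINT FK_Posts_Users
    FOREIGN KEY (UserID) REFERENCES dbo.Users (UserID);
```

Note that SQL Server does not create an index on the referencing column when a foreign key is declared, so the index on Posts.UserID still has to be created separately.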

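To speak specifically to your case, the index you chose to create includes a VARCHAR(MAX) field, which affects how SQL Server decides to use it. Because a VARCHAR(MAX) value can theoretically grow to 2 GB, the engine cannot always keep the field data at the page level, since a page is limited to 8 KB; large values are stored off-row. In this case SQL Server decided that the cheapest operation was to scan the index (which, by the way, is not always a bad thing, especially when selectivity is high).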
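One way to see where the VARCHAR(MAX) data actually lives is to look at the allocation units behind each index on Posts. The following is a rough sketch rather than a definitive diagnostic; it assumes it is run in the database that contains dbo.Posts, and the DETAILED mode reads every page, so it is best tried on a test copy:

```sql
-- List page counts per index and allocation unit type for dbo.Posts.
-- IN_ROW_DATA is data stored on regular 8 KB pages;
-- LOB_DATA and ROW_OVERFLOW_DATA are values stored off-row.
SELECT i.name AS index_name,
       ps.alloc_unit_type_desc,
       ps.index_level,
       ps.page_count,
       ps.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Posts'), NULL, NULL, 'DETAILED') AS ps
INNER JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
ORDER BY ps.index_id, ps.alloc_unit_type_desc, ps.index_level;
```

If most of the Content pages show up under LOB_DATA, including Content in the nonclustered index buys little, because reading it still means following a pointer off the index page.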

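My recommendation here is to keep the index tight and limit it to the UserID field to help join performance. I wouldn't worry about a covering index for the Content column, since the engine will need to dig deeper than the page level for that data anyway.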

create nonclustered index ix_posts_userid on dbo.Posts (UserID);
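To check how the join and the index interact, one option is to run the query from the question with I/O statistics enabled and compare the reads per table with and without the index. This is only a sketch; the @UserID value is a hypothetical placeholder:

```sql
-- Hypothetical parameter value, for illustration only.
DECLARE @UserID int = 1;

-- Report logical reads per table in the Messages output.
SET STATISTICS IO ON;

SELECT Posts.Content,
       Posts.NumberOfLikes,
       Users.Username
FROM dbo.Posts
INNER JOIN dbo.Users
    ON Posts.UserID = Users.UserID
WHERE Posts.UserID = @UserID;

SET STATISTICS IO OFF;
```

Each table in a join gets its own access path, which is why an index on Posts can still be used even though Users.Username is not one of its columns: the Posts rows can come from the nonclustered index, while the matching Users row is fetched through PK_Users. With the narrow index, the optimizer may add a key lookup into PK_Posts to fetch Content and NumberOfLikes, or may prefer a clustered index scan when many rows match; the actual plan depends on the data and statistics.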

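Keep in mind that indexes are not magic and are definitely not a silver bullet for every performance problem. Designed properly, they make a system more efficient, but at a cost. Think of the administrative staff in an office: they cost money to employ, but they add value to the business in terms of efficiency.

Finally, please do not store tags as a comma-separated list, which appears to be what you are doing here.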

Instead, store tags as a shared resource and link them to posts through a "join table".

create table Tags (
    TagId int identity primary key
    ,Content nvarchar(128) not null -- or whatever width suits your needs
);

create table PostTags (
    PostId int not null
    ,TagId int not null
    ,primary key (PostId, TagId)
);
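With that layout, finding the posts for a given tag becomes a pair of joins. A minimal sketch, assuming the tables above were created in the default schema; the index name IX_PostTags_TagId and the tag value are hypothetical:

```sql
-- Hypothetical supporting index for lookups that start from a tag.
-- The primary key (PostId, TagId) already serves lookups that start from a post.
CREATE NONCLUSTERED INDEX IX_PostTags_TagId
    ON PostTags (TagId, PostId);

-- Hypothetical tag value, for illustration only.
DECLARE @Tag nvarchar(128) = N'sql-server';

SELECT p.PostID,
       p.NumberOfLikes
FROM Tags AS t
INNER JOIN PostTags AS pt
    ON pt.TagId = t.TagId
INNER JOIN dbo.Posts AS p
    ON p.PostID = pt.PostId
WHERE t.Content = @Tag;
```

In a real schema, PostTags.PostId and PostTags.TagId would usually also be declared as foreign keys to Posts and Tags; the definitions above leave that out.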