My question is about the execution plan SQL Server builds for a DISTINCT
query.
My database has a table named MainTable
with two columns: Id (PK, int, not null)
and Name (nvarchar(200), not null).
I run the following query (generated by Entity Framework):
SELECT
[Project2].[Id] AS [Id]
FROM
(SELECT
[Distinct1].[Id] AS [Id]
FROM
(SELECT DISTINCT
[Extent3].[Id] AS [Id]
FROM
[dbo].[TableA] AS [Extent1]
INNER JOIN
[dbo].[TableB] AS [Extent2] ON [Extent1].[Id] = [Extent2].[TableAId]
INNER JOIN
[dbo].[MainTable] AS [Extent3] ON [Extent2].[MainTableId] = [Extent3].[Id]
WHERE
[Extent1].[TableCId] = 48 /* @p__linq__0 */) AS [Distinct1]) AS [Project2]
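This produces a very inefficient execution plan.
It uses nested loops and adds a TOP
operator, even though the original query contains no TOP. On tables with 2-3k records it takes several seconds and performs a large number of reads.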
I tried running a similar query with Name
as the DISTINCT column instead of Id,
and the result seems much better. So this query:
SELECT [Project2].[Name] AS [Name]
FROM (SELECT [Distinct1].[Name] AS [Name]
FROM (SELECT DISTINCT [Extent3].[Name] AS [Name]
FROM [dbo].[TableA] AS [Extent1]
INNER JOIN [dbo].[TableB] AS [Extent2]
ON [Extent1].[Id] = [Extent2].[TableAId]
INNER JOIN [dbo].[MainTable] AS [Extent3]
ON [Extent2].[MainTableId] = [Extent3].[Id]
WHERE [Extent1].[TableCId] = 48 /* @p__linq__0 */) AS [Distinct1]) AS [Project2]
is much more efficient in my case
and uses a Hash Match.
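One difference that holds regardless of the plan: Id is the primary key and therefore unique, while nothing in the schema described above makes Name unique, so DISTINCT over Name can collapse rows that DISTINCT over Id keeps. A minimal sketch of that, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (it shows result semantics only, not SQL Server's plan choice) and using made-up sample rows; the helper names are hypothetical.

```python
import sqlite3

# Hypothetical stand-in schema: SQLite accepts the [bracketed] identifiers that
# SQL Server uses, but it has no dbo schema, so the [dbo]. prefix is dropped.
SCHEMA = """
CREATE TABLE [TableA] ([Id] INTEGER PRIMARY KEY, [TableCId] INTEGER NOT NULL);
CREATE TABLE [MainTable] ([Id] INTEGER PRIMARY KEY, [Name] TEXT NOT NULL);
CREATE TABLE [TableB] (
    [Id] INTEGER PRIMARY KEY,
    [TableAId] INTEGER NOT NULL REFERENCES [TableA] ([Id]),
    [MainTableId] INTEGER NOT NULL REFERENCES [MainTable] ([Id])
);
"""

# The FROM/WHERE part shared by both queries in the question.
JOINS = """
FROM [TableA] AS [Extent1]
INNER JOIN [TableB] AS [Extent2] ON [Extent1].[Id] = [Extent2].[TableAId]
INNER JOIN [MainTable] AS [Extent3] ON [Extent2].[MainTableId] = [Extent3].[Id]
WHERE [Extent1].[TableCId] = ?
"""


def build_sample_db() -> sqlite3.Connection:
    """Create the tables in memory and fill them with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO [TableA] VALUES (?, ?)", [(1, 48), (2, 48), (3, 7)])
    # Ids 10 and 11 are different rows that happen to share the Name 'foo'.
    conn.executemany(
        "INSERT INTO [MainTable] VALUES (?, ?)",
        [(10, "foo"), (11, "foo"), (12, "bar")],
    )
    conn.executemany(
        "INSERT INTO [TableB] VALUES (?, ?, ?)",
        [(100, 1, 10), (101, 2, 10), (102, 2, 11), (103, 3, 12)],
    )
    return conn


def distinct_column(conn: sqlite3.Connection, column: str, table_c_id: int) -> list:
    """Run SELECT DISTINCT over one MainTable column and return sorted values."""
    # column is a fixed identifier chosen by the caller, never user input.
    sql = f"SELECT DISTINCT [Extent3].[{column}]" + JOINS
    return sorted(row[0] for row in conn.execute(sql, (table_c_id,)))


conn = build_sample_db()
print(distinct_column(conn, "Id", 48))    # [10, 11]
print(distinct_column(conn, "Name", 48))  # ['foo']
```

Because the two DISTINCTs can return different row counts, the optimizer cannot treat them as the same problem; how SQL Server exploits the uniqueness of Id in its plan is not something this sketch demonstrates.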
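So I would like to know: what is the difference between applying DISTINCT
to Id and to Name, why is a TOP
operator added, and how can I make the first query more efficient?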
Thanks
UPDATE
Using SELECT DISTINCT [Extent2].[MainTableId]
instead of SELECT DISTINCT [Extent3].[Id]
produces the second, more efficient execution plan.
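The rewrite in the update changes only which column the DISTINCT is applied to; since the join condition forces [Extent2].[MainTableId] = [Extent3].[Id], both columns carry the same values, so the result set should not change. A minimal sketch that checks this on hypothetical sample data, assuming Python's built-in sqlite3 as a stand-in for SQL Server (it compares results only, not plans). The third query, an EXISTS semi-join over MainTable, is an extra hypothetical formulation that is not from the original post.

```python
import sqlite3

# Hypothetical stand-in schema and rows (SQLite, [dbo]. prefix dropped).
SCHEMA = """
CREATE TABLE [TableA] ([Id] INTEGER PRIMARY KEY, [TableCId] INTEGER NOT NULL);
CREATE TABLE [MainTable] ([Id] INTEGER PRIMARY KEY, [Name] TEXT NOT NULL);
CREATE TABLE [TableB] ([Id] INTEGER PRIMARY KEY, [TableAId] INTEGER NOT NULL,
                       [MainTableId] INTEGER NOT NULL);
INSERT INTO [TableA] VALUES (1, 48), (2, 48), (3, 7);
INSERT INTO [MainTable] VALUES (10, 'foo'), (11, 'foo'), (12, 'bar'), (13, 'baz');
INSERT INTO [TableB] VALUES (100, 1, 10), (101, 2, 10), (102, 2, 11), (103, 3, 12);
"""

# Shape of the EF query; {col} is the column the DISTINCT is applied to.
TEMPLATE = """
SELECT [Project2].[Id] AS [Id]
FROM (SELECT [Distinct1].[Id] AS [Id]
      FROM (SELECT DISTINCT {col} AS [Id]
            FROM [TableA] AS [Extent1]
            INNER JOIN [TableB] AS [Extent2] ON [Extent1].[Id] = [Extent2].[TableAId]
            INNER JOIN [MainTable] AS [Extent3] ON [Extent2].[MainTableId] = [Extent3].[Id]
            WHERE [Extent1].[TableCId] = ?) AS [Distinct1]) AS [Project2]
"""

ORIGINAL = TEMPLATE.format(col="[Extent3].[Id]")         # query from the question
UPDATED = TEMPLATE.format(col="[Extent2].[MainTableId]")  # rewrite from the update

# Hypothetical extra formulation (not from the original post): an EXISTS
# semi-join that needs no DISTINCT because MainTable.Id is already unique.
EXISTS_FORM = """
SELECT [Extent3].[Id] AS [Id]
FROM [MainTable] AS [Extent3]
WHERE EXISTS (SELECT 1
              FROM [TableA] AS [Extent1]
              INNER JOIN [TableB] AS [Extent2] ON [Extent1].[Id] = [Extent2].[TableAId]
              WHERE [Extent1].[TableCId] = ?
                AND [Extent2].[MainTableId] = [Extent3].[Id])
"""


def ids(conn: sqlite3.Connection, sql: str, table_c_id: int) -> list:
    """Execute one formulation and return its Id values in sorted order."""
    return sorted(row[0] for row in conn.execute(sql, (table_c_id,)))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for table_c_id in (48, 7, 99):
    # Each printed line should hold three identical lists.
    results = [ids(conn, q, table_c_id) for q in (ORIGINAL, UPDATED, EXISTS_FORM)]
    print(table_c_id, results)
```

On SQL Server the same three statements can be compared by capturing their actual execution plans; whether the EXISTS form avoids the nested-loops-with-TOP plan is not something this sketch can show.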