OK, so I need to work out the best way to run some dynamically built queries against my database.
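I have two tables, MA_Objects and MA_Attributes. MA_Objects contains a list of users, with columns for the single-valued attributes of each user (e.g. first name, surname). MA_Attributes contains the multi-valued attributes for those users (e.g. email addresses).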
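MA_Attributes stores value pairs made up of a mandatory attribute name and a set of attribute value columns, one per data type. So every row has an attributeName, and if that attribute is a string, its value is in attributeValueString. The other attributeValue* columns must be null.
The table structure is as follows: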
CREATE TABLE [dbo].[MA_Objects](
    [id] [uniqueidentifier] ROWGUIDCOL NOT NULL,
    [firstName] [nvarchar](400) NULL,
    [givenNames] [nvarchar](400) NULL,
    [middleName] [nvarchar](400) NULL,
    [surname] [nvarchar](400) NULL,
    [objectclass] [nvarchar](400) NULL,
    [supervisor] [uniqueidentifier] NULL
)
CREATE TABLE [dbo].[MA_Attributes](
    [id] [uniqueidentifier] ROWGUIDCOL NOT NULL,
    [objectId] [uniqueidentifier] NOT NULL,
    [attributeName] [nvarchar](30) NOT NULL,
    [attributeValueString] [nvarchar](400) NULL,
    [attributeValueInt] [bigint] NULL,
    [attributeValueBinary] [varbinary](800) NULL,
    [attributeValueReference] [uniqueidentifier] NULL
)
[MA_Attributes].[objectId] references [MA_Objects].[id].
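To make the value-pair layout concrete, here is a small sketch of what the rows for one user might look like. The names and values are made up for illustration, and it assumes the two tables exist exactly as declared above with no further constraints.

```sql
-- Hypothetical sample data; the user and all values are invented for illustration.
DECLARE @userId uniqueidentifier = NEWID();

INSERT INTO [dbo].[MA_Objects] ([id], [firstName], [surname], [objectclass])
VALUES (@userId, N'Test', N'User', N'user');

-- One row per value: a multi-valued attribute simply repeats attributeName.
-- String values go in attributeValueString; the other value columns stay NULL.
INSERT INTO [dbo].[MA_Attributes] ([id], [objectId], [attributeName], [attributeValueString])
VALUES (NEWID(), @userId, N'mailAlternateAddresses', N'test.test@test.com'),
       (NEWID(), @userId, N'mailAlternateAddresses', N'test3.test3@test.com');

-- Integer values go in attributeValueInt instead.
INSERT INTO [dbo].[MA_Attributes] ([id], [objectId], [attributeName], [attributeValueInt])
VALUES (NEWID(), @userId, N'expiryDates', 44);
```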
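So the application needs to be able to find object IDs in the MA_Objects table by querying on these attribute values. A search might be as simple as finding the user whose mail address is "test@test.com", or it might be a complex combination of attributes stored across both tables (with AND and OR criteria).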
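As a rough illustration of those two ends of the range, the searches might look something like the following. The attribute name `mail` and the surname value are assumptions for the example, not names taken from the real schema, and this is only one possible way to phrase them.

```sql
-- Simple case: the user whose mail address is 'test@test.com'.
-- N'mail' is an assumed attribute name.
SELECT [o].[id]
FROM [dbo].[MA_Objects] AS [o]
WHERE EXISTS (
    SELECT 1
    FROM [dbo].[MA_Attributes] AS [a]
    WHERE [a].[objectId] = [o].[id]
      AND [a].[attributeName] = N'mail'
      AND [a].[attributeValueString] = N'test@test.com'
);

-- Mixed case: a single-valued column on MA_Objects OR-ed with a multi-valued attribute.
SELECT [o].[id]
FROM [dbo].[MA_Objects] AS [o]
WHERE [o].[objectclass] = N'user'
  AND (
        [o].[surname] = N'Test'          -- hypothetical value
        OR EXISTS (
            SELECT 1
            FROM [dbo].[MA_Attributes] AS [a]
            WHERE [a].[objectId] = [o].[id]
              AND [a].[attributeName] = N'mail'
              AND [a].[attributeValueString] = N'test@test.com'
        )
      );
```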
The MA_Objects table will contain about 500k records. The MA_Attributes table will contain tens of millions of records. Performance is important.
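My question is: am I better off using nested SELECT subqueries, or a series of self-joins, to achieve this? Or something completely different? Both seem to work, and I can't make sense of the actual query plans to see what is going on under the hood.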
Here is the same query in both formats.
SubQuery Model
select distinct
    [o].[id]
from [dbo].[MA_Objects] [o]
left join [dbo].[MA_Attributes] [a]
    on ([o].[id] = [a].[objectId])
where
(
    [o].[objectClass] = N'user'
    and
    (
        (
            [o].[supervisor] in ('6213F48A-A97F-48E2-AFD7-2EF830C4DAA8', '0917EC45-CA23-41F5-911C-B92A90140AFD', '69B1DA67-4E3C-406E-8B78-B4633800B491')
        )
        and
        (
            [o].[id] in
            (
                select [a].[objectId]
                from [dbo].[MA_Attributes] [a]
                where
                (
                    (
                        [a].[attributeName] = N'mailAlternateAddresses'
                        and
                        [a].[attributeValueString] in (N'test.test@test.com', N'test3.test3@test.com')
                    )
                )
            )
        )
        and
        (
            [o].[id] in
            (
                select [a].[objectId]
                from [dbo].[MA_Attributes] [a]
                where
                (
                    (
                        [a].[attributeName] = N'objectSids'
                        and
                        [a].[attributeValueBinary] in (0x0001020304, 0x0007070707)
                    )
                )
            )
        )
        and
        (
            [o].[id] in
            (
                select [a].[objectId]
                from [dbo].[MA_Attributes] [a]
                where
                (
                    (
                        [a].[attributeName] = N'expiryDates'
                        and
                        [a].[attributeValueInt] in (44, 77, 99)
                    )
                )
            )
        )
    )
)
Self-Join Model
select distinct
    [o].[id]
from [dbo].[MA_Objects] [o]
left join [dbo].[MA_Attributes] [a]
    on ([o].[id] = [a].[objectId])
where
(
    [o].[objectClass] = N'user'
    and
    (
        (
            [o].[supervisor] in ('6213F48A-A97F-48E2-AFD7-2EF830C4DAA8', '0917EC45-CA23-41F5-911C-B92A90140AFD', '69B1DA67-4E3C-406E-8B78-B4633800B491')
        )
        and
        (
            [o].[id] in
            (
                select [a].[objectId]
                from [dbo].[MA_Attributes] [a]
                left join [dbo].[MA_Attributes] [a0] on ([a].[objectId] = [a0].[objectId])
                left join [dbo].[MA_Attributes] [a1] on ([a].[objectId] = [a1].[objectId])
                left join [dbo].[MA_Attributes] [a2] on ([a].[objectId] = [a2].[objectId])
                where
                (
                    [a].[objectId] = [a0].[objectId]
                    and
                    [a].[id] <> [a0].[id]
                    and
                    [a].[objectId] = [a1].[objectId]
                    and
                    [a].[id] <> [a1].[id]
                    and
                    [a].[objectId] = [a2].[objectId]
                    and
                    [a].[id] <> [a2].[id]
                    and
                    (
                        (
                            [a0].[attributeName] = N'mailAlternateAddresses'
                            and
                            [a0].[attributeValueString] in (N'test.test@test.com', N'test3.test3@test.com')
                        )
                        and
                        (
                            [a1].[attributeName] = N'objectSids'
                            and
                            [a1].[attributeValueBinary] in (0x0001020304, 0x0007070707)
                        )
                        and
                        (
                            [a2].[attributeName] = N'expiryDates'
                            and
                            [a2].[attributeValueInt] in (44, 77, 99)
                        )
                    )
                )
            )
        )
    )
)
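For comparing two forms of a query without reading the plans in detail, one option is to have SQL Server report I/O and timing figures for each run. The sketch below uses the standard `SET STATISTICS IO` and `SET STATISTICS TIME` session settings; the two `SELECT` statements are trivial stand-ins (the second uses an assumed example filter) and would be replaced by the subquery and self-join forms above.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate 1: replace this stand-in with the subquery form.
SELECT COUNT(*)
FROM [dbo].[MA_Attributes]
WHERE [attributeName] = N'mailAlternateAddresses';

-- Candidate 2: replace this stand-in with the self-join form.
SELECT COUNT(DISTINCT [objectId])
FROM [dbo].[MA_Attributes]
WHERE [attributeName] = N'mailAlternateAddresses';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The per-table "logical reads" and the CPU and elapsed times appear in the messages output for each statement, which gives a like-for-like comparison on the same data.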
Answer 0 (score: 1)
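Nested SELECT statements are almost guaranteed to give you the worst performance whenever you ask SQL Server to run them.
Your self-join statement is still a sub-select; you should rewrite it as follows.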
SELECT o.*
FROM @MA_Objects AS o
    LEFT OUTER JOIN @MA_Attributes AS at1
        ON o.id = at1.objectId
    LEFT OUTER JOIN @MA_Attributes AS at2
        ON o.id = at2.objectId
    LEFT OUTER JOIN @MA_Attributes AS at3
        ON o.id = at3.objectId
WHERE o.objectclass = N'user'
    AND o.supervisor IN ( '6213F48A-A97F-48E2-AFD7-2EF830C4DAA8', '0917EC45-CA23-41F5-911C-B92A90140AFD',
                          '69B1DA67-4E3C-406E-8B78-B4633800B491' )
    AND (
        at1.attributeName = N'mailAlternateAddresses'
        AND at1.attributeValueString IN ( N'test.test@test.com', N'test3.test3@test.com' ) )
    AND (
        at2.attributeName = N'objectSids'
        AND at2.attributeValueBinary IN ( 0x0001020304, 0x0007070707 ) )
    AND (
        at3.attributeName = N'expiryDates'
        AND at3.attributeValueInt IN ( 44, 77, 99 ) )
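JOIN operations will be faster than IN operations. This way, the rows matching o.id are returned and evaluated only after each JOIN has limited the number of rows.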
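If you want to write queries that perform well, you should reduce the number of rows as early as possible and work only with the subset you need.
So, depending on the information you need to search for, rewrite the query accordingly so that the record count is cut down as early as possible and performance stays high.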
Note: I forgot to mention that I changed the tables to table variables so that I could use IntelliSense.
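As a sketch of the "reduce rows early" idea, the same search could also be written with each attribute filter inside its join condition. Because the WHERE filters on at1, at2 and at3 in the query above already discard the NULL-extended rows, the LEFT OUTER JOINs there behave like inner joins, so INNER JOIN is used here. This version assumes the real tables rather than the table variables. It returns only o.id, and DISTINCT is added because a user with several matching attribute values would otherwise appear more than once.

```sql
-- Sketch only: assumes the real tables rather than the table variables above.
SELECT DISTINCT o.id
FROM [dbo].[MA_Objects] AS o
    INNER JOIN [dbo].[MA_Attributes] AS at1
        ON  at1.objectId = o.id
        AND at1.attributeName = N'mailAlternateAddresses'
        AND at1.attributeValueString IN ( N'test.test@test.com', N'test3.test3@test.com' )
    INNER JOIN [dbo].[MA_Attributes] AS at2
        ON  at2.objectId = o.id
        AND at2.attributeName = N'objectSids'
        AND at2.attributeValueBinary IN ( 0x0001020304, 0x0007070707 )
    INNER JOIN [dbo].[MA_Attributes] AS at3
        ON  at3.objectId = o.id
        AND at3.attributeName = N'expiryDates'
        AND at3.attributeValueInt IN ( 44, 77, 99 )
WHERE o.objectclass = N'user'
    AND o.supervisor IN ( '6213F48A-A97F-48E2-AFD7-2EF830C4DAA8',
                          '0917EC45-CA23-41F5-911C-B92A90140AFD',
                          '69B1DA67-4E3C-406E-8B78-B4633800B491' );
```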