I have recently been looking into database synchronization. The situation is as follows:
There is a main table, "Item", with 1M+ rows:
CREATE TABLE [dbo].[Item](
[Id] [uniqueidentifier] NOT NULL,
[Title] [nvarchar](50) NULL,
[Modified] [datetime2](7) NOT NULL,
CONSTRAINT [PK_Item] PRIMARY KEY CLUSTERED ([Id] ASC) ON [PRIMARY]
) ON [PRIMARY]
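We want to sync the data to clients in a very flexible way. For that we use the "Item_syncfilter" table, which holds one entry per user for each item that user should download during a sync.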
CREATE TABLE [dbo].[Item_syncfilter](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[ItemId] [uniqueidentifier] NOT NULL,
[Modified] [datetime2](7) NOT NULL,
[IsDeleted] [bit] NOT NULL,
[UserId] [bigint] NOT NULL,
CONSTRAINT [PK_Item_syncfilter] PRIMARY KEY CLUSTERED ([Id] ASC) ON [PRIMARY]
) ON [PRIMARY]
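What makes this a bit more complicated: there are many reasons why a particular user may be allowed to download a particular row (being added to the contributor group, being added to the admin group, having the item assigned directly). So for a single item there can be several rows for the same user, each stating that the user may download that item.
In addition, the sync has to be incremental. Meaning:
* If user Andrew has access to item A and the item has been modified, he should receive the latest version the next time he syncs.
* If user Andrew has no access to item A but is later added to the admin group (=> gets a corresponding Item_syncfilter entry), he should download the item the next time he syncs.
* If Andrew has already synced item A and is then added to the admin group, nothing should be synced.
What we came up with is the following query: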
declare @userid bigint;
declare @date datetime2(7);
set @date = '2018-05-02 13:00:00.0000000';
set @userid = 5;
select i.*, 0 as Toombstoned from item i
where
-- clause 1: get all modified items where there exists at least one non-deleted sync row
(i.modified >= @date
-- and there exists at least one non-deleted syncfilter
and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0))
-- clause 2: get all items, which were not modified, but their sync rows are newer (toombstoned or not)
or (i.modified < @date
-- and there is at least one younger, non-deleted syncfilter (permission was added to user)
and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified > @date)
-- make sure this item was not already synced by an older valid and non-deleted filter
and not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified < @date))
union all
select i.*, 1 as Toombstoned from item i
where
-- clause 3: get all toombstoned items
-- - where no non-deleted syncfilter exists
-- - and there is a deleted sync filter younger than "date"
(not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0)
and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 1 and modified > @date))
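However, because of the five uses of EXISTS, it performs badly: for 1 million rows in the main table the query runs for 5 seconds, and the STATISTICS IO output shows a huge number of reads, even though the query returns only a small subset of the data.
Can you give me any hints on how we could improve this query significantly?
Update: Thanks for your replies. The following SQL snippet shows:
* the complete table schema
* including the indexes I use
* and some test data that demonstrates how the query works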
-- ########################
-- ## Sync Item
-- ########################
CREATE TABLE [dbo].[Item](
[Id] [uniqueidentifier] NOT NULL,
[Title] [nvarchar](100) NULL,
[Modified] [datetime2](7) NOT NULL,
CONSTRAINT [PK_Item] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Item] ADD CONSTRAINT [DF_Item_Id] DEFAULT (newid()) FOR [Id]
GO
ALTER TABLE [dbo].[Item] ADD CONSTRAINT [DF_Item_Modified] DEFAULT (getutcdate()) FOR [Modified]
GO
CREATE NONCLUSTERED INDEX [IX_ItemModified] ON [dbo].[Item]
(
[Modified] DESC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
-- ########################
-- ## sync filter
-- ########################
CREATE TABLE [dbo].[Item_syncfilter](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[ItemId] [uniqueidentifier] NOT NULL,
[Modified] [datetime2](7) NOT NULL,
[IsDeleted] [bit] NOT NULL,
[UserId] [bigint] NOT NULL,
CONSTRAINT [PK_Item_syncfilter] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Item_syncfilter] ADD CONSTRAINT [DF_Item_syncfilter_Modified] DEFAULT (getutcdate()) FOR [Modified]
GO
ALTER TABLE [dbo].[Item_syncfilter] ADD CONSTRAINT [DF_Item_syncfilter_IsDeleted] DEFAULT ((0)) FOR [IsDeleted]
GO
ALTER TABLE [dbo].[Item_syncfilter] ADD CONSTRAINT [DF_Item_syncfilter_UserId] DEFAULT (CONVERT([int],((20)+(1))*rand())) FOR [UserId]
GO
CREATE NONCLUSTERED INDEX [IX_SyncItemModified] ON [dbo].[Item_syncfilter]
(
[UserId] ASC,
[ItemId] ASC,
[IsDeleted] ASC,
[Modified] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_SyncItemItemId] ON [dbo].[Item_syncfilter]
(
[ItemId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
-- ########################
-- ## TestData
-- ########################
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'a14ae781-b595-4fa8-942f-3abf8d848bdf', N'1 new deleted, 1 old still valid NOTINSYNC', CAST(N'2018-05-01T14:10:25.8400000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'45b71309-49d9-4457-a784-52dcc1331ec2', N'Modified and all filters new', CAST(N'2018-05-03T06:33:04.7200000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'cf01ebde-7f11-4bad-a32c-54caa6fca14b', N'No new filter NOTINSYNC', CAST(N'2018-05-01T14:10:11.0833333' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'80fc71ff-e984-4dae-bdf1-98e02d27c926', N'All deleted', CAST(N'2018-05-02T14:09:48.6200000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'a5fa6d29-5c2b-4edb-8390-aeec44232368', N'Modified', CAST(N'2018-05-02T14:09:48.6200000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'5995209d-c571-40b8-9ff6-b650add6ffbf', N'Some filters new NOTINSYNC', CAST(N'2018-05-01T14:10:04.2900000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'd79a3967-780c-46e3-b1ec-e6038214e711', N'All filters new', CAST(N'2018-05-01T14:10:04.2900000' AS DateTime2))
SET IDENTITY_INSERT [dbo].[Item_syncfilter] ON
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (1, N'a5fa6d29-5c2b-4edb-8390-aeec44232368', CAST(N'2018-05-01T14:13:30.5000000' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (2, N'a5fa6d29-5c2b-4edb-8390-aeec44232368', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (3, N'd79a3967-780c-46e3-b1ec-e6038214e711', CAST(N'2018-05-02T16:15:04.5933333' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (4, N'd79a3967-780c-46e3-b1ec-e6038214e711', CAST(N'2018-05-02T16:15:07.1266667' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (5, N'a14ae781-b595-4fa8-942f-3abf8d848bdf', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (6, N'a14ae781-b595-4fa8-942f-3abf8d848bdf', CAST(N'2018-05-02T14:15:31.7666667' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (7, N'cf01ebde-7f11-4bad-a32c-54caa6fca14b', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (8, N'cf01ebde-7f11-4bad-a32c-54caa6fca14b', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (9, N'80fc71ff-e984-4dae-bdf1-98e02d27c926', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (10, N'80fc71ff-e984-4dae-bdf1-98e02d27c926', CAST(N'2018-04-30T14:13:37.8133333' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (11, N'5995209d-c571-40b8-9ff6-b650add6ffbf', CAST(N'2018-04-30T14:13:37.8133333' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (12, N'5995209d-c571-40b8-9ff6-b650add6ffbf', CAST(N'2018-05-02T16:39:20.5066667' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (13, N'5995209d-c571-40b8-9ff6-b650add6ffbf', CAST(N'2018-05-02T16:39:21.7066667' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (14, N'45b71309-49d9-4457-a784-52dcc1331ec2', CAST(N'2018-05-03T06:33:34.7900000' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (15, N'45b71309-49d9-4457-a784-52dcc1331ec2', CAST(N'2018-05-03T06:33:38.0300000' AS DateTime2), 1, 1)
SET IDENTITY_INSERT [dbo].[Item_syncfilter] OFF
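To see which of the sync rules each hand-written test item falls under, a per-item summary of the filter rows for one user can help. The following is only a sketch and not part of the original query: the column aliases (LiveFilters, OldestLive, NewestLive, NewestDeleted) are names chosen here for illustration, and @userid = 1 is used because the hand-written filter rows above all belong to user 1.

```sql
DECLARE @userid bigint = 1;

-- One row per item that has at least one filter row for this user.
SELECT i.Title,
       i.Modified AS ItemModified,
       -- number of non-deleted filter rows (permission still granted)
       SUM(CASE WHEN s.IsDeleted = 0 THEN 1 ELSE 0 END)   AS LiveFilters,
       -- oldest and newest non-deleted filter (NULL if there is none)
       MIN(CASE WHEN s.IsDeleted = 0 THEN s.Modified END) AS OldestLive,
       MAX(CASE WHEN s.IsDeleted = 0 THEN s.Modified END) AS NewestLive,
       -- newest deleted filter (NULL if there is none)
       MAX(CASE WHEN s.IsDeleted = 1 THEN s.Modified END) AS NewestDeleted
FROM dbo.Item AS i
JOIN dbo.Item_syncfilter AS s
  ON s.ItemId = i.Id
 AND s.UserId = @userid
GROUP BY i.Id, i.Title, i.Modified
ORDER BY i.Title;
```

Reading the summary against @date = '2018-05-02 13:00:00', the query above should pick three of the seven hand-written items for user 1, all with Toombstoned = 0: "Modified" and "Modified and all filters new" (item changed, a live filter exists) and "All filters new" (item unchanged, every live filter is newer than @date). "All deleted" is not returned as a tombstone because both of its deleted filters are older than @date.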
To generate the test data I used:
-- ## Create many items
DECLARE @startnum INT=1;
DECLARE @endnum INT=5000;
WITH gen AS (
SELECT @startnum AS num
UNION ALL
SELECT num+1 FROM gen WHERE num+1<=@endnum
)
insert into [Item] ([Id], [Title], [Modified])
(SELECT newId() as [Id]
,[Title] + ' -#'+ CONVERT(varchar(1000), n.num) as [Title]
,[Modified]
FROM [Item]
cross join gen as n)
option (maxrecursion 10000);
select count(*) as item_count from item;
-- ## generate syncfilter rows for 10 users
set @startNum = 1;
set @endNum = 10;
WITH gen AS (
SELECT @startnum AS num
UNION ALL
SELECT num+1 FROM gen WHERE num+1<=@endnum
)
insert into item_syncfilter ([ItemId],[Modified],[IsDeleted],[UserId])
(select i.[Id], DATEADD(month, -6, i.[Modified]), 0 as IsDeleted, n.num as [Userid] from item i
left outer join item_syncfilter s on s.itemid = i.id
cross join gen as n
where s.id is null)
option (maxrecursion 10000);
select count(*) item_syncfilter_count from Item_syncfilter;
This creates roughly 35,000 items and 350,000 sync filter rows.
The STATISTICS IO output is:
(15003 rows affected)
Table 'Item_syncfilter'. Scan count 35013, logical reads 105648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Item'. Scan count 1, logical reads 610, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(21 rows affected)
(1 row affected)
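You can download the execution plan from here.
Answer (score: 0)
Update:
I have been looking into the execution plans of the individual sub-clauses of the query.
For clause 3 (finding tombstoned items), the execution plan suggests that the following index would improve performance: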
CREATE NONCLUSTERED INDEX [IX_SyncItemDeletedItems] ON [dbo].[Item_syncfilter]
(
[IsDeleted],
[UserId],
[Modified])
INCLUDE ([ItemId])
Running only clause 3 of the query with STATISTICS IO switched on:
set statistics io on
declare @userid bigint;
declare @date datetime2(7);
set @date = '2018-05-02 13:00:00.0000000';
set @userid = 5;
select i.*, 1 as Toombstoned from item i
where
-- clause 3: get all toombstoned items
-- - where no non-deleted syncfilter exists
-- - and there is a deleted sync filter younger than "date"
(not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0)
and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 1 and modified > @date))
shows, before the index:
(0 rows affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Item_syncfilter'. Scan count 1, logical reads 1433, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row affected)
After the index:
(0 rows affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Item_syncfilter'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row affected)
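...so the logical reads on Item_syncfilter drop from 1433 to 3!
For clauses 1 and 2 it gets more interesting. Run on their own they each perform well, but combined they lead to a terrible execution plan: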
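Besides reading the plan, one way to double-check that a new index is actually being picked up is to look at the index usage counters. This is only a sketch using the sys.indexes catalog view and the sys.dm_db_index_usage_stats DMV; it assumes you have the permission needed to query DMVs, and keep in mind that the counters are reset when the instance restarts.

```sql
-- Seek/scan/lookup counters for every index on Item_syncfilter
-- in the current database. Indexes that were never used since the
-- last restart show NULL counters because of the LEFT JOIN.
SELECT ix.name AS IndexName,
       us.user_seeks,
       us.user_scans,
       us.user_lookups
FROM sys.indexes AS ix
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = ix.object_id
      AND us.index_id    = ix.index_id
      AND us.database_id = DB_ID()
WHERE ix.object_id = OBJECT_ID(N'dbo.Item_syncfilter')
ORDER BY ix.name;
```

If user_seeks for IX_SyncItemDeletedItems goes up after running clause 3, the optimizer is using the index as intended.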
set statistics io on
declare @userid bigint;
declare @date datetime2(7);
set @date = '2018-05-02 13:00:00.0000000';
set @userid = 5;
select i.*, 0 as Toombstoned from item i
where
-- clause 1: get all modified items where there exists at least one non-deleted sync row
(i.modified >= @date
-- and there exists at least one non-deleted syncfilter
and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0))
-- clause 2: get all items, which were not modified, but their sync rows are newer (toombstoned or not)
or (i.modified < @date
-- and there is at least one younger, non-deleted syncfilter (permission was added to user)
and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified > @date)
-- make sure this item was not already synced by an older valid and non-deleted filter
and not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified < @date))
returns
(0 rows affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Item_syncfilter'. Scan count 229376, logical reads 688128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Item'. Scan count 1, logical reads 3980, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row affected)
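The reason is one step in the execution plan: SQL Server scans the clustered index of Item for rows with modified >= @date OR modified < @date, which returns more or less the whole table, and the EXISTS checks against Item_syncfilter are then evaluated for every one of those rows. Hence the large number of reads.
So all I did was split the two clauses that were combined with OR into two separate queries and combine those with UNION ALL: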
set statistics io on
declare @userid bigint;
declare @date datetime2(7);
set @date = '2018-05-02 13:00:00.0000000';
set @userid = 5;
select i.*, 0 as Toombstoned from item i
where
-- clause 1: get all modified items where there exists at least one non-deleted sync row
(i.modified >= @date
-- and there exists at least one non-deleted syncfilter
and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0))
-- clause 2: get all items, which were not modified, but their sync rows are newer (toombstoned or not)
union all
select i.*, 0 as Toombstoned from item i
where i.modified < @date
-- and there is at least one younger, non-deleted syncfilter (permission was added to user)
and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified > @date)
-- make sure this item was not already synced by an older valid and non-deleted filter
and not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified < @date)
Interestingly, this produces the following statistics:
(0 rows affected)
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Item_syncfilter'. Scan count 3, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Item'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row affected)
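=> So on Item_syncfilter the scan count goes from 229376 down to 3 and the logical reads from 688128 down to 12, and on Item the logical reads go from 3980 down to 8. That is a huge gain!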
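Putting the pieces together, the complete sync query could look like the sketch below: the two branches from the split query plus the unchanged tombstone branch (clause 3), all combined with UNION ALL. The comparisons are copied as-is from the original query. UNION ALL is only safe here because the three branches cannot overlap: clauses 1 and 2 split on i.Modified, and clause 3 requires that no non-deleted filter exists while clauses 1 and 2 require that one does. The table alias s is introduced here for readability.

```sql
DECLARE @userid bigint = 5;
DECLARE @date datetime2(7) = '2018-05-02 13:00:00.0000000';

-- clause 1: item modified since @date and at least one non-deleted filter
SELECT i.*, 0 AS Toombstoned
FROM dbo.Item AS i
WHERE i.Modified >= @date
  AND EXISTS (SELECT 1 FROM dbo.Item_syncfilter AS s
              WHERE s.ItemId = i.Id AND s.UserId = @userid
                AND s.IsDeleted = 0)

UNION ALL

-- clause 2: item not modified, but permission was granted after @date
-- and no older non-deleted filter had already synced the item
SELECT i.*, 0 AS Toombstoned
FROM dbo.Item AS i
WHERE i.Modified < @date
  AND EXISTS (SELECT 1 FROM dbo.Item_syncfilter AS s
              WHERE s.ItemId = i.Id AND s.UserId = @userid
                AND s.IsDeleted = 0 AND s.Modified > @date)
  AND NOT EXISTS (SELECT 1 FROM dbo.Item_syncfilter AS s
                  WHERE s.ItemId = i.Id AND s.UserId = @userid
                    AND s.IsDeleted = 0 AND s.Modified < @date)

UNION ALL

-- clause 3: tombstones, i.e. no non-deleted filter left
-- and at least one filter deleted after @date
SELECT i.*, 1 AS Toombstoned
FROM dbo.Item AS i
WHERE NOT EXISTS (SELECT 1 FROM dbo.Item_syncfilter AS s
                  WHERE s.ItemId = i.Id AND s.UserId = @userid
                    AND s.IsDeleted = 0)
  AND EXISTS (SELECT 1 FROM dbo.Item_syncfilter AS s
              WHERE s.ItemId = i.Id AND s.UserId = @userid
                AND s.IsDeleted = 1 AND s.Modified > @date);
```

Whether the combined statement keeps the good plan of the individual branches depends on your data and indexes, so it is worth re-checking STATISTICS IO on the full query as well.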