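Efficient join technique for a large main table and an even larger permission table

Time: 2018-07-30 08:24:36

Tags: sql sql-server sql-server-2008 data-synchronization

I have recently been working on a database synchronization concept. The situation is as follows:

  • There is a main table, "Item", with 1M+ rows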

        CREATE TABLE [dbo].[Item](
            [Id] [uniqueidentifier] NOT NULL,
            [Title] [nvarchar](50) NULL,
            [Modified] [datetime2](7) NOT NULL,
         CONSTRAINT [PK_Item] PRIMARY KEY CLUSTERED ([Id] ASC) ON [PRIMARY]
        ) ON [PRIMARY]
    
  • We want to sync data to clients in a very flexible way, so we use the "Item_syncfilter" table, which holds one row per user for each item that user should download during sync.

    CREATE TABLE [dbo].[Item_syncfilter](
        [Id] [bigint] IDENTITY(1,1) NOT NULL,
        [ItemId] [uniqueidentifier] NOT NULL,
        [Modified] [datetime2](7) NOT NULL,
        [IsDeleted] [bit] NOT NULL,
        [UserId] [bigint] NOT NULL,
     CONSTRAINT [PK_Item_syncfilter] PRIMARY KEY CLUSTERED ([Id] ASC) ON [PRIMARY]
    ) ON [PRIMARY]
    

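What makes this a bit more complicated: there are many reasons why a particular user may get permission to download a particular row (being added to a contributors group, being added to an admin group, having the item assigned directly to him/her). So for one item there may be multiple rows for the same user, each stating that the user is allowed to download that item.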

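In addition, the sync process must be incremental. Meaning:

  • If user Andrew has access to item A and the item has been modified, he should receive the latest version the next time he syncs.

  • If user Andrew does not have access to item A but is later added to the admin group (=> he gets a corresponding Item_syncfilter entry), he should download the item the next time he syncs.

  • If Andrew has already synced item A and is then added to the admin group, nothing should be synced.

What we have come up with so far is the following query: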

declare @userid bigint;
declare @date datetime2(7);
set @date = '2018-05-02 13:00:00.0000000';
set @userid = 5;

select i.*, 0 as Toombstoned from item i
where 
-- clause 1: get all modified items where there exists at least one non-deleted sync row
(i.modified >= @date
    -- and there exists at least one non-deleted syncfilter
    and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0))
-- clause 2: get all items, which were not modified, but their sync rows are newer (toombstoned or not)
or (i.modified <  @date
    -- and there is at least one younger, non-deleted syncfilter (permission was added to user)
    and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified >  @date)
    -- make sure this item was not already synced by an older valid and non-deleted filter
    and not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified <  @date))

union all
select i.*, 1 as Toombstoned from item i
where 
-- clause 3: get all toombstoned items
--                  - where no non-deleted syncfilter exists
--                  - and there is a deleted sync filter younger than "date"
(not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0)
    and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 1 and modified >  @date))

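However, because of the 5 uses of "exists", it performs poorly: with 1 million rows in the main table, the query runs for 5 seconds, and the STATISTICS IO output shows a large number of reads even though the query returns only a small portion of the data.

Can you give me any hints on how we could significantly improve this query?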

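Update: Thanks for your replies. The following SQL snippet shows:

  • the full table schema

  • the indexes I use

  • some test data that demonstrates how the query works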

-- ########################
-- ## Sync Item
-- ########################

CREATE TABLE [dbo].[Item](
    [Id] [uniqueidentifier] NOT NULL,
    [Title] [nvarchar](100) NULL,
    [Modified] [datetime2](7) NOT NULL,
 CONSTRAINT [PK_Item] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Item] ADD  CONSTRAINT [DF_Item_Id]  DEFAULT (newid()) FOR [Id]
GO
ALTER TABLE [dbo].[Item] ADD  CONSTRAINT [DF_Item_Modified]  DEFAULT (getutcdate()) FOR [Modified]
GO

CREATE NONCLUSTERED INDEX [IX_ItemModified] ON [dbo].[Item]
(
    [Modified] DESC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


-- ########################
-- ## sync filter
-- ########################

CREATE TABLE [dbo].[Item_syncfilter](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [ItemId] [uniqueidentifier] NOT NULL,
    [Modified] [datetime2](7) NOT NULL,
    [IsDeleted] [bit] NOT NULL,
    [UserId] [bigint] NOT NULL,
 CONSTRAINT [PK_Item_syncfilter] PRIMARY KEY CLUSTERED ([Id] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Item_syncfilter] ADD  CONSTRAINT [DF_Item_syncfilter_Modified]  DEFAULT (getutcdate()) FOR [Modified]
GO
ALTER TABLE [dbo].[Item_syncfilter] ADD  CONSTRAINT [DF_Item_syncfilter_IsDeleted]  DEFAULT ((0)) FOR [IsDeleted]
GO
ALTER TABLE [dbo].[Item_syncfilter] ADD  CONSTRAINT [DF_Item_syncfilter_UserId]  DEFAULT (CONVERT([int],((20)+(1))*rand())) FOR [UserId]
GO

CREATE NONCLUSTERED INDEX [IX_SyncItemModified] ON [dbo].[Item_syncfilter]
(
    [UserId] ASC,
    [ItemId] ASC,
    [IsDeleted] ASC,
    [Modified] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_SyncItemItemId] ON [dbo].[Item_syncfilter]
(
    [ItemId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


-- ########################
-- ## TestData
-- ########################

INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'a14ae781-b595-4fa8-942f-3abf8d848bdf', N'1 new deleted, 1 old still valid  NOTINSYNC', CAST(N'2018-05-01T14:10:25.8400000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'45b71309-49d9-4457-a784-52dcc1331ec2', N'Modified and all filters new', CAST(N'2018-05-03T06:33:04.7200000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'cf01ebde-7f11-4bad-a32c-54caa6fca14b', N'No new filter NOTINSYNC', CAST(N'2018-05-01T14:10:11.0833333' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'80fc71ff-e984-4dae-bdf1-98e02d27c926', N'All deleted', CAST(N'2018-05-02T14:09:48.6200000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'a5fa6d29-5c2b-4edb-8390-aeec44232368', N'Modified', CAST(N'2018-05-02T14:09:48.6200000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'5995209d-c571-40b8-9ff6-b650add6ffbf', N'Some filters new NOTINSYNC', CAST(N'2018-05-01T14:10:04.2900000' AS DateTime2))
INSERT [dbo].[Item] ([Id], [Title], [Modified]) VALUES (N'd79a3967-780c-46e3-b1ec-e6038214e711', N'All filters new', CAST(N'2018-05-01T14:10:04.2900000' AS DateTime2))
SET IDENTITY_INSERT [dbo].[Item_syncfilter] ON 

INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (1, N'a5fa6d29-5c2b-4edb-8390-aeec44232368', CAST(N'2018-05-01T14:13:30.5000000' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (2, N'a5fa6d29-5c2b-4edb-8390-aeec44232368', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (3, N'd79a3967-780c-46e3-b1ec-e6038214e711', CAST(N'2018-05-02T16:15:04.5933333' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (4, N'd79a3967-780c-46e3-b1ec-e6038214e711', CAST(N'2018-05-02T16:15:07.1266667' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (5, N'a14ae781-b595-4fa8-942f-3abf8d848bdf', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (6, N'a14ae781-b595-4fa8-942f-3abf8d848bdf', CAST(N'2018-05-02T14:15:31.7666667' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (7, N'cf01ebde-7f11-4bad-a32c-54caa6fca14b', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (8, N'cf01ebde-7f11-4bad-a32c-54caa6fca14b', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (9, N'80fc71ff-e984-4dae-bdf1-98e02d27c926', CAST(N'2018-05-01T14:13:37.8133333' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (10, N'80fc71ff-e984-4dae-bdf1-98e02d27c926', CAST(N'2018-04-30T14:13:37.8133333' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (11, N'5995209d-c571-40b8-9ff6-b650add6ffbf', CAST(N'2018-04-30T14:13:37.8133333' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (12, N'5995209d-c571-40b8-9ff6-b650add6ffbf', CAST(N'2018-05-02T16:39:20.5066667' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (13, N'5995209d-c571-40b8-9ff6-b650add6ffbf', CAST(N'2018-05-02T16:39:21.7066667' AS DateTime2), 1, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (14, N'45b71309-49d9-4457-a784-52dcc1331ec2', CAST(N'2018-05-03T06:33:34.7900000' AS DateTime2), 0, 1)
INSERT [dbo].[Item_syncfilter] ([Id], [ItemId], [Modified], [IsDeleted], [UserId]) VALUES (15, N'45b71309-49d9-4457-a784-52dcc1331ec2', CAST(N'2018-05-03T06:33:38.0300000' AS DateTime2), 1, 1)
SET IDENTITY_INSERT [dbo].[Item_syncfilter] OFF

To generate the test data, I used:

   -- ## Create many items
   DECLARE @startnum INT=1;
   DECLARE @endnum INT=5000;

   WITH gen AS (
       SELECT @startnum AS num
       UNION ALL
       SELECT num+1 FROM gen WHERE num+1<=@endnum
   ) 

   insert into [Item] ([Id], [Title], [Modified])
   (SELECT newId() as [Id]
         ,[Title]  + ' -#'+ CONVERT(varchar(1000), n.num) as [Title]
         ,[Modified]
     FROM [Item]
     cross join gen as n)
   option (maxrecursion 10000);

   select count(*) as item_count from item;

   -- ## generate syncfilter rows for 10 users
   set @startNum = 1;
   set @endNum = 10;

   WITH gen AS (
       SELECT @startnum AS num
       UNION ALL
       SELECT num+1 FROM gen WHERE num+1<=@endnum
   )    

    insert into item_syncfilter ([ItemId],[Modified],[IsDeleted],[UserId])
    (select i.[Id], DATEADD(month, -6, i.[Modified]), 0 as IsDeleted, n.num as [Userid] from item i 
       left outer join item_syncfilter s on s.itemid = i.id
         cross join gen as n
       where s.id is null)
    option (maxrecursion 10000);

    select count(*) item_syncfilter_count from Item_syncfilter;

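This creates roughly 35,000 items (the 7 seed items cross-joined with 5,000 generated numbers) and 350,000 sync filter rows (10 users for every item that has no filter row yet).

The STATISTICS IO output is: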

(15003 rows affected)
Table 'Item_syncfilter'. Scan count 35013, logical reads 105648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Item'. Scan count 1, logical reads 610, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(21 rows affected)

(1 row affected)

You can download the execution plan from here.

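1 answer:

Answer 0: (score: 0)

Update:

I have been looking at the execution plans of the individual sub-clauses of the query.

For clause 3 (finding the tombstoned items), the execution plan suggested that the following index would improve performance: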

    CREATE NONCLUSTERED INDEX [IX_SyncItemDeletedItems] ON [dbo].[Item_syncfilter] 
    (
        [IsDeleted],
        [UserId],
        [Modified])
    INCLUDE ([ItemId])

Running only clause 3 of the query with STATISTICS IO turned on:

    set statistics io on

    declare @userid bigint;
    declare @date datetime2(7);
    set @date = '2018-05-02 13:00:00.0000000';
    set @userid = 5;

    select i.*, 1 as Toombstoned from item i
    where 
    -- clause 3: get all toombstoned items
    --                  - where no non-deleted syncfilter exists
    --                  - and there is a deleted sync filter younger than "date"
    (not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0)
        and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 1 and modified >  @date))

Before adding the index, this shows:

  (0 rows affected)
  Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  Table 'Item_syncfilter'. Scan count 1, logical reads 1433, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

  (1 row affected)

After adding the index:

  (0 rows affected)
  Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  Table 'Item_syncfilter'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

  (1 row affected)

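...so the number of logical reads on Item_syncfilter drops dramatically, from 1433 to 3!

Clauses 1 and 2 are more interesting. Run on their own, each performs well, but combined they lead to a poor execution plan: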
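If this index is only ever used for tombstone lookups, a filtered index could be a smaller variant worth testing. The following is a sketch under assumptions, not something measured here: the index name IX_SyncItemDeletedItems_Filtered is made up for illustration, filtered indexes need SQL Server 2008 or later, and the optimizer can only use it when the query filters on the literal isdeleted = 1, as clause 3 does.

```sql
-- Hypothetical variant of IX_SyncItemDeletedItems: index only the deleted rows.
-- The filter predicate matches the "isdeleted = 1" condition in clause 3,
-- so IsDeleted does not need to be a key column.
CREATE NONCLUSTERED INDEX [IX_SyncItemDeletedItems_Filtered]
ON [dbo].[Item_syncfilter] ([UserId], [Modified])
INCLUDE ([ItemId])
WHERE [IsDeleted] = 1;
```

Because most sync filter rows in the test data are non-deleted, such an index would contain far fewer rows than the unfiltered one. Compare the STATISTICS IO output before and after to see whether it is actually chosen.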

    set statistics io on

    declare @userid bigint;
    declare @date datetime2(7);
    set @date = '2018-05-02 13:00:00.0000000';
    set @userid = 5;

    select i.*, 0 as Toombstoned from item i
    where 
    -- clause 1: get all modified items where there exists at least one non-deleted sync row
    (i.modified >= @date
        -- and there exists at least one non-deleted syncfilter
        and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0))
    -- clause 2: get all items, which were not modified, but their sync rows are newer (toombstoned or not)
    or (i.modified <  @date
        -- and there is at least one younger, non-deleted syncfilter (permission was added to user)
        and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified >  @date)
        -- make sure this item was not already synced by an older valid and non-deleted filter
        and not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified <  @date))

This returns:

    (0 rows affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Item_syncfilter'. Scan count 229376, logical reads 688128, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Item'. Scan count 1, logical reads 3980, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row affected)

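The reason is the following step in the execution plan:

[Image: screenshot of the execution plan step]

As you can see, SQL Server scans the clustered index to find items whose Modified date is >= @date OR < @date, which returns more or less the entire table. Hence the large number of reads.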

So what I did was simply split the two clauses that were combined with "OR" into two separate queries, which are then combined using UNION ALL:

    set statistics io on

    declare @userid bigint;
    declare @date datetime2(7);
    set @date = '2018-05-02 13:00:00.0000000';
    set @userid = 5;

    select i.*, 0 as Toombstoned from item i
    where 
    -- clause 1: get all modified items where there exists at least one non-deleted sync row
    (i.modified >= @date
        -- and there exists at least one non-deleted syncfilter
        and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0))

    -- clause 2: get all items, which were not modified, but their sync rows are newer (toombstoned or not)
    union all
    select i.*, 0 as Toombstoned from item i
        where i.modified <  @date
        -- and there is at least one younger, non-deleted syncfilter (permission was added to user)
        and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified >  @date)
        -- make sure this item was not already synced by an older valid and non-deleted filter
        and not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified <  @date)

Interestingly, this produces the following statistics:

    (0 rows affected)
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Item_syncfilter'. Scan count 3, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Item'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row affected)

=> So on Item_syncfilter the scan count drops from 229376 to 3, logical reads drop from 688128 to 12, and so on. That is a huge gain!
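Putting the pieces together, the complete sync query might look like the sketch below. It is simply assembled from the three clauses shown earlier and has not been measured as a whole. UNION ALL should not produce duplicates here: clause 1 requires i.modified >= @date while clause 2 requires i.modified < @date, and clause 3 requires that no non-deleted filter row exists while clauses 1 and 2 both require at least one.

```sql
declare @userid bigint;
declare @date datetime2(7);
set @date = '2018-05-02 13:00:00.0000000';
set @userid = 5;

-- clause 1: modified items with at least one non-deleted sync row
select i.*, 0 as Toombstoned from item i
where i.modified >= @date
    and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0)

union all
-- clause 2: unmodified items whose only non-deleted sync rows are newer than @date
select i.*, 0 as Toombstoned from item i
where i.modified < @date
    and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified > @date)
    and not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0 and modified < @date)

union all
-- clause 3: tombstoned items (no non-deleted sync row, and a deleted one newer than @date)
select i.*, 1 as Toombstoned from item i
where not exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 0)
    and exists (select id from item_syncfilter where itemid = i.id and userid = @userid and isdeleted = 1 and modified > @date);
```

Each branch can then be planned on its own, which is what avoided the full clustered index scan above. Running it with "set statistics io on" against the generated test data is the easiest way to confirm that the combined plan behaves like the individual ones.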