Question

我需要帮助才能在表格中插入200万行。我插入的表有40亿行，我插入的表有200万行。插入速率约为每分钟190行。

DECLARE @BatchSize INT = 5000

WHILE 1 = 1  
BEGIN 
   INSERT INTO [dbo].[a] ([a].[col1], [a].[col2], [a].[adate], [a].[importdate])
       SELECT TOP(@BatchSize) 
           b.col1,  
           b.col2,  
           b.adate,
           b.importdate
       FROM 
           b 
       WHERE 
           NOT EXISTS (SELECT 1 
                       FROM dbo.[a] 
                       WHERE [a].col1 = b.col1
                         AND [a].col2 = b.col2
                         AND [a].adate = b.adate)
                         --AND [sent].aDate > getdate()-10)

    IF @@ROWCOUNT < @BatchSize BREAK  
END;

在上面的查询中，在表a中，col1和col2以及col3是主键（非群集）。我想在表b中插入表a中的每条记录......

表a有3个索引，一个是col1.col2，第二个是col1，col2，col3，第三个只有col1 ......

任何人都可以提出让它更快的想法吗？

我在SQL Server 2008 R2上有128 Gb RAM。

谢谢

Answer 1

由于您希望在B中插入A中的所有行，因此不需要使用exists。问题变成跟踪先前批次中已传输的行。以下示例生成行号并使用它将行分组到批处理中。如果行号按现有索引排序，则select侧不应要求排序传递。

-- Sample data.
declare @A as Table ( Col1 Int, Col2 Int );

declare @B as Table ( Col1 Int, Col2 Int );
insert into @B ( Col1, Col2 ) values
  ( 1, 1 ), ( 1, 2 ), ( 1, 3 ), ( 1, 4 ), ( 1, 5 ),
  ( 2, 1 ), ( 2, 2 ), ( 2, 3 ), ( 2, 4 ), ( 2, 5 );

-- Rows to transfer in each batch.
declare @BatchSize as Int = 5;
-- First row to transfer in the current batch.
declare @BatchMark as Int = 1;
-- Count of rows processed.
declare @RowsProcessed as Int = 1;

-- Process the batches.
while @RowsProcessed > 0
  begin
  insert into @A ( Col1, Col2 )
    select Col1, Col2
      from ( select Col1, Col2, Row_Number() over ( order by Col1, Col2 ) as RN from @B ) as PH
      where @BatchMark <= RN and RN < @BatchMark + @BatchSize;
  select @RowsProcessed = @@RowCount, @BatchMark += @BatchSize;
  select * from @A; -- Show progress.
  end;

备选方案包括向B表添加标志列以标记已处理的行，使用id表中的现有B来跟踪已处理的最大值，使用额外的用于跟踪已处理行的索引值，从B，...删除已处理行的表。

output条款可能对某些替代方案有用。

在传输数据之前，使用合适的填充因子重建索引可能会有所帮助。见here。这取决于你的问题中没有的指数值的知识。

SQL Server中的插入速度确实很慢

1 个答案: