SQL Server: Will an UPDATE statement be faster if the table has indexes?

Date: 2014-06-30 11:01:40

Tags: sql-server

I need to insert 1.5 million records into a table with the following layout:

MyEdition
(
    MyEdition_ID INT,
    MyEntity_ID INT,
    copyType_ID INT,
    MyEdition_CopyText VARCHAR(MAX)
)

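Before inserting the records from a temp table, I disable all INDEXES (both clustered and nonclustered) and TRIGGERS on the table.

I need to remove all HTML tags (except the Bold, Italic and Underline tags) from the MyEdition_CopyText column in all 1.5M rows, so I run an UPDATE on the table that uses a SQL function to strip the HTML tags from that column.

To summarize what I am doing: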

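  1. ALTER INDEX [MyEdition_reindex] ON [MyEdition] DISABLE (and likewise for all other indexes)
  2. ALTER TABLE [dbo].[MyEdition] DISABLE TRIGGER ALL;
  3. INSERT INTO MyEdition(<columns>) SELECT <columns> FROM <#my_temp_table>
  4. UPDATE MyEdition SET MyEdition_CopyText = dbo.StripOutHTML(MyEdition_CopyText, 1)
  5. REBUILD INDEXES
  6. ENABLE TRIGGERS

The UPDATE statement takes a long time, so I am wondering whether that is because all the indexes are disabled. What is the best way to strip HTML characters from a column?

Any suggestions on how to improve this? How does SQL Server use indexes when performing an UPDATE?

I am using the following SQL function: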

    ALTER FUNCTION [dbo].[StripOutHTML]
    (
        @HTMLText VARCHAR(max),
        @stripDisallowedOnly BIT
    )
    returns VARCHAR(max) 
    AS 
      BEGIN 
          DECLARE @Start INT
          DECLARE @End INT
          DECLARE @Length INT 
    
          -- Replace the HTML entity &amp; with the '&' character (this needs to be done first, as
          -- '&' might be double encoded as '&amp;amp;')
          SET @Start = Charindex('&amp;', @HTMLText) 
          SET @End = @Start + 4 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '&') 
                SET @Start = Charindex('&amp;', @HTMLText) 
                SET @End = @Start + 4 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Replace the HTML entity &lt; with the '<' character
          SET @Start = Charindex('&lt;', @HTMLText) 
          SET @End = @Start + 3 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '<') 
                SET @Start = Charindex('&lt;', @HTMLText) 
                SET @End = @Start + 3 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Replace the HTML entity &gt; with the '>' character
          SET @Start = Charindex('&gt;', @HTMLText) 
          SET @End = @Start + 3 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '>') 
                SET @Start = Charindex('&gt;', @HTMLText) 
                SET @End = @Start + 3 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Handle any remaining double-encoded '&amp;amp;' by replacing its first 5 characters with '&'
          SET @Start = Charindex('&amp;amp;', @HTMLText) 
          SET @End = @Start + 4 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '&') 
                SET @Start = Charindex('&amp;amp;', @HTMLText) 
                SET @End = @Start + 4 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Replace the HTML entity &nbsp; with the ' ' character
          SET @Start = Charindex('&nbsp;', @HTMLText) 
          SET @End = @Start + 5 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                SET @HTMLText = Stuff(@HTMLText, @Start, @Length, ' ') 
                SET @Start = Charindex('&nbsp;', @HTMLText) 
                SET @End = @Start + 5 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Replace any <P>, </P>tags with a <BR>, so they will be replaced with a new line in next step  
          SET @HTMLText = REPLACE(@HTMLText, '<P>', '<br>') 
          SET @HTMLText = REPLACE(@HTMLText, '</P>', '<br>') 
    
          -- Replace any <BR> tags with a newline  
          SET @Start = Charindex('<br>', @HTMLText) 
          SET @End = @Start + 3 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                SET @HTMLText = Stuff(@HTMLText, @Start, @Length, 
                                Char(13) + Char(10)) 
                SET @Start = Charindex('<br>', @HTMLText) 
                SET @End = @Start + 3 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Replace any <br/> tags with a newline
          SET @Start = Charindex('<br/>', @HTMLText) 
          SET @End = @Start + 4 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                SET @HTMLText = Stuff(@HTMLText, @Start, @Length, 
                                Char(13) + Char(10)) 
                SET @Start = Charindex('<br/>', @HTMLText) 
                SET @End = @Start + 4 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Replace any <br /> tags with a newline
          SET @Start = Charindex('<br />', @HTMLText) 
          SET @End = @Start + 5 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                SET @HTMLText = Stuff(@HTMLText, @Start, @Length, 
                                Char(13) + Char(10)) 
                SET @Start = Charindex('<br />', @HTMLText) 
                SET @End = @Start + 5 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Remove anything between '<' and '>', i.e. the tags themselves
          SET @Start = Charindex('<', @HTMLText) 
          SET @End = Charindex('>', @HTMLText, Charindex('<', @HTMLText)) 
          SET @Length = ( @End - @Start ) + 1 
    
          WHILE ( @Start > 0 
                  AND @End > 0 
                  AND @Length > 0 ) 
            BEGIN 
                IF @stripDisallowedOnly = 1 
                  BEGIN 
                      IF ( Upper(Substring(@HTMLText, @Start, 2)) <> '<B' ) 
                         AND ( Upper(Substring(@HTMLText, @Start, 3)) <> '</B' ) 
                         AND ( Upper(Substring(@HTMLText, @Start, 2)) <> '<U' ) 
                         AND ( Upper(Substring(@HTMLText, @Start, 3)) <> '</U' ) 
                         AND ( Upper(Substring(@HTMLText, @Start, 2)) <> '<I' ) 
                         AND ( Upper(Substring(@HTMLText, @Start, 3)) <> '</I' ) 
                        BEGIN 
                            SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '') 
                        END 
                      ELSE 
                        BEGIN 
                            SET @Length = 0 
                        END 
                  END 
                ELSE 
                  BEGIN 
                      SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '') 
                  END 
    
                SET @Start = Charindex('<', @HTMLText, @End - @Length) 
                SET @End = Charindex('>', @HTMLText, Charindex('<', @HTMLText, 
                                                     @Start) 
                           ) 
                SET @Length = ( @End - @Start ) + 1 
            END 
    
          -- Remove any leading space/carriage return 
          DECLARE @trimchars VARCHAR(10)
          SET @trimchars = CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32)
             IF @HTMLText LIKE '[' + @trimchars + ']%' SET @HTMLText = SUBSTRING(@HTMLText, PATINDEX('%[^' + @trimchars + ']%', @HTMLText), LEN(@HTMLText))
          RETURN Ltrim(Rtrim(@HTMLText)) 
      END
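
As a side note on the function above, the entity and line-break loops each replace one occurrence per iteration with CHARINDEX and STUFF. A minimal sketch of the same replacements using the built-in REPLACE, which rewrites every occurrence in one call, is shown here. The function name dbo.DecodeBasicEntities is hypothetical and not part of the original post, and the tag-stripping loop is deliberately left out.

```sql
-- Hypothetical helper; a sketch only, not the original StripOutHTML function.
CREATE FUNCTION dbo.DecodeBasicEntities (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    -- Repeat until no '&amp;' is left, so that multiply encoded input
    -- such as '&amp;amp;' is fully decoded, as in the original loop.
    WHILE CHARINDEX('&amp;', @HTMLText) > 0
        SET @HTMLText = REPLACE(@HTMLText, '&amp;', '&');

    -- A single REPLACE call rewrites every occurrence of the pattern.
    SET @HTMLText = REPLACE(@HTMLText, '&lt;', '<');
    SET @HTMLText = REPLACE(@HTMLText, '&gt;', '>');
    SET @HTMLText = REPLACE(@HTMLText, '&nbsp;', ' ');

    -- Turn paragraph tags into <br>, then every <br> variant into CR + LF.
    SET @HTMLText = REPLACE(@HTMLText, '<P>', '<br>');
    SET @HTMLText = REPLACE(@HTMLText, '</P>', '<br>');
    SET @HTMLText = REPLACE(@HTMLText, '<br>', CHAR(13) + CHAR(10));
    SET @HTMLText = REPLACE(@HTMLText, '<br/>', CHAR(13) + CHAR(10));
    SET @HTMLText = REPLACE(@HTMLText, '<br />', CHAR(13) + CHAR(10));

    RETURN @HTMLText;
END
```

Whether this is measurably faster on 1.5M rows has not been tested here; it mainly removes the per-occurrence loops while keeping the same order of replacements.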
    

Edit:

I tested the following three approaches:

  1. Update everything in a single SQL statement. This approach took 3 hours.

        UPDATE MyEdition SET MyEdition_CopyText = [dbo].StripOutHTML(MyEdition_CopyText, 1)

  2. Update in batches of 10,000 records. This approach took 5 hours 20 minutes.

        DECLARE @min INT, @max INT, @batchSize INT
        SET @batchSize = 10000
        SELECT @min = MIN(MyEdition_ID), @max = MAX(MyEdition_ID) FROM MyEdition
        --PRINT 'MAX:' + CAST(@max AS VARCHAR(50))
        WHILE @min <= @max
        BEGIN
            DECLARE @x VARCHAR(MAX) = ''
            SET @x = 'UPDATE MyEdition SET MyEdition_CopyText = dbo.StripOutHTML(MyEdition_CopyText, 1)'
                   + ' WHERE (MyEdition_ID BETWEEN ' + CAST(@min AS VARCHAR(50))
                   + ' AND ' + CAST((@min + @batchSize - 1) AS VARCHAR(50)) + ');'
            EXEC(@x)
            PRINT @x
            SET @min = @min + @batchSize
        END

  3. Load the values into a .NET program, strip the characters there, and save the result back to the database. This approach took about 24 hours.

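2 Answers:

Answer 0 (score: 4)

Indexes are only useful when you are restricting the set of rows to be modified. In your case an index will make no difference, because you are updating every record anyway. In fact, if an index contains the column being updated, the update may even be faster without the index, because every update causes the index to be modified as well.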
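
A minimal sketch of that last point, assuming a hypothetical nonclustered index named IX_MyEdition_CopyText_Incl that includes MyEdition_CopyText (the question does not list the actual indexes). Only a nonclustered index is disabled here, because disabling a clustered index makes the table's data inaccessible until the index is rebuilt.

```sql
-- Hypothetical index name; assumed to INCLUDE the MyEdition_CopyText column.
ALTER INDEX IX_MyEdition_CopyText_Incl ON dbo.MyEdition DISABLE;

-- Full-table update: there is no WHERE clause, so no index can narrow the row set.
UPDATE dbo.MyEdition
SET MyEdition_CopyText = dbo.StripOutHTML(MyEdition_CopyText, 1);

-- Rebuild once at the end instead of maintaining the index for every updated row.
ALTER INDEX IX_MyEdition_CopyText_Incl ON dbo.MyEdition REBUILD;
```

Indexes that do not contain the updated column are not affected by this UPDATE, so disabling them should make little difference to it either way.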

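Answer 1 (score: 1)

This is not necessarily the answer in this case, but I wanted to post it in case someone else finds this post. Here is another way to perform a large update in batches. It does not involve dynamic SQL and EXEC(), which may hurt performance to some extent:

(Note that this approach requires SQL Server 2005 or later, and a table with a unique key, called "PKColumn" in the code below.)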

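DECLARE @BatchSize int;
DECLARE @RowNum int;
SET @BatchSize = 50000;
SET @RowNum = 1;

WHILE 1=1
BEGIN

-- ROW_NUMBER() is not allowed in a WHERE clause, so number the rows in an
-- inner CTE and filter on the row number in the outer one.
;WITH numbered AS (
  SELECT PKColumn, ROW_NUMBER() OVER (ORDER BY PKColumn ASC) AS rn
  FROM MyTable
),
cte AS (
  SELECT PKColumn
  FROM numbered
  WHERE rn BETWEEN @RowNum AND (@RowNum + @BatchSize - 1)
)
UPDATE t1
SET SomeColumn = SomeValue
FROM MyTable t1
INNER JOIN cte
  ON t1.PKColumn=cte.PKColumn;

-- Stop once a batch touches no rows; otherwise move on to the next batch.
IF @@ROWCOUNT=0
  BREAK;
ELSE
  BEGIN 
  SET @RowNum = @RowNum + @BatchSize;
  CONTINUE;
  END
END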

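I hope this helps; I wrote it quickly from memory without testing. You can give it a try, but I suspect that, because of your function, it will not be any faster.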
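
A related variant, sketched here under assumptions rather than taken from either poster, batches by key range so that the table does not have to be renumbered on every iteration. It assumes MyEdition_ID is unique, non-null and indexed (with that index left enabled), and it uses inline DECLARE initialization, which needs SQL Server 2008 or later. The variable names are illustrative.

```sql
DECLARE @BatchSize INT = 10000;
DECLARE @LastId INT;   -- highest key already processed
DECLARE @NextId INT;   -- highest key of the batch about to be processed

-- Start just below the smallest key; stays NULL if the table is empty.
SELECT @LastId = MIN(MyEdition_ID) - 1 FROM dbo.MyEdition;

WHILE @LastId IS NOT NULL
BEGIN
    -- Find the upper key of the next batch of at most @BatchSize rows.
    SELECT @NextId = MAX(MyEdition_ID)
    FROM (SELECT TOP (@BatchSize) MyEdition_ID
          FROM dbo.MyEdition
          WHERE MyEdition_ID > @LastId
          ORDER BY MyEdition_ID) AS batch;

    -- No keys left above @LastId: every row has been processed.
    IF @NextId IS NULL
        BREAK;

    UPDATE dbo.MyEdition
    SET MyEdition_CopyText = dbo.StripOutHTML(MyEdition_CopyText, 1)
    WHERE MyEdition_ID > @LastId
      AND MyEdition_ID <= @NextId;

    SET @LastId = @NextId;
END
```

Like the ROW_NUMBER() version, this only changes how rows are grouped into batches. The function is still called once per row, so if the function dominates the run time, the total is unlikely to drop much.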