I need to insert 1.5 million records into a table with the following layout:
MyEdition
(
MyEdition_ID INT,
MyEntity_ID INT,
copyType_ID INT,
MyEdition_CopyText VARCHAR(MAX)
)
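Before inserting the records from a temp table, I disable all INDEXES (both clustered and nonclustered) and TRIGGERS on the table.
I need to strip all HTML tags (except the Bold, Italic and Underline tags) from the MyEdition_CopyText column in all 1.5M rows, so I run an UPDATE on the table that uses a SQL function to strip the HTML tags from that column.
To summarize what I am doing: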
ALTER INDEX [MyEdition_reindex] ON [MyEdition] DISABLE
(and similarly for all other indexes)
ALTER TABLE [dbo].[MyEdition] DISABLE TRIGGER ALL;
INSERT INTO MyEdition(<columns>) SELECT <columns> FROM <#my_temp_table>
UPDATE MyEdition SET MyEdition_CopyText = dbo.StripOutHTML(MyEdition_CopyText, 1)
REBUILD INDEXES
ENABLE TRIGGERS
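The disable and re-enable steps above are shorthand. Spelled out, they might look roughly like the sketch below. It assumes the index named in the first step is a nonclustered one: as far as I know, disabling a clustered index makes the table's data inaccessible until the index is rebuilt, so the INSERT and UPDATE could not run against it.

```sql
-- Assumption: MyEdition_reindex is a nonclustered index. A disabled
-- clustered index would block the INSERT and UPDATE steps entirely.
ALTER INDEX [MyEdition_reindex] ON [dbo].[MyEdition] DISABLE;
ALTER TABLE [dbo].[MyEdition] DISABLE TRIGGER ALL;

-- ... the INSERT and UPDATE steps run here ...

-- Rebuilding re-enables every disabled index on the table.
ALTER INDEX ALL ON [dbo].[MyEdition] REBUILD;
ALTER TABLE [dbo].[MyEdition] ENABLE TRIGGER ALL;
```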
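The UPDATE statement takes a long time, so I am wondering: is that because all the indexes are disabled? What is the best way to strip HTML tags from the column?
Any suggestions on how to improve this? How does SQL Server use indexes when it executes an UPDATE?
I am using the following SQL function: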
ALTER FUNCTION [dbo].[StripOutHTML]
(
@HTMLText VARCHAR(max),
@stripDisallowedOnly BIT
)
returns VARCHAR(max)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
-- Replace the HTML entity &amp; with the '&' character (this needs to be done first, as
-- '&' might be double encoded as '&amp;amp;')
SET @Start = Charindex('&amp;amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '&')
SET @Start = Charindex('&amp;amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = ( @End - @Start ) + 1
END
-- Replace the HTML entity &lt; with the '<' character
SET @Start = Charindex('&lt;', @HTMLText)
SET @End = @Start + 3
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '<')
SET @Start = Charindex('&lt;', @HTMLText)
SET @End = @Start + 3
SET @Length = ( @End - @Start ) + 1
END
-- Replace the HTML entity &gt; with the '>' character
SET @Start = Charindex('&gt;', @HTMLText)
SET @End = @Start + 3
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '>')
SET @Start = Charindex('&gt;', @HTMLText)
SET @End = @Start + 3
SET @Length = ( @End - @Start ) + 1
END
-- Replace the HTML entity &amp; with the '&' character
SET @Start = Charindex('&amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '&')
SET @Start = Charindex('&amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = ( @End - @Start ) + 1
END
-- Replace the HTML entity &nbsp; with the ' ' character
SET @Start = Charindex('&nbsp;', @HTMLText)
SET @End = @Start + 5
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
SET @HTMLText = Stuff(@HTMLText, @Start, @Length, ' ')
SET @Start = Charindex('&nbsp;', @HTMLText)
SET @End = @Start + 5
SET @Length = ( @End - @Start ) + 1
END
-- Replace any <P>, </P>tags with a <BR>, so they will be replaced with a new line in next step
SET @HTMLText = REPLACE(@HTMLText, '<P>', '<br>')
SET @HTMLText = REPLACE(@HTMLText, '</P>', '<br>')
-- Replace any <BR> tags with a newline
SET @Start = Charindex('<br>', @HTMLText)
SET @End = @Start + 3
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
SET @HTMLText = Stuff(@HTMLText, @Start, @Length,
Char(13) + Char(10))
SET @Start = Charindex('<br>', @HTMLText)
SET @End = @Start + 3
SET @Length = ( @End - @Start ) + 1
END
-- Replace any <br/> tags with a newline
SET @Start = Charindex('<br/>', @HTMLText)
SET @End = @Start + 4
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
-- Insert an actual CR/LF pair, not the literal text 'CHAR(13) + CHAR(10)'
SET @HTMLText = Stuff(@HTMLText, @Start, @Length,
Char(13) + Char(10))
SET @Start = Charindex('<br/>', @HTMLText)
SET @End = @Start + 4
SET @Length = ( @End - @Start ) + 1
END
-- Replace any <br /> tags with a newline
SET @Start = Charindex('<br />', @HTMLText)
SET @End = @Start + 5
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
-- Insert an actual CR/LF pair, not the literal text 'CHAR(13) + CHAR(10)'
SET @HTMLText = Stuff(@HTMLText, @Start, @Length,
Char(13) + Char(10))
SET @Start = Charindex('<br />', @HTMLText)
SET @End = @Start + 5
SET @Length = ( @End - @Start ) + 1
END
-- Remove anything between tags
SET @Start = Charindex('<', @HTMLText)
SET @End = Charindex('>', @HTMLText, Charindex('<', @HTMLText))
SET @Length = ( @End - @Start ) + 1
WHILE ( @Start > 0
AND @End > 0
AND @Length > 0 )
BEGIN
IF @stripDisallowedOnly = 1
BEGIN
IF ( Upper(Substring(@HTMLText, @Start, 2)) <> '<B' )
AND ( Upper(Substring(@HTMLText, @Start, 3)) <> '</B' )
AND ( Upper(Substring(@HTMLText, @Start, 2)) <> '<U' )
AND ( Upper(Substring(@HTMLText, @Start, 3)) <> '</U' )
AND ( Upper(Substring(@HTMLText, @Start, 2)) <> '<I' )
AND ( Upper(Substring(@HTMLText, @Start, 3)) <> '</I' )
BEGIN
SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '')
END
ELSE
BEGIN
SET @Length = 0
END
END
ELSE
BEGIN
SET @HTMLText = Stuff(@HTMLText, @Start, @Length, '')
END
SET @Start = Charindex('<', @HTMLText, @End - @Length)
SET @End = Charindex('>', @HTMLText, Charindex('<', @HTMLText,
@Start)
)
SET @Length = ( @End - @Start ) + 1
END
-- Remove any leading space/carriage return
DECLARE @trimchars VARCHAR(10)
SET @trimchars = CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32)
IF @HTMLText LIKE '[' + @trimchars + ']%' SET @HTMLText = SUBSTRING(@HTMLText, PATINDEX('%[^' + @trimchars + ']%', @HTMLText), LEN(@HTMLText))
RETURN Ltrim(Rtrim(@HTMLText))
END
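One possible simplification, which is not part of the function above: each entity loop calls Charindex and Stuff once per occurrence, whereas the built-in REPLACE handles all occurrences in a single call. The sketch below decodes the same five entities in the same order. The sample string is made up for illustration, and the behaviour is not identical in every case: a single REPLACE pass does not re-scan its own output, so triple-encoded input such as '&amp;amp;amp;' would be decoded one level less than the loops would manage.

```sql
-- Hypothetical sample input, chosen only to exercise each entity.
DECLARE @HTMLText VARCHAR(MAX) =
    'Fish &amp;amp; chips &lt;b&gt;today&lt;/b&gt;&nbsp;only';

-- Decode the double-encoded ampersand first, then the remaining entities.
SET @HTMLText = REPLACE(@HTMLText, '&amp;amp;', '&amp;');
SET @HTMLText = REPLACE(@HTMLText, '&lt;', '<');
SET @HTMLText = REPLACE(@HTMLText, '&gt;', '>');
SET @HTMLText = REPLACE(@HTMLText, '&amp;', '&');
SET @HTMLText = REPLACE(@HTMLText, '&nbsp;', ' ');

SELECT @HTMLText AS Decoded;  -- Fish & chips <b>today</b> only
```

Whether this is measurably faster inside a scalar function on 1.5M rows would need to be tested; it mainly removes the per-occurrence looping.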
Edit:
I tested the following three approaches:
Update everything in a single SQL statement.
UPDATE MyEdition SET MyEdition_CopyText=[dbo].StripOutHTML(MyEdition_CopyText, 1)
- This approach takes 3 hours.
Update 10,000 records per batch.
DECLARE @min INT, @max INT, @batchSize INT
SET @batchSize = 10000
SELECT @min=MIN(MyEdition_id), @max=MAX(MyEdition_id)
FROM MyEdition
--PRINT 'MAX:' + CAST(@max AS VARCHAR(50))
WHILE @min <= @max -- <= so that a final batch starting exactly at @max is not skipped
BEGIN
DECLARE @x varchar(max) = ''
SET @x = 'UPDATE MyEdition SET MyEdition_CopyText=dbo.StripOutHTML(MyEdition_CopyText, 1)'
+ ' WHERE (MyEdition_ID BETWEEN ' + CAST(@min AS VARCHAR(50)) + ' AND ' + CAST((@min + @batchSize -1) AS VARCHAR(50)) + ');'
Exec(@x)
PRINT @x
SET @min = @min + @batchSize
END
- This approach took 5 hours 20 minutes.
- This approach takes about 24 hours.
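Answer 0: (score: 4)
Indexes are only useful when you are restricting the set of data to be modified. In your case the indexes will make no difference, because you are updating every record anyway. In fact, if an index includes the field being updated, the update may even be faster without the index, because every update causes the index to be changed as well.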
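The batched test in the question is a case where the data set is restricted, so there an index can matter: each batch filters on an ID range, and without a usable index on that column every batch presumably has to scan the whole table. A minimal sketch of such a loop without dynamic SQL follows. It assumes MyEdition_ID is unique and has an enabled index (ideally the clustered one), which the question does not confirm.

```sql
-- Assumption: MyEdition_ID is unique and indexed, so each range
-- predicate below can be resolved with a seek rather than a full scan.
DECLARE @min INT, @max INT, @batchSize INT;
SET @batchSize = 10000;

SELECT @min = MIN(MyEdition_ID), @max = MAX(MyEdition_ID)
FROM MyEdition;

WHILE @min <= @max
BEGIN
    UPDATE MyEdition
    SET MyEdition_CopyText = dbo.StripOutHTML(MyEdition_CopyText, 1)
    WHERE MyEdition_ID BETWEEN @min AND @min + @batchSize - 1;

    SET @min = @min + @batchSize;
END
```

The per-row cost of the function is unchanged, so this only removes the overhead of locating each batch.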
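Answer 1: (score: 1)
This is not necessarily an answer in this case, but I wanted to post it in case others find this post. Below is another way to perform a large update in batches. It does not involve dynamic SQL and EXEC(), which may hurt performance to some degree:
(Note that this approach requires SQL 2005 or later, and a table with a unique key, called "PKColumn" in the code below.)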
DECLARE @BatchSize int = 50000;
DECLARE @RowNum int = 1;
WHILE 1=1
BEGIN
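WITH numbered AS (
-- ROW_NUMBER() is not allowed in a WHERE clause, so number the rows first
SELECT PKColumn, ROW_NUMBER() OVER (ORDER BY PKColumn ASC) AS rn
FROM MyTable
), cte AS (
SELECT PKColumn
FROM numbered
WHERE rn BETWEEN @RowNum AND (@RowNum + @BatchSize)
)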
UPDATE t1
SET SomeColumn = SomeValue
FROM MyTable t1
INNER JOIN cte
ON t1.PKColumn=cte.PKColumn;
IF @@ROWCOUNT=0
BREAK;
ELSE
BEGIN
SET @RowNum = @RowNum + @BatchSize + 1;
CONTINUE;
END
END
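I hope this works; I wrote it quickly from memory without testing it. You can give it a try, but I suspect that, because of your function, it will not be any faster.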
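If the function is indeed the bottleneck, one option is to call it on fewer rows. The sketch below is only an idea under an assumption the question does not confirm: that a meaningful share of rows contain neither '<' nor '&' and therefore have no tags or entities to strip. Rows skipped this way also miss the function's trimming of leading and trailing whitespace, so it only fits if that trimming is not needed for them.

```sql
-- Assumption: rows without '<' or '&' need no processing at all.
-- The leading-wildcard LIKE cannot use an index; the saving comes
-- from not invoking the scalar function on rows without markup.
UPDATE MyEdition
SET MyEdition_CopyText = dbo.StripOutHTML(MyEdition_CopyText, 1)
WHERE MyEdition_CopyText LIKE '%<%'
   OR MyEdition_CopyText LIKE '%&%';
```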