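I am having trouble converting a UDF into a stored procedure.
Here is what I have. This is the stored procedure that calls the function (I use it to search for and remove all UNICODE characters that are not between 32 and 126):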
ALTER PROCEDURE [dbo].[spRemoveUNICODE]
@FieldList varchar(250) = '',
@Multiple int = 0,
@TableName varchar(100) = ''
AS
BEGIN
SET NOCOUNT ON;
DECLARE @SQL VARCHAR(MAX), @counter INT = 0
IF @Multiple > 0
BEGIN
DECLARE @Field VARCHAR(100)
SELECT splitdata
INTO #TempValue
FROM dbo.fnSplitString(@FieldList,',')
WHILE (SELECT COUNT(*) FROM #TempValue) >= 1
BEGIN
DECLARE @Column VARCHAR(100) = (SELECT TOP 1 splitdata FROM #TempValue)
SET @SQL = 'UPDATE ' + @TableName + ' SET ' + @Column + ' = dbo.RemoveNonASCII(' + @Column + ')'
EXEC (@SQL)
--print @SQL
SET @counter = @counter + 1
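PRINT @Column + ' was checked for ' + CONVERT(varchar(10), @counter) + ' rows.' -- convert the int explicitly; concatenating it directly raises a conversion error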
DELETE FROM #TempValue
WHERE splitdata = @Column
END
END
ELSE IF @Multiple = 0
BEGIN
SET @SQL = 'UPDATE ' + @TableName + ' SET ' + @FieldList + ' = dbo.RemoveNonASCII(' + @FieldList + ')'
EXEC (@SQL)
--print @SQL
SET @counter = @counter + 1
PRINT @FieldList + ' was checked for ' + CONVERT(varchar(10), @counter) + ' rows.' -- @Column is never assigned in this branch, so print @FieldList
END
END
This is the UDF I created to help with the update (RemoveNonASCII):
ALTER FUNCTION [dbo].[RemoveNonASCII]
(@nstring nvarchar(max))
RETURNS varchar(max)
AS
BEGIN
-- Variables
DECLARE @Result varchar(max) = '',@nchar nvarchar(1), @position int
-- T-SQL statements to compute the return value
set @position = 1
while @position <= LEN(@nstring)
BEGIN
set @nchar = SUBSTRING(@nstring, @position, 1)
if UNICODE(@nchar) between 32 and 127
set @Result = @Result + @nchar
set @position = @position + 1
set @Result = REPLACE(@Result,'))','')
set @Result = REPLACE(@Result,'?','')
END
if (@Result = '')
set @Result = null
-- Return the result
RETURN @Result
END
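I have been trying to convert this into a stored procedure. I want to track how many rows are actually updated when it runs. Right now it just reports that all rows were updated, no matter how many I run it against. I want to know whether, say, only half of them had bad characters. The stored procedure is already set up to tell me which column it is looking at, and I want it to include how many rows were updated. Here is what I have tried so far: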
DECLARE @Result varchar(max) = '',@nchar nvarchar(1), @position int, @nstring nvarchar(max), @counter int = 0, @CountRows int = 0, @Length int
--select Notes from #Temp where Notes is not null order by Notes OFFSET @counter ROWS FETCH NEXT 1 ROWS ONLY
set @nstring = (select Notes from #Temp where Notes is not null order by Notes OFFSET @counter ROWS FETCH NEXT 1 ROWS ONLY)
set @Length = LEN(@nstring)
if @Length = 0 set @Length = 1
-- Add the T-SQL statements to compute the return value here
set @position = 1
while @position <= @Length
BEGIN
print @counter
print @CountRows
select @nstring
set @nchar = SUBSTRING(@nstring, @position, 1)
if UNICODE(@nchar) between 32 and 127
begin
print unicode(@nchar)
set @Result = @Result + @nchar
set @counter = @counter + 1
end
if UNICODE(@nchar) not between 32 and 127
begin
set @CountRows = @CountRows + 1
end
set @position = @position + 1
END
print 'Rows found with invalid UNICODE: ' + convert(varchar,@CountRows)
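For testing, I purposely created a temp table, added a bunch of notes, and then added a bunch of invalid characters.
I created a list of over 700 Notes, then updated two of them with some invalid characters (outside 32 - 127). Some are NULL, and some are not NULL but have nothing in them. What happens is that I get 0 updates:
Rows found with invalid UNICODE: 0
It does, however, see that the UNICODE value it is looking at is 32.
Obviously I am missing something, but I don't know what it is.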
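Answer 0: (score: 2)
Here is a set-based solution to handle the bulk replacement. Instead of a slow scalar function, it uses an inline table-valued function. These are much faster than their scalar counterparts. I am using a tally table here. I keep one as a view on my system, like this.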
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10^4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
If you are interested in tally tables, here is an excellent article on the topic. http://www.sqlservercentral.com/articles/T-SQL/62867/
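As a quick illustration of what the tally view is for, the sketch below uses it to list every character of a string next to its code point, which makes it easy to see which characters fall outside the 32 to 127 range. The variable `@s` and its sample value are made up for this example, and it assumes the `dbo.cteTally` view above has been created.

```sql
-- Hypothetical sample value: the tab (9) and the e-acute (233) are out of range.
DECLARE @s nvarchar(100) = N'abc' + NCHAR(9) + N'd' + NCHAR(233) + N'f';

-- One row per character position, driven by the tally view instead of a WHILE loop.
SELECT t.N                            AS Position,
       SUBSTRING(@s, t.N, 1)          AS Ch,
       UNICODE(SUBSTRING(@s, t.N, 1)) AS CodePoint
FROM   dbo.cteTally AS t
WHERE  t.N <= LEN(@s)
ORDER BY t.N;
```

Keep in mind that the view as written tops out at 10,000 rows, so only the first 10,000 characters of a longer string would be visited.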
create function RemoveNonASCII
(
@SearchVal nvarchar(max)
) returns table as
RETURN
with MyValues as
(
select substring(@SearchVal, N, 1) as MyChar
, t.N
from cteTally t
where N <= len(@SearchVal)
and UNICODE(substring(@SearchVal, N, 1)) between 32 and 127
)
select distinct MyResult = STUFF((select MyChar + ''
from MyValues mv2
order by mv2.N
--for xml path('')), 1, 0, '')
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'), 1, 0, '')
from MyValues mv
;
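Now you can use CROSS APPLY instead of being forced to call the function once for every row. The performance benefit for this part of your original question should be substantial.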
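A minimal sketch of that call pattern, assuming a table `#Temp` with a `Notes` column like the one in the question. The `DATALENGTH` filter and the `@RowsChanged` variable are additions for illustration, not part of the function: because the function only ever removes characters, a row changes exactly when its byte length changes, so `@@ROWCOUNT` then reports the rows that really had something stripped rather than every row in the table.

```sql
DECLARE @RowsChanged int;

UPDATE t
SET    t.Notes = x.MyResult
FROM   #Temp AS t
CROSS APPLY dbo.RemoveNonASCII(t.Notes) AS x
-- Assumption: only touch rows where something was actually removed.
WHERE  DATALENGTH(x.MyResult) <> DATALENGTH(t.Notes);

SET @RowsChanged = @@ROWCOUNT;   -- rows changed by the UPDATE above
PRINT 'Notes: ' + CONVERT(varchar(10), @RowsChanged) + ' row(s) had characters removed.';
```

Rows that are NULL, or that contain no characters in range at all, produce no row from the function and are therefore skipped by CROSS APPLY. Switching to OUTER APPLY (and allowing `x.MyResult IS NULL` in the filter) would be one way to set those to NULL instead.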
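I would also say that your string splitter is another potential performance problem. Here is a great article with a number of fast set-based string splitters. http://sqlperformance.com/2012/07/t-sql-queries/split-strings
The last step here would be to eliminate the first loop in your procedure. That can be done too, but I am not entirely sure what your code is doing there. I will take a closer look and see what I can come up with. In the meantime, you can work through this and feel free to ask about any part you don't understand.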
Answer 1: (score: 0)
Here is what I got working with help from Sean Lange.
How I call the stored procedure:
exec spRemoveUNICODE @FieldList='Notes,Notes2,Notes3,Notes4,Notes5',@Multiple=1,@TableName='#Temp'
I created the #Temp table:
create table #Temp (ID int,Notes nvarchar(Max),Notes2 nvarchar(max),Notes3 nvarchar(max),Notes4 nvarchar(max),Notes5 nvarchar(max))
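I then populated it with notes from 5 fields across several different tables, with values ranging from NULL, to blank (but not NULL), to 5000 characters long.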
update #Temp
set Notes2 = SUBSTRING(Notes2,1,LEN(Notes2)/2) + N'㹊潮Ņࢹᖈư㹨ƶ槹鎤⻄ƺ綐ڌ⸀ƺ삸)䀤ƍ샄)Ņᛡ鎤ꗘᖃᒨ쬵Ğᘍ鎤ᐜᏰ>֔υ赸Ƹ쳰డ촜)鉀촜)쮜)Ἡ屰山舰霡ࣆ 耏Аం畠Ư놐ᓜતᏛ֔Ꮫ֨Ꮫᓜƒ 邰厰ఆ邰드)抉鎤듄)繟Ĺ띨)ࢹ䮸ࣉࢹ䮸ࣉ샰)ԌƏŅᕄ홑Ņᛙ鎤ꗘᖃᒨࢹ' + SUBSTRING(Notes2,LEN(Notes2)/2-1,LEN(Notes2)/2)
I did this for each of the 5 columns.
Here is what spRemoveUNICODE looks like now:
ALTER PROCEDURE [dbo].[spRemoveUNICODE]
-- Parameters
@FieldList varchar(250) = '',
@Multiple int = 0,
@TableName varchar(100) = ''
AS
BEGIN
SET NOCOUNT ON;
-- Variables
declare @SQL varchar(max)
-- Insert statements for procedure here
if @Multiple > 0
BEGIN
declare @Field varchar(100)
select Item into #TempValue from dbo.SplitStrings_Numbers(@FieldList,',')
while (select count(*) from #TempValue) >= 1
BEGIN
declare @Column varchar(100) = (select top 1 Item from #TempValue)
set @SQL = 'UPDATE ' + @TableName + ' SET ' + @Column + ' = tt.Result
from ' + @TableName + ' t
join (select ID,(select REPLACE(REPLACE(REPLACE(REPLACE(MyResult,''))'',''''),''>)'',''''),'' N>) N'',''''),'' N N'','''')
from dbo.RemoveNonASCII_New(' + @Column + ')) Result from ' + @TableName + ') tt on t.ID = tt.ID'
exec (@SQL)
--print @SQL --for trouble shooting
print @column + ' was checked.'
delete from #TempValue
where Item = @Column
END
END
else if @Multiple = 0
BEGIN
set @SQL = 'UPDATE ' + @TableName + ' SET ' + @FieldList + ' = tt.Result
from ' + @TableName + ' t
join (select ID,(select REPLACE(REPLACE(REPLACE(REPLACE(MyResult,''))'',''''),''>)'',''''),'' N>) N'',''''),'' N N'','''')
from dbo.RemoveNonASCII_New(' + @FieldList + ')) Result from ' + @TableName + ') tt on t.ID = tt.ID'
exec (@SQL)
--print @SQL --for trouble shooting
print @FieldList + ' was checked.' -- @Column is never assigned in this branch, so print @FieldList
END
END
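To get back to the original goal of reporting how many rows each column really changed, one option (a sketch, not what the procedure above does) is to run the dynamic statement through `sp_executesql` and hand `@@ROWCOUNT` back through an OUTPUT parameter. The names `@Rows` and `@rc` are made up here, `QUOTENAME` is added to guard the column name, and the nested REPLACE clean-up from the procedure is left out for brevity. It assumes `dbo.RemoveNonASCII_New` exists as defined further down in this answer.

```sql
DECLARE @TableName sysname = N'#Temp',   -- left unquoted so a schema-qualified name still works
        @Column    sysname = N'Notes',
        @SQL       nvarchar(max),
        @Rows      int;

SET @SQL = N'
UPDATE t
SET    ' + QUOTENAME(@Column) + N' = x.MyResult
FROM   ' + @TableName + N' AS t
CROSS APPLY dbo.RemoveNonASCII_New(t.' + QUOTENAME(@Column) + N') AS x
WHERE  DATALENGTH(x.MyResult) <> DATALENGTH(t.' + QUOTENAME(@Column) + N');
SET @rc = @@ROWCOUNT;';                  -- capture the count inside the dynamic batch

EXEC sys.sp_executesql @SQL, N'@rc int OUTPUT', @rc = @Rows OUTPUT;

PRINT @Column + N' was checked; ' + CONVERT(nvarchar(10), @Rows) + N' row(s) changed.';
```

The `DATALENGTH` comparison restricts the UPDATE to rows where characters were actually removed, so the printed number is the count of rows that had invalid characters rather than the size of the table.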
Here is the new SplitStrings_Numbers function, which splits the column list into individual column names:
ALTER FUNCTION [dbo].[SplitStrings_Numbers]
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
FROM dbo.Numbers
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
);
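For reference, a small usage sketch of the splitter with a made-up column list. It assumes the `dbo.Numbers` table described in this answer already exists, since the function reads from it.

```sql
-- Returns one row per column name: Notes, Notes2, Notes3.
-- Row order is not guaranteed unless an ORDER BY is added.
SELECT Item
FROM   dbo.SplitStrings_Numbers(N'Notes,Notes2,Notes3', N',');
```

The function does not trim anything, so a list written as `'Notes, Notes2'` would yield `' Notes2'` with its leading space. Wrapping `Item` in `LTRIM(RTRIM(...))` is one way to guard against that.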
I created the Numbers table like this:
DECLARE @UpperLimit INT = 1000000;
WITH n AS
(
SELECT
x = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
CROSS JOIN sys.all_objects AS s3
)
SELECT Number = x
INTO dbo.Numbers
FROM n
WHERE x BETWEEN 1 AND @UpperLimit;
GO
CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers(Number)
WITH (DATA_COMPRESSION = PAGE);
GO
Then, finally, the Notes are searched and the invalid UNICODE characters removed using the RemoveNonASCII_New function:
ALTER function [dbo].[RemoveNonASCII_New]
(
@SearchVal nvarchar(max)
) returns table as
RETURN
with MyValues as
(
select substring(@SearchVal, Number, 1) as MyChar
, t.Number
from Numbers t
where Number <= len(@SearchVal)
and UNICODE(substring(@SearchVal, Number, 1)) between 32 and 127
)
select distinct MyResult = STUFF((select MyChar + ''
from MyValues mv2
order by mv2.Number
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'), 1, 0, '')
from MyValues mv;
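A quick sanity check of the function on literal values can be useful before pointing it at real data. The inputs below are made up for illustration, and the check assumes `dbo.Numbers` and `dbo.RemoveNonASCII_New` exist as shown above.

```sql
-- Hypothetical input: CR (13), LF (10) and e-acute (233) are outside 32-127.
SELECT MyResult
FROM   dbo.RemoveNonASCII_New(N'Line1' + NCHAR(13) + NCHAR(10) + N'caf' + NCHAR(233));
-- Returns a single row: Line1caf

-- A NULL input yields no row at all rather than a row containing NULL.
SELECT COUNT(*) AS RowsReturned
FROM   dbo.RemoveNonASCII_New(NULL);
-- RowsReturned = 0
```

The second query matters when the function is joined back to a table: rows for which the function returns nothing drop out of an inner join or CROSS APPLY.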
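The way I was doing this in the original question took over 60 minutes to clean all 5 columns. With this new approach, cleaning the same 5 columns takes 1.5 minutes. Each column has over 11,000 rows with invalid characters added.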