Efficiently replace many characters from a string

Time: 2014-12-23 18:18:49

Tags: sql sql-server

I would like to know the most efficient way of removing any occurrence of characters such as , ; / " from a varchar column.

I have a function like this, but it is incredibly slow. The table has 20 million records.

CREATE FUNCTION [dbo].[Udf_getcleanedstring] (@s VARCHAR(255))
returns VARCHAR(255)
AS
  BEGIN
      DECLARE @o VARCHAR(255)

      SET @o = Replace(@s, '/', '')
      SET @o = Replace(@o, '-', '')
      SET @o = Replace(@o, ';', '')
      SET @o = Replace(@o, '"', '')

      RETURN @o
  END

4 answers:

Answer 0: (score: 3)

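Whichever method you use, it is probably worth adding a

WHERE YourCol LIKE '%[-/;"]%'

(with the hyphen listed first, so that LIKE treats it as a literal character rather than as a range operator) unless you suspect that a very large proportion of rows actually contain at least one of the characters that need to be stripped.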
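As a rough sketch of how that filter might be combined with the function from the question: `dbo.YourTable` and `YourCol` are placeholder names, not objects from the original post.

```sql
-- Hypothetical names: dbo.YourTable and YourCol stand in for the real table and column.
UPDATE dbo.YourTable
SET    YourCol = dbo.Udf_getcleanedstring(YourCol)
-- '-' comes first inside [] so LIKE reads it as a literal, not as a range such as /-;
WHERE  YourCol LIKE '%[-/;"]%';
```

Rows that contain none of the four characters are skipped, so the function is never called for them and they are not rewritten.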

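As you are using this in an UPDATE statement, simply adding the WITH SCHEMABINDING attribute to the function can improve things massively. It allows the UPDATE to proceed row by row rather than first caching the entire operation in a spool for Halloween Protection.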
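A minimal sketch of what that could look like, assuming the function from the question already exists and is simply altered in place (the body is collapsed to a single nested REPLACE expression here):

```sql
-- Sketch only: re-creates the question's function with SCHEMABINDING added.
ALTER FUNCTION [dbo].[Udf_getcleanedstring] (@s VARCHAR(255))
RETURNS VARCHAR(255)
WITH SCHEMABINDING
AS
BEGIN
    -- The function touches no tables, so schema binding lets SQL Server
    -- verify that it does not access data.
    RETURN REPLACE(REPLACE(REPLACE(REPLACE(@s, '/', ''), '-', ''), ';', ''), '"', '');
END
```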


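Nested REPLACE calls in TSQL are slow anyway, though, as they involve multiple passes through the string.

You could knock up a CLR function as below (if you haven't worked with these before, they are very easy to deploy from an SSDT project as long as CLR execution is permitted on the server). The UPDATE plan for this does not contain a spool either.

The regular expression uses (?:) to denote a non-capturing group, with the characters of interest separated by the alternation character |, as in /|-|;|\" (the " needs to be escaped in the string literal, so it is preceded by a backslash).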

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

public partial class UserDefinedFunctions
{
    private static readonly Regex regexStrip = 
                        new Regex("(?:/|-|;|\")", RegexOptions.Compiled);

    [SqlFunction]
    public static SqlString StripChars(SqlString Input)
    {
        return Input.IsNull ?  null : regexStrip.Replace((string)Input, "");        
    }
}
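Once the assembly is deployed, the function can be called like any other scalar function. The following is only a sketch: it assumes the function ended up in the `dbo` schema, and `dbo.YourTable` / `YourCol` are placeholder names.

```sql
-- Allow CLR execution on the instance (requires suitable server-level permissions).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Assumption: StripChars was deployed into the dbo schema.
-- Hypothetical names: dbo.YourTable and YourCol.
UPDATE dbo.YourTable
SET    YourCol = dbo.StripChars(YourCol)
WHERE  YourCol LIKE '%[-/;"]%';
```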

Answer 1: (score: 2)

I want to show the huge performance difference between two types of user-defined functions:

  1. User TABLE function
  2. User SCALAR function

See the test example:

    use AdventureWorks2012
    go
    
    -- create table for the test
    create table dbo.FindString (ColA int identity(1,1) not null primary key,ColB varchar(max) );
    
    declare @text varchar(max) =  'A web server can handle a Hypertext Transfer Protocol request either by reading 
    a file from its file ; system based on the URL <> path or by handling the request using logic that is specific 
    to the type of resource. In the case that special logic is invoked the query string will be available to that logic 
    for use in its processing, along with the path component of the URL.';
    
    -- populate the table: "go 1000000" below repeats this batch 1,000,000 times
    insert into dbo.FindString(ColB)
    select @text 
    go 1000000
    
    -- use one of the scalar functions posted in the answers to this thread
    alter function [dbo].[udf_getCleanedString]
    ( 
    @s varchar(max)
    )
    returns  varchar(max)
    as
    begin
    return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
    end
    go
    --
    -- create an inline table-valued version of the function above
    create function [dbo].[utf_getCleanedString]
    ( 
    @s varchar(max) -- same type as the scalar version, so the test text is not truncated to 255 characters
    )
    returns  table 
    as return
    (
    select  replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','') as String
    )
    go
    
    --
    -- clear the buffer cache
    DBCC DROPCLEANBUFFERS ;
    go
    -- update process using USER TABLE FUNCTION
    update Dest with(rowlock) set
    dest.ColB  = D.String
    from dbo.FindString dest
    cross apply utf_getCleanedString(dest.ColB) as D
    go
    
    DBCC DROPCLEANBUFFERS ;
    go
    -- update process using USER SCALAR FUNCTION
    update Dest with(rowlock) set
    dest.ColB  =  dbo.udf_getCleanedString(dest.ColB) 
    from dbo.FindString dest
    go
    

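Comparing the execution plans of the two updates shows that the user table function (UTF) performs much better than the user scalar function (USF). Both do the same thing, replacing characters in the string, but one returns a scalar and the other returns a table.

Another important measure to look at is the I/O statistics (SET STATISTICS IO ON;).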
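A short sketch of how the I/O and timing figures might be collected for one of the updates, reusing `dbo.FindString` and `utf_getCleanedString` from the test script above (adding `SET STATISTICS TIME` is a suggestion, not part of the original test):

```sql
-- Report logical/physical reads and CPU/elapsed time in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Update using the inline table-valued function.
update dest set
dest.ColB = D.String
from dbo.FindString dest
cross apply dbo.utf_getCleanedString(dest.ColB) as D;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Running the same wrapper around the scalar-function update gives a second set of figures to compare against.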

Answer 2: (score: 0)

How about nesting them together in a single call:

 create function [dbo].[udf_getCleanedString]
 ( 
    @s varchar(255)
 )
 returns varchar(255)
 as
 begin
   return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
 end

Or you may want to do the UPDATE on the table itself in the first place. Scalar functions are pretty slow.
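One way to read that suggestion is to put the nested REPLACE expression directly into the UPDATE, so that no user-defined function is called at all. A sketch with placeholder names (`dbo.YourTable` and `YourCol` are not from the original post):

```sql
-- Hypothetical names: dbo.YourTable and YourCol.
-- The REPLACE chain is evaluated inline, with no scalar UDF call per row.
UPDATE dbo.YourTable
SET    YourCol = REPLACE(REPLACE(REPLACE(REPLACE(YourCol, '/', ''), '-', ''), ';', ''), '"', '');
```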

Answer 3: (score: 0)

There was a similar question previously, and I like the approach mentioned there.

How to Replace Multiple Characters in SQL?

declare @badStrings table (item varchar(50))

INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '@'

declare @testString varchar(100), @newString varchar(100)

set @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
set @newString = @testString

SELECT @newString = Replace(@newString, item, '') FROM @badStrings

select @newString -- returns 'Juliet ro0zs my s0xrzone'