Removing repeated duplicate characters

Time: 2011-04-26 17:33:21

Tags: sql tsql sql-server-2008

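I have a string in my stored procedure such as ',,,sam,,bob,' or ',,,'. I have to remove the repeated commas from it: the first string must become 'sam,bob,', and a string made up only of commas, such as ',,,', must become ''. I can use only SQL Server functions. I am using SQL Server 2008 and .NET 3.5.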

Thanks in advance.

6 answers:

Answer 0 (score: 8)

This works for strings that consist exclusively of commas or that contain runs of up to 398 contiguous commas.

 SELECT 
     CASE 
         WHEN TargetString NOT LIKE '%[^,]%' 
             THEN '' /*The string is exclusively commas*/
         ELSE 
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,
            REPLICATE(',',16),','), /*399/16 = 24 remainder 15*/
            REPLICATE(',',8),','),  /* 39/ 8 =  4 remainder 7*/
            REPLICATE(',',4),','),  /* 11/ 4 =  2 remainder 3*/
            REPLICATE(',',2),','),  /*  5/ 2 =  2 remainder 1*/
            REPLICATE(',',2),',')   /*  3/ 2 =  1 remainder 1*/
         END
 FROM T    

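If you need to handle more, add an extra power of 2 at the top; if you need fewer, remove stages from the top. The comment on each stage gives the smallest number of contiguous commas that the stage will not handle successfully.

All of the comment lines follow this format: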

/*  L/D    =  Q remainder R */

D:    Corresponds to the length of the string generated by `REPLICATE`
R:    Is always D-1
Q+R:  Form L for the next step

So, to extend the series upwards with another REPLICATE(',',32),',') stage:

D = 32 
R = 31
Q = 368 (399-31)
L = (368 * 32) + 31 = 11807

That would handle runs of commas up to 11,806 characters long.
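The limit stated for the five-stage version can be checked directly. The following is a minimal sketch, not part of the original answer: the variables @n and @s are hypothetical names introduced for illustration, and the letters 'a' and 'b' are only there to make the surviving commas easy to see.

```sql
-- Sketch: check the stated limit of the five-stage version (16, 8, 4, 2, 2).
-- @n and @s are hypothetical names, not part of the original answer.
DECLARE @n INT = 398;   -- change to 399 to see the first run length that fails
DECLARE @s VARCHAR(8000) = 'a' + REPLICATE(',', @n) + 'b';

SELECT @n AS CommaCount,
       REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@s,
           REPLICATE(',',16),','),   -- 398/16 = 24 remainder 14 -> 38 commas left
           REPLICATE(',',8),','),    --  38/ 8 =  4 remainder  6 -> 10 commas left
           REPLICATE(',',4),','),    --  10/ 4 =  2 remainder  2 ->  4 commas left
           REPLICATE(',',2),','),    --   4/ 2 =  2 remainder  0 ->  2 commas left
           REPLICATE(',',2),',')     --   2/ 2 =  1 remainder  0 ->  1 comma left
       AS Collapsed;
```

With @n = 398 the result is 'a,b'. With @n = 399 the same arithmetic leaves two commas after the last stage, so the result is 'a,,b', which matches the 399 shown in the first comment of the answer's code.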

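Answer 1 (score: 6)

I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good.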

CREATE Function [dbo].[CleanDuplicates](@Data VarChar(8000), @DuplicateChar VarChar(1))
Returns VarChar(8000)
WITH SCHEMABINDING
AS
Begin

    Set @Data = @DuplicateChar + @Data

    While PATINDEX('%' + @DuplicateChar + @DuplicateChar + '%',@Data) > 0
        Set @Data = REPLACE(@Data, @DuplicateChar + @DuplicateChar,@DuplicateChar)

    Return Right(@Data, Len(@Data)-1)

End

You can test the function like this:

Select dbo.CleanDuplicates(',,,', ',')
Select dbo.CleanDuplicates(',,,sam,,bob,', ',')

Answer 2 (score: 2)

Try this:

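-- Assumed declaration: the original snippet did not show how @Parameter is defined.
DECLARE @Parameter VARCHAR(8000) = ',,,sam,,bob,'

SELECT @Parameter AS 'BEFORE'
BEGIN
-- Keep collapsing pairs of commas until no adjacent pair remains
WHILE CHARINDEX(',,', @Parameter) > 0
    BEGIN
        SELECT @Parameter = REPLACE(@Parameter, ',,',',') 
    END
SELECT @Parameter AS 'AFTER'
END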

Answer 3 (score: 1)

  

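George Mastros wrote:

    I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good.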

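I agree that "memory only" Scalar UDFs are quite fast. In fact, I have used one of George's Scalar UDFs, the one that solves the "Initial Caps" problem, to demonstrate that "Set Based" code is not always the best way to go.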

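However, Martin Smith (another poster on this very thread) was definitely on the right track. In this case, "Set Based" is still the way to go. Of course, anyone can make an unsubstantiated claim about performance, so let's heat this up with a performance demonstration.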

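To demonstrate, we first need some test data. A lot of test data, because both of the functions we are going to test run very fast. Here is the code to build a million-row test table.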

--===== Conditionally drop the test table 
     -- to make reruns in SSMS easier
     IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
        DROP TABLE #MyHead
GO
--===== Create and populate the test table on-the-fly.
     -- This builds a bunch of GUIDs and removes the dashes from them to 
     -- increase the chances of duplicating adjacent characters.
     -- Not to worry.  This takes less than 7 seconds to run because of
     -- the "Pseudo Cursor" created by the CROSS JOIN.
 SELECT TOP 1000000
        RowNum     = IDENTITY(INT,1,1),
        SomeString = REPLACE(CAST(NEWID() AS VARCHAR(36)),'-','')
   INTO #MyHead
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
GO

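There is no need to repost George's fine function here, but I do need to post mine. The following function produces the same result as George's. It looks like an "iTVF" (Inline Table-Valued Function), and it is one, but it returns only a single value. That is why Microsoft calls them "Inline Scalar Functions" (I call them "iSFs" for short).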

 CREATE FUNCTION dbo.CleanDuplicatesJBM
        (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN 
 SELECT Item =  STUFF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                    @DuplicateChar+@Data COLLATE LATIN1_GENERAL_BIN,
                REPLICATE(@DuplicateChar,33),@DuplicateChar),
                REPLICATE(@DuplicateChar,17),@DuplicateChar),
                REPLICATE(@DuplicateChar, 9),@DuplicateChar),
                REPLICATE(@DuplicateChar, 5),@DuplicateChar),
                REPLICATE(@DuplicateChar, 3),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar)
                ,1,1,'')
;
GO
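As a quick sanity check before timing anything, the inline function can be called on its own or through CROSS APPLY. This is a minimal sketch, assuming dbo.CleanDuplicatesJBM has been created exactly as shown above; the table variable @Samples and its contents are hypothetical and used only for illustration.

```sql
-- Call the inline function directly with the sample string from the question.
SELECT Item
  FROM dbo.CleanDuplicatesJBM(',,,sam,,bob,', ',');   -- returns 'sam,bob,'

-- Apply it once per row via CROSS APPLY.
-- @Samples is a hypothetical table variable, not part of the original answer.
DECLARE @Samples TABLE (SomeString VARCHAR(8000));
INSERT INTO @Samples (SomeString)
VALUES (',,,sam,,bob,'), ('a,,,,b,,c');

SELECT s.SomeString, cleaned.Item
  FROM @Samples s
 CROSS APPLY dbo.CleanDuplicatesJBM(s.SomeString, ',') cleaned;
```

Because the function is a single RETURN SELECT, the optimizer can expand it into the calling query, which is the property the timing test below relies on.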

First, let's test George's Scalar UDF. Please read the comments about why we are not using SET STATISTICS TIME ON here.

/******************************************************************************
 Test George's code.
 Since Scalar Functions don't work well with SET STATISTICS TIME ON, we measure
 duration a different way.  We'll also throw away the result in a "Bit Bucket"
 variable because we're trying to measure the performance of the function 
 rather than how long it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = [dbo].[CleanDuplicates](SomeString,'A')
   FROM #MyHead
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

Here are the returns from that "fiver" run...

Beginning execution loop
15750
15516
15543
15480
15510
Batch execution completed 5 times.
(Average is 15,559 on my 10 year old, single 1.8Ghz CPU)

Now, we'll run the "iSF" version...

/******************************************************************************
 Test Jeff's code.
 Even though this uses an "iSF" (Inline Scalar Function), we'll test exactly
 the same way that we tested George's code so we're comparing apples-to-apples.
 This includes throwing away the result in a "Bit Bucket" variable because 
 we're trying to measure the performance of the function rather than how long 
 it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = cleaned.ITEM
   FROM #MyHead
  CROSS APPLY [dbo].[CleanDuplicatesJBM](SomeString,'A') cleaned
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

Here are the results from that run.

Beginning execution loop
6856
6810
7020
7350
6996
Batch execution completed 5 times.
(Average is 7,006 {more than twice as fast} on my 10 year old, single 1.8Ghz CPU)

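My point is not that George's code is bad. Not at all. In fact, I use Scalar UDFs when there is no "single query" solution. I will also back George up by saying that not all "single query" solutions are always the best.

Just don't stop looking for them when it comes to UDFs. ;-)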

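Answer 4 (score: 0)

Your solutions are good, but

  1. they handle commas only
  2. I hate loop-based T-SQL code ;-)
  3. so, based on Marcin's solution, I wrote set-based generic code that replaces every declared kind of duplicate: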

    DECLARE @Duplicate NVARCHAR(100)= '#$'
    DECLARE @TestString NVARCHAR(MAX)= 'test_test__f##f2$$g'
    DECLARE @Replacement NVARCHAR(MAX)= ''
    DECLARE @OutputString NVARCHAR(MAX)= @teststring ;
    WITH    numbers
              AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
                   FROM     sys.objects o
                            CROSS JOIN sys.objects o2
                 ),
            chars
              AS ( SELECT   SUBSTRING(@Duplicate, 1, 1) CHAR ,
                            CAST(1 AS INT) [LEVEL]
                   UNION ALL
                   SELECT   SUBSTRING(@Duplicate, numbers.Number, 1) CHAR ,
                            CAST(numbers.Number AS INT) [LEVEL]
                   FROM     numbers
                            JOIN chars ON chars.Level + 1 = numbers.Number
                   WHERE    LEN(SUBSTRING(@Duplicate, numbers.Number, 1)) > 0
                 ),
            Replicated
              AS ( SELECT   REPLICATE(CHAR, numbers.number) Repl ,
                            numbers.Number
                   FROM     chars
                            CROSS JOIN numbers
                 )
        SELECT  @OutputString = REPLACE(@OutputString, Repl, @Replacement)
        FROM    replicated
        WHERE   number <= LEN(@TestString)
    
    SELECT  @OutputString
    

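    You can declare every kind of character to collapse in the @Duplicate string and the replacement string in @Replacement. An additional gain, IMHO, is that I search for replacements only up to the maximum length of the input string.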

Answer 5 (score: 0)

You could try:

SELECT REPLACE(LTRIM(REPLACE(',,,sam,,bob,', ',', ' ')),' ', ',')