Question

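I have a string in my stored procedure such as ',,,sam,,bob,' or ',,,'. I need to remove the repeated commas from such strings, so that the first becomes 'sam,bob,' and the second (commas only) becomes ''. I must use only SQL Server functions. I'm using SQL Server 2008 and .NET 3.5.

Thanks in advance.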

Answer 1

This works for strings that consist exclusively of commas or that contain runs of up to 398 contiguous commas.

 SELECT 
     CASE 
         WHEN TargetString NOT LIKE '%[^,]%' 
             THEN '' /*The string is exclusively commas*/
         ELSE 
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,
            REPLICATE(',',16),','), /*399/16 = 24 remainder 15*/
            REPLICATE(',',8),','),  /* 39/ 8 =  4 remainder 7*/
            REPLICATE(',',4),','),  /* 11/ 4 =  2 remainder 3*/
            REPLICATE(',',2),','),  /*  5/ 2 =  2 remainder 1*/
            REPLICATE(',',2),',')   /*  3/ 2 =  1 remainder 1*/
         END
 FROM T

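Add extra powers of 2 at the top if you need more, or remove stages from the top if you need less. The comment beside each stage indicates the smallest number of commas that the stage will not handle successfully.

All the comment lines are in this format: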
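As a quick check of the 398-comma limit, the sketch below runs the same nested REPLACE over two generated strings. The table variable @Results and the 'a'/'b' sentinel characters are illustrative assumptions, not part of the original answer.

```sql
-- Hypothetical harness: @Results and the 'a'/'b' sentinels are illustrative only.
DECLARE @Results TABLE (Commas INT, Collapsed VARCHAR(8000));

INSERT INTO @Results (Commas, Collapsed)
SELECT Commas,
       REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TargetString,
       REPLICATE(',',16),','),
       REPLICATE(',',8),','),
       REPLICATE(',',4),','),
       REPLICATE(',',2),','),
       REPLICATE(',',2),',')
  FROM (SELECT 398, 'a' + REPLICATE(',', 398) + 'b'
        UNION ALL
        SELECT 399, 'a' + REPLICATE(',', 399) + 'b') T (Commas, TargetString);

SELECT Commas, Collapsed FROM @Results ORDER BY Commas;
-- 398 -> 'a,b'   (the run collapses to a single comma)
-- 399 -> 'a,,b'  (one doubled comma survives)
```

The 399 row follows the stage comments: 399 becomes 39, then 11, 5, 3 and finally 2 commas.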

/*  L/D    =  Q remainder R */

D:    Corresponds to the length of the string generated by `REPLICATE`
R:    Is always D-1
Q+R:  Form L for the next step

So, to extend the series upwards with another REPLICATE(',',32),',') stage:

D = 32 
R = 31
Q = 368 (399-31)
L = (368 * 32) + 31 = 11807

That would handle runs of commas up to 11,806 characters long.
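The same arithmetic can be replayed for any list of stages. The following is a minimal sketch under the answer's own rule (R = D-1, and Q+R forms L for the next step); the table variable @Stages and the variables @L, @D and @i are hypothetical names used only for this illustration.

```sql
-- Stages listed from the innermost-last REPLACE outwards; 32 is the proposed extra stage.
DECLARE @Stages TABLE (StageOrder INT PRIMARY KEY, D INT);
INSERT INTO @Stages (StageOrder, D)
VALUES (1, 2), (2, 2), (3, 4), (4, 8), (5, 16), (6, 32);

DECLARE @L INT = 2,   -- two commas left after the final stage means the run was not collapsed
        @D INT,
        @i INT = 1;

WHILE @i <= (SELECT COUNT(*) FROM @Stages)
BEGIN
    SELECT @D = D FROM @Stages WHERE StageOrder = @i;
    -- L = Q * D + R, where R = D - 1 and Q = (L of the following stage) - R
    SET @L = (@L - (@D - 1)) * @D + (@D - 1);
    PRINT 'D = ' + CAST(@D AS VARCHAR(10)) + ', L = ' + CAST(@L AS VARCHAR(10));
    SET @i = @i + 1;
END
-- Prints L = 3, 5, 11, 39, 399 and finally 11807 for the added 32 stage.
```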

Answer 2

I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good.

CREATE Function [dbo].[CleanDuplicates](@Data VarChar(8000), @DuplicateChar VarChar(1))
Returns VarChar(8000)
WITH SCHEMABINDING
AS
Begin

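    -- Prefix one copy of the character so that a leading run collapses into it
    Set @Data = @DuplicateChar + @Data

    -- Collapse doubled characters repeatedly until none remain
    While PATINDEX('%' + @DuplicateChar + @DuplicateChar + '%',@Data) > 0
        Set @Data = REPLACE(@Data, @DuplicateChar + @DuplicateChar,@DuplicateChar)

    -- Drop the prefix, and with it any leading run of the character
    Return Right(@Data, Len(@Data)-1)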

End

You can test the function like this:

Select dbo.CleanDuplicates(',,,', ',')
Select dbo.CleanDuplicates(',,,sam,,bob,', ',')

Answer 3

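Try this

-- Sample value taken from the question; the declaration is added so the snippet runs on its own
DECLARE @Parameter VARCHAR(8000)
SET @Parameter = ',,,sam,,bob,'

SELECT @Parameter AS 'BEFORE'

-- Collapse pairs of commas until none remain
WHILE CHARINDEX(',,', @Parameter) > 0
    BEGIN
        SELECT @Parameter = REPLACE(@Parameter, ',,',',')
    END

SELECT @Parameter AS 'AFTER'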

Answer 4

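George Mastros wrote:

"I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good."

I agree that "memory only" Scalar UDFs are quite fast. In fact, I used one of George's Scalar UDFs, which solved the "Initial Caps" problem, to demonstrate that "Set Based" code ISN'T always the best way to go.

However, Martin Smith (another poster on this very thread) was definitely on the right track. In this case, "Set Based" is still the way to go. Of course, anyone can make an unsubstantiated claim about performance, so let's heat this up with a performance demonstration.

To demonstrate, we first need some test data. A LOT of test data, because both of the functions we're going to test run very fast. Here's the code to build a million-row test table.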

--===== Conditionally drop the test table 
     -- to make reruns in SSMS easier
     IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
        DROP TABLE #MyHead
GO
--===== Create and populate the test table on-the-fly.
     -- This builds a bunch of GUIDs and removes the dashes from them to 
     -- increase the chances of duplicating adjacent characters.
     -- Not to worry.  This takes less than 7 seconds to run because of
     -- the "Pseudo Cursor" created by the CROSS JOIN.
 SELECT TOP 1000000
        RowNum     = IDENTITY(INT,1,1),
        SomeString = REPLACE(CAST(NEWID() AS VARCHAR(36)),'-','')
   INTO #MyHead
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
GO

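There's no need to repost George's fine function here, but I do need to post mine. The following function produces the same result as George's does. It looks like an "iTVF" (Inline Table Valued Function), and it is, but it returns only one value. That's why Microsoft calls them "Inline Scalar Functions" (I call them "iSFs" for short).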

 CREATE FUNCTION dbo.CleanDuplicatesJBM
        (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN 
 SELECT Item =  STUFF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                    @DuplicateChar+@Data COLLATE LATIN1_GENERAL_BIN,
                REPLICATE(@DuplicateChar,33),@DuplicateChar),
                REPLICATE(@DuplicateChar,17),@DuplicateChar),
                REPLICATE(@DuplicateChar, 9),@DuplicateChar),
                REPLICATE(@DuplicateChar, 5),@DuplicateChar),
                REPLICATE(@DuplicateChar, 3),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar),
                REPLICATE(@DuplicateChar, 2),@DuplicateChar)
                ,1,1,'')
;
GO
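
Before timing anything, a quick sanity check against the two strings from the question may be useful. This sketch assumes dbo.CleanDuplicatesJBM has already been created in the current database from the definition above; the derived-table alias s and the column name Original are illustrative.

```sql
-- Assumes dbo.CleanDuplicatesJBM (defined above) already exists in the current database.
SELECT s.Original, cleaned.Item
  FROM (VALUES (',,,'), (',,,sam,,bob,')) s (Original)
 CROSS APPLY dbo.CleanDuplicatesJBM(s.Original, ',') cleaned
;
-- ',,,'           -> ''
-- ',,,sam,,bob,'  -> 'sam,bob,'
```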

First, let's test George's Scalar UDF. Please read the comments in the code about why we're not using SET STATISTICS TIME ON here.

/******************************************************************************
 Test George's code.
 Since Scalar Functions don't work well with SET STATISTICS TIME ON, we measure
 duration a different way.  We'll also throw away the result in a "Bit Bucket"
 variable because we're trying to measure the performance of the function 
 rather than how long it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = [dbo].[CleanDuplicates](SomeString,'A')
   FROM #MyHead
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

Here are the returns from that "fiver" run...

Beginning execution loop
15750
15516
15543
15480
15510
Batch execution completed 5 times.
(Average is 15,559 on my 10 year old, single 1.8Ghz CPU)

Now, we'll run the "iSF" version...

/******************************************************************************
 Test Jeff's code.
 Even though this uses an "iSF" (Inline Scalar Function), we'll test exactly
 the same way that we tested George's code so we're comparing apples-to-apples.
 This includes throwing away the result in a "Bit Bucket" variable because 
 we're trying to measure the performance of the function rather than how long 
 it takes to display or store results.
******************************************************************************/
--===== Declare some obviously named variables
DECLARE @StartTime DATETIME,
        @BitBucket VARCHAR(36)
;
--===== Start the "Timer"
 SELECT @StartTime = GETDATE()
;
--===== Run the test on the function
 SELECT @BitBucket = cleaned.Item
   FROM #MyHead
  CROSS APPLY [dbo].[CleanDuplicatesJBM](SomeString,'A') cleaned
;
--===== Display the duration in milliseconds
  PRINT DATEDIFF(ms,@StartTime,GETDATE())
;
--===== Run the test a total of 5 times
GO 5

Here are the results from that run.

Beginning execution loop
6856
6810
7020
7350
6996
Batch execution completed 5 times.
(Average is 7,006 {more than twice as fast} on my 10 year old, single 1.8Ghz CPU)

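My point isn't that George's code is bad. Not at all. In fact, I use Scalar UDFs when there is no "single query" solution. I'll also back George up by saying that not all "single query" solutions are the best.

Just don't stop looking for them when it comes to UDFs. ;-)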

Answer 5

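Your solutions are good, but

they handle commas only
I hate loop-based T-SQL code ;-)

So, based on Marcin's solution, I wrote set-based, general-purpose code that replaces every declared kind of duplicate: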

DECLARE @Duplicate NVARCHAR(100)= '#$'
DECLARE @TestString NVARCHAR(MAX)= 'test_test__f##f2$$g'
DECLARE @Replacement NVARCHAR(MAX)= ''
DECLARE @OutputString NVARCHAR(MAX)= @teststring ;
WITH    numbers
          AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
               FROM     sys.objects o
                        CROSS JOIN sys.objects o2
             ),
        chars
          AS ( SELECT   SUBSTRING(@Duplicate, 1, 1) CHAR ,
                        CAST(1 AS INT) [LEVEL]
               UNION ALL
               SELECT   SUBSTRING(@Duplicate, numbers.Number, 1) CHAR ,
                        CAST(numbers.Number AS INT) [LEVEL]
               FROM     numbers
                        JOIN chars ON chars.Level + 1 = numbers.Number
               WHERE    LEN(SUBSTRING(@Duplicate, numbers.Number, 1)) > 0
             ),
        Replicated
          AS ( SELECT   REPLICATE(CHAR, numbers.number) Repl ,
                        numbers.Number
               FROM     chars
                        CROSS JOIN numbers
             )
    SELECT  @OutputString = REPLACE(@OutputString, Repl, @Replacement)
    FROM    replicated
    WHERE   number <= LEN(@TestString)

SELECT  @OutputString

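You can declare every kind of character in the @Duplicate string and the replacement string in @Replacement. An additional gain, IMHO, is that I search for replacements only up to the maximum length of the input string.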

Answer 6

You could try

SELECT REPLACE(LTRIM(REPLACE(',,,sam,,bob,', ',', ' ')),' ', ',')

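to remove the repeated characters. For the sample above this returns 'sam,,bob,': the leading commas are removed, but interior runs are left in place.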
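
A minimal sketch that combines this leading-comma trim with the WHILE loop from Answer 3 to reach the output the question asks for. It assumes the value contains no spaces (they would be turned into commas), and the variable name @Parameter is borrowed from Answer 3 purely for illustration.

```sql
DECLARE @Parameter VARCHAR(8000) = ',,,sam,,bob,';

-- Step 1: strip leading commas by turning them into spaces and left-trimming
--         (assumption: the value itself contains no spaces)
SET @Parameter = REPLACE(LTRIM(REPLACE(@Parameter, ',', ' ')), ' ', ',');

-- Step 2: collapse any remaining runs of commas down to a single comma
WHILE CHARINDEX(',,', @Parameter) > 0
    SET @Parameter = REPLACE(@Parameter, ',,', ',');

SELECT @Parameter AS Cleaned;  -- 'sam,bob,'
```

With ',,,' as the input, step 1 already yields an empty string and the loop never runs.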
