Replace multiple special characters in multiple fields SQL

Time: 2015-09-24 11:51:59

Tags: c# sql sql-server

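I have a C# application that generates a SQL query. The query is meant to remove special characters from the columns a user selects in SQL Server. My current query is: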

UPDATE [TableA] 
SET [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]), 1), ''), 
    [Name] = REPLACE([Name], SUBSTRING([Name], PATINDEX('%[^a-zA-Z0-9 ]%', [Name]), 1), ''), 
    [Acct] = REPLACE([Acct], SUBSTRING([Acct], PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]), 1), '') 
WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR 
      PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0 OR 
      PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]) <> 0;
GO

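This removes the first special character it finds. Because REPLACE replaces all matches, every occurrence of that character goes. If the string contains more than one kind of special character, however, only the first kind is removed:

  1. 'Salaries & Wages' becomes 'Salaries  Wages'. Good!
  2. 'Salaries & Wages - Other' becomes 'Salaries  Wages - Other'. Bad!

My question is: how can I modify the above query so that it removes multiple special characters and can still be executed from C#?

Thanks for your time.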
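The repro below shows what one pass does. It runs the question's expression against a table variable holding the two sample strings; the names @t and Val are hypothetical. One pass removes every occurrence of the first special character PATINDEX finds, and nothing else:

```sql
-- Hypothetical sample values taken from the two examples above
DECLARE @t TABLE (Val varchar(100));
INSERT INTO @t (Val) VALUES ('Salaries & Wages'), ('Salaries & Wages - Other');

-- One application of the question's expression: PATINDEX locates the first
-- character outside [a-zA-Z0-9 ], and REPLACE removes every occurrence of it
SELECT Val,
       REPLACE(Val, SUBSTRING(Val, PATINDEX('%[^a-zA-Z0-9 ]%', Val), 1), '') AS OnePass
FROM @t;
```

The first row comes back as 'Salaries  Wages' and the second as 'Salaries  Wages - Other'. The hyphen survives because it was not the first special character found. The statement therefore has to run again, or loop, until PATINDEX returns 0.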

EDIT. Obviously I could do something like this:

declare @input varchar(500), @Action char(1)
set @input = '80-82/5 O$%*#@)(J^#oh!@!n & '' Bacon St'
set @Action = 'A'

DECLARE @i int
DECLARE @result varchar(500)
SET @result = @input

if @Action = 'A'
BEGIN
    -- Strip one special character at a time until PATINDEX finds none
    SET @i = patindex('%[^a-zA-Z0-9 ]%', @result)
    WHILE @i > 0
    BEGIN
        SET @result = STUFF(@result, @i, 1, '')
        SET @i = patindex('%[^a-zA-Z0-9 ]%', @result)
    END
END

print @input
print @result

But I can't see how to adapt a query like this to handle multiple fields and C#. Any help here would be greatly appreciated.

6 answers:

Answer 0 (score: 2)

You can use a recursive CTE to apply the REPLACE function repeatedly. This assumes TableA has a primary key column named id:

;WITH StripSpecialChars AS (
   SELECT id, 0 AS lvl,
          [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], x.i, 1), ''), 
          [Name] = REPLACE([Name], SUBSTRING([Name], y.i, 1), ''), 
          [Acct] = REPLACE([Acct], SUBSTRING([Acct], z.i, 1), '')   
   FROM TableA
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum])) AS x(i)
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Name])) AS y(i)
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])) AS z(i)
   WHERE x.i <> 0 OR y.i <> 0 OR z.i <> 0

   UNION ALL

   SELECT id, lvl = lvl + 1,                       
          [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], x.i, 1), ''), 
          [Name] = REPLACE([Name], SUBSTRING([Name], y.i, 1), ''), 
          [Acct] = REPLACE([Acct], SUBSTRING([Acct], z.i, 1), '') 
   FROM StripSpecialChars 
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum])) AS x(i)
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Name])) AS y(i)
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])) AS z(i)
   WHERE x.i <> 0 OR y.i <> 0 OR z.i <> 0
)

The CTE terminates as soon as there are no more special characters to replace.

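For each id, the row with the highest lvl value holds the stripped values of the [EpiNum], [Name] and [Acct] fields. You can therefore perform the UPDATE in a single SQL statement with the following code: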

;WITH StripSpecialChars AS (
 ... above query here ...
)
UPDATE t1
SET t1.[EpiNum] = t2.[EpiNum],
    t1.[Name] = t2.[Name],
    t1.[Acct] = t2.[Acct]   
FROM TableA AS t1
INNER JOIN (SELECT id, [EpiNum], [Name], [Acct],
                   ROW_NUMBER() OVER (PARTITION BY id 
                                      ORDER BY lvl DESC) AS rn 
            FROM StripSpecialChars) AS t2
ON t1.id = t2.id AND t2.rn = 1


EDIT

If TableA has no PK column, you can wrap the table in a CTE, simulate a PK with ROW_NUMBER, and finally perform the UPDATE on that CTE.

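Below is a minimal sketch of the no-PK approach described in the EDIT. It is not the answerer's own code. It runs against a temporary #TableA with hypothetical sample rows. The Numbered CTE supplies a surrogate id through ROW_NUMBER and is also the target of the UPDATE. Ordering by all three cleaned columns means two rows can only tie if they are identical, so they get identical results even if their numbers swap between the two places Numbered is read.

```sql
-- Hypothetical sample data: a heap with no primary key column
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL DROP TABLE #TableA;
CREATE TABLE #TableA ([EpiNum] varchar(50), [Name] varchar(100), [Acct] varchar(50));
INSERT INTO #TableA ([EpiNum], [Name], [Acct])
VALUES ('EP-01', 'Salaries & Wages - Other', 'A/100'),
       ('EP02',  'Rent',                     'A200');

;WITH Numbered AS (
   -- Surrogate key; only fully identical rows can tie, and they clean identically
   SELECT [EpiNum], [Name], [Acct],
          ROW_NUMBER() OVER (ORDER BY [EpiNum], [Name], [Acct]) AS id
   FROM #TableA
), StripSpecialChars AS (
   -- Anchor: first pass over every row that contains a special character
   SELECT id, 0 AS lvl,
          [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], x.i, 1), ''),
          [Name]   = REPLACE([Name],   SUBSTRING([Name],   y.i, 1), ''),
          [Acct]   = REPLACE([Acct],   SUBSTRING([Acct],   z.i, 1), '')
   FROM Numbered
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum])) AS x(i)
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Name])) AS y(i)
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])) AS z(i)
   WHERE x.i <> 0 OR y.i <> 0 OR z.i <> 0

   UNION ALL

   -- Recursive member: another pass until no column has a special character left
   SELECT id, lvl + 1,
          REPLACE([EpiNum], SUBSTRING([EpiNum], x.i, 1), ''),
          REPLACE([Name],   SUBSTRING([Name],   y.i, 1), ''),
          REPLACE([Acct],   SUBSTRING([Acct],   z.i, 1), '')
   FROM StripSpecialChars
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum])) AS x(i)
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Name])) AS y(i)
   CROSS APPLY (SELECT PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])) AS z(i)
   WHERE x.i <> 0 OR y.i <> 0 OR z.i <> 0
)
-- Update through the Numbered CTE, taking the deepest level for each id
UPDATE n
SET n.[EpiNum] = t2.[EpiNum],
    n.[Name]   = t2.[Name],
    n.[Acct]   = t2.[Acct]
FROM Numbered AS n
INNER JOIN (SELECT id, [EpiNum], [Name], [Acct],
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY lvl DESC) AS rn
            FROM StripSpecialChars) AS t2
        ON n.id = t2.id AND t2.rn = 1;

SELECT * FROM #TableA;
```

Rows with no special characters never enter the recursive CTE, so the inner join leaves them untouched.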

Answer 1 (score: 1)

This may look a bit complicated, but I solved a similar challenge with the following approach.

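Just paste it into an empty query window and adapt it to your needs. As written, it keeps all printable ASCII characters, including & and -, so the CASE branches would have to change to strip those.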

--This function returns a running sequence of numbers - very handy
CREATE FUNCTION [dbo].[RunningNumbers](@counter INT=1000000, @StartAt INT=0)
RETURNS TABLE
AS 
RETURN
    WITH E1(N) AS(SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)), --10 ^ 1
    E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
    E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
    E8(N) AS(SELECT 1 FROM E4 a CROSS JOIN E4 b), -- 10 ^ 8 = 10,000,000 rows
    CteTally AS
    (
        SELECT TOP(ISNULL(@counter,1000000)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) -1 + ISNULL(@StartAt,0) As Nmbr
        FROM E8
    )
    SELECT * FROM CteTally;
GO

--This function breaks down a string into a one-char-table with one char in each row.
--You can decide for any ascii code what you want to do with this character.
--At the end the whole thing is concatenated again.
CREATE FUNCTION [dbo].[GetPrintableChars]
(
     @Txt VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    SET @Txt=LTRIM(RTRIM(ISNULL(@Txt,'')));

    DECLARE @rslt VARCHAR(MAX);
    SET @rslt =
        (
            SELECT Repl.ASCII_Code
            FROM dbo.RunningNumbers(LEN(@Txt),1) AS pos
            --ASCII-Codes of all characters in your text
            OUTER APPLY(SELECT ASCII(SUBSTRING(@Txt,pos.Nmbr,1)) AS ASCII_Code) AS OneChar  
            --re-code 
            CROSS APPLY
            (
                SELECT CASE 
                    WHEN OneChar.ASCII_Code IN(9,10,13) THEN CHAR(OneChar.ASCII_Code) --tab, line feed, carriage return
                    WHEN OneChar.ASCII_Code BETWEEN 32 AND 126 THEN CHAR(OneChar.ASCII_Code) --normal printable
                    WHEN OneChar.ASCII_Code IN(132,142,148,153,174,175) THEN CHAR(OneChar.ASCII_Code) --extended to keep
                    WHEN OneChar.ASCII_Code BETWEEN 128 AND 154 THEN CHAR(176) --extended to get rid of
                    ELSE ''
                END AS ASCII_Code
            ) AS Repl    
            FOR XML PATH(''),TYPE
        ).value('.','varchar(max)');
    RETURN @rslt;
END
GO

--One example to get rid of some characters.
SELECT dbo.GetPrintableChars('This is a Test for special characters: ÐðÑñ')
GO

--And clean up for testing
DROP FUNCTION dbo.GetPrintableChars;
GO
DROP FUNCTION dbo.RunningNumbers;

Answer 2 (score: 1)

Gordon Linoff made a very good point about the constraint. If you want to reuse the loop code for multiple fields, though, you can put it in a function:

CREATE FUNCTION dbo.RemoveSpecialCharacters (
    @String NVARCHAR(max)
)
RETURNS NVARCHAR(max)
BEGIN
    DECLARE @i int

    SET @i = patindex('%[^a-zA-Z0-9 ]%', @String)
    WHILE @i > 0
    BEGIN
        SET @String = STUFF(@String, @i, 1, '')
        SET @i = patindex('%[^a-zA-Z0-9 ]%', @String)
    END
    RETURN @String
END
GO

Then simply reuse the function:

UPDATE [TableA] 
SET [EpiNum] = dbo.RemoveSpecialCharacters([EpiNum]), 
    [Name] = dbo.RemoveSpecialCharacters([Name]), 
    [Acct] = dbo.RemoveSpecialCharacters([Acct])
WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR 
      PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0 OR 
      PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]) <> 0;

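Test the performance! If you want to check the results in C#, use the function in a SELECT first. Run the UPDATE once the results look right.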
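The preview query below follows that suggestion. It is a sketch that assumes dbo.RemoveSpecialCharacters has already been created as shown. It returns each value next to its cleaned version, so the C# application can read the result set and review it before issuing the UPDATE. The *After column aliases are illustrative.

```sql
-- Preview only: nothing is modified; each column is listed next to its cleaned value
SELECT [EpiNum], dbo.RemoveSpecialCharacters([EpiNum]) AS EpiNumAfter,
       [Name],   dbo.RemoveSpecialCharacters([Name])   AS NameAfter,
       [Acct],   dbo.RemoveSpecialCharacters([Acct])   AS AcctAfter
FROM [TableA]
-- Same filter as the UPDATE, so only rows that would change are returned
WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR
      PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0 OR
      PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]) <> 0;
```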

Answer 3 (score: 1)

Create this function:

CREATE function f_removebadcharacters
(
  @string varchar(2000)
)
RETURNS varchar(2000)
as
BEGIN
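  -- Relies on a case-insensitive collation; under a case-sensitive one, lowercase letters may be stripped too
  DECLARE @badcharacters varchar(100) = '%[^A-Z0-9 ]%'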

  WHILE @string like @badcharacters
    SET @string = STUFF(@string, patindex(@badcharacters, @string), 1, '')

  RETURN @string
END
GO

Call the function like this:

SELECT dbo.f_removebadcharacters('Salaries & Wages - Other')

In your update, use this syntax:

UPDATE [TableA] 
SET [EpiNum] = dbo.f_removebadcharacters([EpiNum])
WHERE [EpiNum] LIKE '%[^A-Z0-9 ]%'

Here is a working example:

DECLARE @TableA table([EpiNum] varchar(2000))
INSERT @TableA 
  values('Salaries & Wages - Other'),
        ('80-82/5 O$%*#@)(J^#oh!@!n & '''' Bacon St')


UPDATE @TableA
SET [EpiNum] = dbo.f_removebadcharacters([EpiNum])
WHERE [EpiNum] LIKE '%[^A-Z0-9 ]%'

SELECT * FROM @TableA

Result:

EpiNum
Salaries  Wages  Other
80825 OJohn   Bacon St

Answer 4 (score: 0)

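If this is a one-time effort, I would suggest running the update multiple times until all the special characters are gone. That is probably the fastest way to get it done.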

Once you have done that, fix the table by adding a constraint that only accepts the values you want:

alter table TableA
    add constraint chk_EpiNum_Valid check (EpiNum NOT LIKE '%[^a-zA-Z0-9 ]%');

(And repeat this for each such column.)

The database will then guarantee the validity of the column on insert and update.
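Here is the "repeat for each column" step as a sketch, applied to the question's other two columns in TableA; the constraint names are hypothetical. A newly added CHECK constraint validates existing rows by default, so these statements fail until the data has been cleaned. NULL values still pass the check, because a CHECK constraint only rejects rows where the expression is false.

```sql
-- Same pattern as chk_EpiNum_Valid, for the question's [Name] and [Acct] columns
ALTER TABLE TableA
    ADD CONSTRAINT chk_Name_Valid CHECK ([Name] NOT LIKE '%[^a-zA-Z0-9 ]%');

ALTER TABLE TableA
    ADD CONSTRAINT chk_Acct_Valid CHECK ([Acct] NOT LIKE '%[^a-zA-Z0-9 ]%');
```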

Answer 5 (score: 0)

An approach that applies the update as many times as needed and controls the result:

declare @l int;
select @l= COUNT(*) from sys.views  --just to set @@ROWCOUNT to 1

while @@ROWCOUNT >0
begin
  UPDATE [TableA] 
  SET [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]), 1), ''), 
    [Name] = REPLACE([Name], SUBSTRING([Name], PATINDEX('%[^a-zA-Z0-9 ]%', [Name]), 1), ''), 
    [Acct] = REPLACE([Acct], SUBSTRING([Acct], PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]), 1), '') 
  WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR 
      PATINDEX('%[^a-zA-Z0-9 ]%', [Name]) <> 0 OR 
      PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]) <> 0;
end
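Below is a variant of the same loop. Instead of priming @@ROWCOUNT with a dummy SELECT, it captures @@ROWCOUNT into a variable right after the UPDATE. As a precaution, it also stops after a fixed number of passes. The sketch runs against a temporary #TableA with hypothetical sample rows. @maxPasses and its value of 100 are illustrative assumptions.

```sql
-- Hypothetical sample data; point the UPDATE at the real [TableA] instead
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL DROP TABLE #TableA;
CREATE TABLE #TableA ([EpiNum] varchar(50), [Name] varchar(100), [Acct] varchar(50));
INSERT INTO #TableA ([EpiNum], [Name], [Acct])
VALUES ('EP-01', 'Salaries & Wages - Other', 'A/100');

DECLARE @pass int, @maxPasses int, @rows int;
SET @pass = 0;
SET @maxPasses = 100;  -- hypothetical safety cap on the number of passes
SET @rows = 1;         -- non-zero so the first pass runs

WHILE @rows > 0 AND @pass < @maxPasses
BEGIN
    -- Each pass removes every occurrence of the first special character found in each column
    UPDATE #TableA
    SET [EpiNum] = REPLACE([EpiNum], SUBSTRING([EpiNum], PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]), 1), ''),
        [Name]   = REPLACE([Name],   SUBSTRING([Name],   PATINDEX('%[^a-zA-Z0-9 ]%', [Name]),   1), ''),
        [Acct]   = REPLACE([Acct],   SUBSTRING([Acct],   PATINDEX('%[^a-zA-Z0-9 ]%', [Acct]),   1), '')
    WHERE PATINDEX('%[^a-zA-Z0-9 ]%', [EpiNum]) <> 0 OR
          PATINDEX('%[^a-zA-Z0-9 ]%', [Name])   <> 0 OR
          PATINDEX('%[^a-zA-Z0-9 ]%', [Acct])   <> 0;

    SET @rows = @@ROWCOUNT;  -- capture immediately: the next statement resets @@ROWCOUNT
    SET @pass = @pass + 1;
END

PRINT 'Passes run: ' + CAST(@pass AS varchar(10));
SELECT * FROM #TableA;
```

If @pass reaches @maxPasses while @rows is still non-zero, some rows were not fully cleaned and are worth inspecting.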