How to remove special characters and leading zeros from a field in SQL

Posted: 2018-07-24 19:55:43

Tags: sql sql-server tsql

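I have a field with values like 083_33:152#7 0100, and I want to remove all special characters, spaces, and leading and trailing zeros in one go. How can I do this? The output should look like this: 8333152701. This is what I have: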

select * from myTable where REPLACE(LTRIM(REPLACE(part_number, '0', ' ')), ' ', '0') =  '8333152701'

The query should return the following: 8333152701. Thanks!
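For reference, here is what the attempt above actually computes, modelled in Python rather than T-SQL so it can be run anywhere. It is a sketch that assumes LTRIM removes only leading spaces (its default behaviour). The zero-to-space trick strips the leading zero, but the trailing zeros survive because there is no RTRIM, the special characters are never touched, and the space inside the value turns into an extra 0.

```python
def asker_attempt(part_number: str) -> str:
    """Python model of REPLACE(LTRIM(REPLACE(part_number, '0', ' ')), ' ', '0')."""
    spaced = part_number.replace("0", " ")  # REPLACE(part_number, '0', ' ')
    trimmed = spaced.lstrip(" ")            # LTRIM: removes leading spaces only
    return trimmed.replace(" ", "0")        # REPLACE(..., ' ', '0')


print(asker_attempt("083_33:152#7 0100"))  # 83_33:152#700100, not 8333152701
```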

4 answers:

Answer 0 (score: 2)

If performance is important, then the fastest function for stripping non-numeric characters is DigitsOnlyEE (its code was available via a link in the original post). A complete solution that trims leading/trailing 0s and spaces would look like this:

DECLARE @string VARCHAR(100) = '083_33:152#7 0100';

SELECT de.digitsOnly
FROM (VALUES (RTRIM(LTRIM(@string)))) f(s)
CROSS APPLY (VALUES(    -- substring start, substring stop (counted from the end), string length:
  PATINDEX('%[^0]%',f.s),PATINDEX('%[^0]%',REVERSE(f.s)),LEN(f.s))) f2(ss,sstp,ds) 
CROSS APPLY (VALUES (SUBSTRING(f.s, f2.ss, f2.ds+1-f2.sstp-(f2.ss-1)))) trimmed(string)
CROSS APPLY dbo.digitsOnlyEE(trimmed.string) de;
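To check the expected output without a SQL Server instance, the same pipeline can be modelled in Python. This is a sketch, not the answer's code: `digits_only` is a hypothetical stand-in for dbo.DigitsOnlyEE, assumed simply to keep the characters 0-9, and the all-zero edge case is not modelled faithfully.

```python
def digits_only(s: str) -> str:
    """Hypothetical stand-in for dbo.DigitsOnlyEE: keep only the characters 0-9."""
    return "".join(ch for ch in s if "0" <= ch <= "9")


def trim_zeros_then_digits(raw: str) -> str:
    s = raw.strip(" ")  # RTRIM(LTRIM(@string))
    # 0-based index of the first non-'0' character (PATINDEX('%[^0]%', f.s) - 1)
    start = next((i for i, ch in enumerate(s) if ch != "0"), None)
    if start is None:   # empty or all zeros; the T-SQL behaves differently here
        return ""
    # number of '0' characters at the end (PATINDEX on REVERSE(f.s), minus 1)
    trailing = next(i for i, ch in enumerate(reversed(s)) if ch != "0")
    trimmed = s[start:len(s) - trailing]  # SUBSTRING(f.s, ss, ds+1-sstp-(ss-1))
    return digits_only(trimmed)


print(trim_zeros_then_digits("083_33:152#7 0100"))  # 8333152701
```

Note that the zeros are trimmed from the raw string before the non-digits are removed, so zeros that only become "outer" after cleaning (for example in BDA505AD000FAC) are kept.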

What @shnugo posted can be fast, provided that you (1) turn it into an inline table-valued function and (2) run it with a parallel execution plan.

Note that, as-is, the function returns 050 when the string is BDA505AD000FAC; my understanding is that you should get 505000, but I'm sure there's a quick fix for that. Anyhow, here is the function:

CREATE FUNCTION dbo.getonlynumbers(@v VARCHAR(100))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH recCTE AS
(
    SELECT CASE WHEN ASCII(SUBSTRING(@v,1,1)) BETWEEN ASCII(0) AND ASCII(9) THEN SUBSTRING(@v,1,1) ELSE '' END AS Chr
          ,1 AS Pos
    UNION ALL
    SELECT CASE WHEN ASCII(SUBSTRING(@v,r.Pos+1,1)) BETWEEN ASCII(0) AND ASCII(9) THEN SUBSTRING(@v,r.Pos+1,1) ELSE '' END
          ,r.Pos+1
    FROM recCTE r
    WHERE r.Pos<=LEN(@v)
)
,GetOnlyNumbers(CleanedString) AS
(
    SELECT
    (
        SELECT Chr AS [*]
        FROM recCTE
        ORDER BY Pos  -- FOR XML concatenation order is only guaranteed with an ORDER BY
        FOR XML PATH(''),TYPE
    ).value('.','varchar(100)')
SELECT REVERSE(B.CleanedFromRear) AS CleanedNumber
FROM GetOnlyNumbers
CROSS APPLY(SELECT SUBSTRING(CleanedString,PATINDEX('%[1-9]%',CleanedString),1000) AS CleanedFromFront) A
CROSS APPLY(SELECT SUBSTRING(REVERSE(CleanedFromFront),PATINDEX('%[1-9]%',REVERSE(CleanedFromFront)),1000) AS CleanedFromRear) B
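Which output you expect for BDA505AD000FAC depends on the order of two steps: trimming zeros from the ends of the raw string, or from the ends of the digits-only string. A small Python model of both readings (an illustration, not the T-SQL itself) makes the difference visible:

```python
def keep_digits(s: str) -> str:
    return "".join(ch for ch in s if "0" <= ch <= "9")


def digits_then_strip(raw: str) -> str:
    """Keep digits first, then drop zeros at both ends of the digit string."""
    return keep_digits(raw).strip("0")


def strip_then_digits(raw: str) -> str:
    """Drop spaces and zeros at both ends of the raw string first, then keep digits."""
    return keep_digits(raw.strip(" ").strip("0"))


for value in ("083_33:152#7 0100", "BDA505AD000FAC"):
    print(value, digits_then_strip(value), strip_then_digits(value))
# 083_33:152#7 0100 8333152701 8333152701
# BDA505AD000FAC 505 505000
```

Both orders agree on the sample value from the question; they only diverge when zeros sit next to non-digit characters at the ends.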

Now for a performance test. First the sample data:

IF OBJECT_ID('tempdb..#strings') IS NOT NULL DROP TABLE #strings;

DECLARE @default VARCHAR(100) = '083_33:152#7 0100';
SELECT TOP (10000)
  string = 
    ISNULL(CAST(
      REPLICATE('  ', ABS(CHECKSUM(NEWID())%2))+
      REPLICATE('0',  ABS(CHECKSUM(NEWID())%4))+
      REPLACE(REPLACE(LEFT(NEWID(),12),'-','000'),'9', f.rnd)+
      REPLICATE('0',  ABS(CHECKSUM(NEWID())%4)) AS VARCHAR(100)),@default)
INTO #strings 
FROM sys.all_columns, sys.all_columns b
CROSS JOIN
(
  SELECT TOP (ABS(CHECKSUM(NEWID())%5)) f.C+''
  FROM
  (
    SELECT TOP (31) 
          ROW_NUMBER() OVER (ORDER BY (SELECT NULL))^32,
    CHAR((ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))^32)
    FROM sys.all_columns) f(N,C)
  ORDER BY NEWID()
  FOR XML PATH('')
) f(rnd);

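Next, the performance test itself. The benefit of inline table-valued functions is that they can run with either a serial or a parallel execution plan, so each solution is tested both ways: OPTION (MAXDOP 1) forces a serial plan, while QUERYTRACEON 8649 (an undocumented trace flag) pushes the optimizer toward a parallel one.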

PRINT 'getonlynumbers - Serial'+CHAR(10)+REPLICATE('-',60)
GO
DECLARE @st DATETIME = GETDATE(), @x VARCHAR(100);

SELECT @x = f.cleanedNumber
FROM #strings s
CROSS APPLY dbo.getonlynumbers(s.string) f
OPTION (MAXDOP 1);

PRINT DATEDIFF(MS,@st,GETDATE());
GO 3

PRINT 'getonlynumbers - parallel'+CHAR(10)+REPLICATE('-',60)
GO
DECLARE @st DATETIME = GETDATE(), @x VARCHAR(100);

SELECT @x = f.cleanedNumber
FROM #strings s
CROSS APPLY dbo.getonlynumbers(s.string) f
OPTION (QUERYTRACEON 8649);

PRINT DATEDIFF(MS,@st,GETDATE());
GO 3

PRINT 'DigitsOnlyEE - Serial'+CHAR(10)+REPLICATE('-',60)
GO
DECLARE @st DATETIME = GETDATE(), @x VARCHAR(100);

SELECT @x = de.digitsOnly
FROM #strings s
CROSS APPLY (VALUES (RTRIM(LTRIM(s.string)))) f(s)
CROSS APPLY (VALUES(    -- substring start, substring stop (counted from the end), string length:
  PATINDEX('%[^0]%',f.s),PATINDEX('%[^0]%',REVERSE(f.s)),LEN(f.s))) f2(ss,sstp,ds) 
CROSS APPLY (VALUES (SUBSTRING(f.s, f2.ss, f2.ds+1-f2.sstp-(f2.ss-1)))) trimmed(string)
CROSS APPLY dbo.digitsOnlyEE(trimmed.string) de
OPTION (MAXDOP 1);

PRINT DATEDIFF(MS,@st,GETDATE());
GO 3

PRINT 'DigitsOnlyEE - parallel'+CHAR(10)+REPLICATE('-',60)
GO
DECLARE @st DATETIME = GETDATE(), @x VARCHAR(100);

SELECT @x = de.digitsOnly
FROM #strings s
CROSS APPLY (VALUES (RTRIM(LTRIM(s.string)))) f(s)
CROSS APPLY (VALUES(    -- substring start, substring stop (counted from the end), string length:
  PATINDEX('%[^0]%',f.s),PATINDEX('%[^0]%',REVERSE(f.s)),LEN(f.s))) f2(ss,sstp,ds) 
CROSS APPLY (VALUES (SUBSTRING(f.s, f2.ss, f2.ds+1-f2.sstp-(f2.ss-1)))) trimmed(string)
CROSS APPLY dbo.digitsOnlyEE(trimmed.string) de
OPTION (QUERYTRACEON 8649);

PRINT DATEDIFF(MS,@st,GETDATE());
GO 3

And the results (elapsed milliseconds for each of the three runs):

getonlynumbers - Serial
------------------------------------------------------------
Beginning execution loop
2007
2037
2153
Batch execution completed 3 times.

getonlynumbers - parallel
------------------------------------------------------------
Beginning execution loop
513
466
510
Batch execution completed 3 times.

DigitsOnlyEE - Serial
------------------------------------------------------------
Beginning execution loop
250
266
233
Batch execution completed 3 times.

DigitsOnlyEE - parallel
------------------------------------------------------------
Beginning execution loop
63
64
70
Batch execution completed 3 times.
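Averaging the three runs of each test makes the comparison easier to read. This is only arithmetic on the figures posted above, not a new measurement:

```python
from statistics import mean

# Elapsed milliseconds for the three runs reported above
timings = {
    "getonlynumbers - Serial": [2007, 2037, 2153],
    "getonlynumbers - parallel": [513, 466, 510],
    "DigitsOnlyEE - Serial": [250, 266, 233],
    "DigitsOnlyEE - parallel": [63, 64, 70],
}
averages = {name: mean(runs) for name, runs in timings.items()}

for name, avg in averages.items():
    print(f"{name:27} {avg:7.1f} ms")

# How many times faster DigitsOnlyEE is, on average, for each plan type
serial_ratio = averages["getonlynumbers - Serial"] / averages["DigitsOnlyEE - Serial"]
parallel_ratio = averages["getonlynumbers - parallel"] / averages["DigitsOnlyEE - parallel"]
print(f"serial: {serial_ratio:.1f}x, parallel: {parallel_ratio:.1f}x")  # serial: 8.3x, parallel: 7.6x
```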

Answer 1 (score: 1)

Not a big fan of this answer (because of the nested REPLACEs), but it seems to be one solution:

select * from myTable where replace(replace(replace(replace(trim('0 _:#' FROM part_number), '_',''),':',''),'#',''),' ','')='8333152701'

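TRIM only removes characters at the start and end, but you also seem to want the characters in the middle removed, so there is one nested REPLACE for each character. (The TRIM(characters FROM string) syntax requires SQL Server 2017 or later.)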
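The same logic as a Python model, for checking values outside the database. Python's str.strip(chars) treats its argument as a set of characters, like TRIM(characters FROM ...), so it is a reasonable stand-in; the character list is the one from the query above. The model also makes one limitation easy to see: any special character not in the list (for example '-') stays in the value.

```python
SPECIALS = "_:# "  # the characters handled by the nested REPLACE calls above


def trim_and_replace(part_number: str) -> str:
    # TRIM('0 _:#' FROM part_number): remove these characters from both ends only
    s = part_number.strip("0 _:#")
    # nested REPLACE(..., ch, ''): remove the special characters everywhere
    for ch in SPECIALS:
        s = s.replace(ch, "")
    return s


print(trim_and_replace("083_33:152#7 0100"))  # 8333152701
print(trim_and_replace("0083-12"))            # 83-12: '-' is not in the list
```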

Answer 2 (score: 1)

This will remove all non-numeric characters:

DECLARE @v VARCHAR(100)='083_33:152#7 0100';

WITH recCTE AS
(
    SELECT CASE WHEN ASCII(SUBSTRING(@v,1,1)) BETWEEN ASCII(0) AND ASCII(9) THEN SUBSTRING(@v,1,1) ELSE '' END AS Chr
          ,1 AS Pos
    UNION ALL
    SELECT CASE WHEN ASCII(SUBSTRING(@v,r.Pos+1,1)) BETWEEN ASCII(0) AND ASCII(9) THEN SUBSTRING(@v,r.Pos+1,1) ELSE '' END
          ,r.Pos+1
    FROM recCTE r
    WHERE r.Pos<=LEN(@v)
)
SELECT
(
    SELECT Chr AS [*]
    FROM recCTE
    ORDER BY Pos  -- keep the characters in their original order
    FOR XML PATH(''),TYPE).value('.','varchar(100)');

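This solution uses a recursive CTE to walk along the string. Each character is checked to see whether it is a digit, and the result is concatenated using FOR XML.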
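For readers less familiar with recursive CTEs, the row-by-row walk can be pictured as a plain loop. This Python sketch mirrors the CTE's rows (Pos runs from 1 to LEN(@v)+1) and the FOR XML concatenation; it is a model for understanding, not a replacement for the query.

```python
def walk_and_keep_digits(v: str) -> str:
    chars = []
    pos = 1                               # anchor member: Pos = 1
    while pos <= len(v) + 1:              # recursion continues while r.Pos <= LEN(@v)
        ch = v[pos - 1:pos]               # SUBSTRING(@v, Pos, 1); empty past the end
        chars.append(ch if ch and "0" <= ch <= "9" else "")  # CASE ... ELSE ''
        pos += 1
    return "".join(chars)                 # FOR XML PATH('') concatenation, in Pos order


print(walk_and_keep_digits("083_33:152#7 0100"))  # 0833315270100
```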

UPDATE: a complete solution that also removes leading and trailing zeros:

EDIT: removed unnecessary repetition

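The easiest way to remove leading and trailing zeros is to replace every zero with a space, use LTRIM(RTRIM()) to strip the leading and trailing spaces, and then turn the remaining inner spaces back into zeros.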

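CREATE FUNCTION dbo.getonlynumbers(@v VARCHAR(8000))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
-- Note: recursion depth grows with LEN(@v). For strings longer than about 100 characters the
-- default MAXRECURSION limit of 100 applies; an inline function cannot carry its own OPTION
-- clause, so add OPTION (MAXRECURSION 0) to the calling query instead.
WITH recCTE AS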
(
    SELECT CASE WHEN SUBSTRING(@v,1,1)='0' THEN ' '
                WHEN SUBSTRING(@v,1,1) BETWEEN '1' AND '9' THEN SUBSTRING(@v,1,1) 
                ELSE '' END AS Chr
          ,1 AS Pos
    UNION ALL
    SELECT CASE WHEN SUBSTRING(@v,r.Pos+1,1)='0' THEN ' '
                WHEN SUBSTRING(@v,r.Pos+1,1) BETWEEN '1' AND '9' THEN SUBSTRING(@v,r.Pos+1,1)
                ELSE '' END AS Chr
              ,r.Pos+1
    FROM recCTE r
    WHERE r.Pos<=LEN(@v)
)
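,GetOnlyNumbers(CleanedString) AS
(
    -- zeros are blanks at this point: trim the outer blanks, then turn inner blanks back into zeros
    SELECT REPLACE(LTRIM(RTRIM(
    (
        SELECT Chr AS [*]
        FROM recCTE
        ORDER BY Pos  -- FOR XML concatenation order is only guaranteed with an ORDER BY
        FOR XML PATH(''),TYPE
    ).value('.','varchar(8000)'))),' ','0')
)
SELECT CleanedString 
FROM GetOnlyNumbers;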

Answer 3 (score: 0)

Thanks to Alan for the details, this is really great. I'm posting this in case anyone needs a simplified version.

select * from myTable
where replace(ltrim(rtrim(replace(RTRIM(LTRIM(
  REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(isnull(part_number,''),'-',''),'_',''),'*',''),' ',''),'.',''),',',''),'/',''),'\',''),'#',''),':',''),'''',''),'(',''),')','')
)), '0', ' '))), ' ', '0') = '8333152701' -- zeros -> spaces, trim, spaces -> zeros
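A Python model of this blacklist approach, with the character list as a parameter. It is only a sketch for checking values; the point it illustrates is that a blacklist removes exactly the characters it lists, so any character missing from the list (for example '_', which appears in the sample value) passes straight through.

```python
FULL_LIST = "-_*. ,/\\#:'()"   # characters removed by one REPLACE each
WITHOUT_UNDERSCORE = FULL_LIST.replace("_", "")


def blacklist_clean(part_number, blacklist=FULL_LIST):
    s = "" if part_number is None else part_number   # ISNULL(part_number, '')
    for ch in blacklist:                              # REPLACE(..., ch, '') per character
        s = s.replace(ch, "")
    s = s.strip(" ")                                  # RTRIM(LTRIM(...))
    # zero-to-space trick: strip leading and trailing zeros only
    return s.replace("0", " ").strip(" ").replace(" ", "0")


print(blacklist_clean("083_33:152#7 0100"))                      # 8333152701
print(blacklist_clean("083_33:152#7 0100", WITHOUT_UNDERSCORE))  # 83_33152701
```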