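Number of ASCII characters > number of non-ASCII characters in SQL

Posted: 2017-01-04 08:54:53

Tags: sql sql-server sql-server-2016

I use the following code to detect descriptions that contain non-ASCII characters.

Now I want to add a condition that the number of ASCII characters in the description is greater than the number of non-ASCII characters. How can I write this in a SQL statement?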

declare @str varchar(1024)
set @str = '|' + char(9) + '|' + char(10) + '|' + char(13)

-- Add all normal ASCII characters (32 -> 127)
declare @i int
set @i = 32
while @i <= 127
    begin
    -- Uses | to escape, could be any character
    set @str = @str + '|' + char(@i)
    set @i = @i + 1
    end

--select description, locale
select description, productlocale, locale
  FROM [DataExtraction].[dbo].[Feedback]
  where description like '%[^' + @str + ']%' escape '|'
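The loop above only builds a character class of allowed characters: tab, LF, CR, and codes 32 through 127. As an alternative sketch, assuming a binary collation such as Latin1_General_BIN (in which LIKE bracket ranges follow code order) and a varchar column, the same class can be written as a single range with no loop and no escape character. The @pattern and @Feedback names are hypothetical; the table variable only stands in for the real table.

```sql
-- Allowed set: tab, LF, CR, plus the single range CHAR(32) through CHAR(127).
-- Under a binary collation the range is compared by code, so ], [, ^ and |
-- are covered by the range without being written or escaped.
DECLARE @pattern varchar(16) =
    '%[^' + CHAR(9) + CHAR(10) + CHAR(13) + CHAR(32) + '-' + CHAR(127) + ']%';

-- Hypothetical sample rows standing in for [DataExtraction].[dbo].[Feedback]
DECLARE @Feedback TABLE (description varchar(1024));
INSERT INTO @Feedback (description)
VALUES ('plain ASCII text'),
       ('price: 50' + CHAR(162));  -- CHAR(162) is outside 0-127

-- Rows containing at least one character outside the allowed set
SELECT description
FROM @Feedback
WHERE description COLLATE Latin1_General_BIN LIKE @pattern;
```

This only finds rows that contain some non-ASCII character; it does not count them.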

2 answers:

Answer 0 (score: 0)

You can use a function:

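create function dbo.fn_CountAscii(@str varchar(8000))
returns int
as
begin
-- Counts characters that are printable ASCII (32-127) or tab/LF/CR,
-- the same allowed set the question builds for its LIKE pattern
declare @i int = 1
declare @n int = 0
while @i <= len(@str)
begin
    if ascii(substring(@str, @i, 1)) between 32 and 127
       or ascii(substring(@str, @i, 1)) in (9, 10, 13)
    begin
        set @n = @n + 1
    end
    set @i = @i + 1
end
return @n
end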
go

declare @str varchar(8000)
set @str = '|' + char(9) + '|' + char(10) + '|' + char(13)

-- nullif avoids a divide-by-zero error when the string is empty
select @str, 
    convert(float, dbo.fn_CountAscii(@str)) / nullif(len(@str), 0) as Ratio,
    case when convert(float, dbo.fn_CountAscii(@str)) / nullif(len(@str), 0) > 0.5 then 1 else 0 end as MoreAsciiThanNonAscii

This may be very slow if your data set is large, but give it a try.
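Applied to the question's table, the comparison can stay in integers: with n ASCII characters, "n > LEN - n" is the same as "2 * n > LEN", which needs no division, so empty strings cannot cause a divide-by-zero error. A sketch, assuming dbo.fn_CountAscii from this answer already exists and that description is varchar; the @Feedback table variable is hypothetical sample data standing in for [DataExtraction].[dbo].[Feedback].

```sql
-- Hypothetical sample rows standing in for [DataExtraction].[dbo].[Feedback]
DECLARE @Feedback TABLE (description varchar(1024));
INSERT INTO @Feedback (description)
VALUES ('plain ASCII text'),
       ('abc' + CHAR(199)),           -- 3 ASCII vs 1 non-ASCII: kept
       (CHAR(199) + CHAR(252) + 'a'), -- 1 ASCII vs 2 non-ASCII: dropped
       ('');                          -- 0 vs 0: dropped

-- ASCII > non-ASCII  <=>  n > LEN - n  <=>  2 * n > LEN
SELECT description
FROM @Feedback
WHERE 2 * dbo.fn_CountAscii(description) > LEN(description);
```

To keep only rows that also contain at least one non-ASCII character, combine this predicate with the question's LIKE filter using AND.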

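Answer 1 (score: 0)

The most efficient approach I can think of is to use a Tally (aka Numbers) table. If you are not familiar with the concept, you can read about it here: http://www.sqlservercentral.com/articles/T-SQL/62867/

You can create and populate a Tally table in your database (I keep one in a utility database because it is so useful), or you can build one on the fly with a CTE, although that means more code to write. For this post I'll use the CTE approach so that you can simply copy and paste the solution to try it out: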

DECLARE @pString VARCHAR(8000);

SELECT @pString = 'Çüé the quick brown fox which gave me 50¢.';

--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000... enough to cover VARCHAR(8000)
 WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --1E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --1E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --1E+4 or 10,000 rows max
 TallyCTE(N) AS ( --=== This provides the "zero base" and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT 0 UNION ALL
                 SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                )
--===== Return the position and value of every character whose code is above 127 (non-ASCII)
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N),
       StartPosition = s.N,
       TheCharacter = SUBSTRING(@pString, N, 1)
FROM TallyCTE s
WHERE N <= LEN(@pString)
    AND ASCII(SUBSTRING(@pString, N, 1)) > 127
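The permanent-table variant assumes a table dbo.Tally with an integer column N numbered from 1. A minimal sketch of creating one, reusing the same cross-join idea as the CTE; the constraint name PK_Tally_N is hypothetical, and 10,000 rows is simply enough to cover every position of a VARCHAR(8000).

```sql
-- Sketch: a permanent Tally (Numbers) table holding 1 through 10,000.
-- Run it in the database (e.g. a utility database) where dbo.Tally should live.
CREATE TABLE dbo.Tally
(
    N int NOT NULL CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED
);

-- 10 rows cross-joined four times gives 10^4 = 10,000 rows
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(N)),
     E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)
INSERT INTO dbo.Tally (N)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E4;
```

Because N is the clustered primary key, a filter such as N <= LEN(@Parameter) can be answered with a range seek.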

If you keep a permanent Tally table, your query simplifies to:

DECLARE @Parameter VARCHAR(8000);
SET @Parameter = 'Çüé,Element01,Element02,Element03,Element04,Element05,50¢';

SELECT  N,
        SUBSTRING(@Parameter, N, 1)
FROM    dbo.Tally
WHERE   N <= LEN(@Parameter)
        AND ASCII(SUBSTRING(@Parameter, N, 1)) > 127
ORDER BY N;

Results of both queries:

[Image: query results]
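The Tally technique can also produce the counts the question actually asks for, in one set-based statement without a scalar function: count the characters above 127 per row with CROSS APPLY, then compare. A sketch, assuming description is varchar in a single-byte code page collation (as in SQL Server 2016); the @Feedback table variable is hypothetical sample data standing in for [DataExtraction].[dbo].[Feedback], and, as in this answer, any code of 127 or below counts as ASCII.

```sql
-- Hypothetical sample rows standing in for [DataExtraction].[dbo].[Feedback]
DECLARE @Feedback TABLE (description varchar(8000));
INSERT INTO @Feedback (description)
VALUES ('the quick brown fox ' + CHAR(199)),  -- 20 ASCII vs 1 non-ASCII: kept
       (CHAR(199) + CHAR(252) + 'a'),         -- 1 ASCII vs 2 non-ASCII: dropped
       ('');                                  -- 0 vs 0: dropped

-- Inline tally: 10 rows cross-joined four times = 10,000 numbers starting at 1
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(N)),
     E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d),
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4)
SELECT f.description,
       LEN(f.description) - c.NonAscii AS AsciiCount,
       c.NonAscii                      AS NonAsciiCount
FROM @Feedback AS f
CROSS APPLY (
    -- COUNT(*) with no GROUP BY always returns one row, even for ''
    SELECT COUNT(*) AS NonAscii
    FROM Tally AS t
    WHERE t.N <= LEN(f.description)
      AND ASCII(SUBSTRING(f.description, t.N, 1)) > 127
) AS c
WHERE LEN(f.description) - c.NonAscii > c.NonAscii;
```

Unlike the per-row WHILE loop in a scalar function, this is a single set-based query, which is usually where the Tally approach gains its speed on large tables.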