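I use the following code to detect descriptions that contain non-ASCII characters.
Now I want to change the condition to: number of ASCII characters > number of non-ASCII characters in the description. How can I write that in a SQL statement?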
declare @str varchar(1024)
set @str = '|' + char(9) + '|' + char(10) + '|' + char(13)
-- Add all normal ASCII characters (32 -> 127)
declare @i int
set @i = 32
while @i <= 127
begin
-- Uses | to escape, could be any character
set @str = @str + '|' + char(@i)
set @i = @i + 1
end
--select description, locale
select description, productlocale, locale
FROM [DataExtraction].[dbo].[Feedback]
where description like '%[^' + @str + ']%' escape '|'
Answer 0: (score: 0)
You can use a function:
create function dbo.fn_CountAscii(@str varchar(8000))
returns int
as
begin
declare @i int = 1
declare @n int = 0
while @i <= len(@str)
begin
if ascii(substring(@str, @i, 1)) between 32 and 127
begin
set @n = @n + 1
end
set @i = @i + 1
end
return @n
end
go
declare @str varchar(8000)
set @str = '|' + char(9) + '|' + char(10) + '|' + char(13)
select @str,
convert(float, dbo.fn_CountAscii(@str)) / len(@str) as Ratio,
case when convert(float, dbo.fn_CountAscii(@str)) / len(@str) > 0.5 then 1 else 0 end as MoreAsciiThanNonAscii
This will probably be very slow if your data set is large, but it is worth a look.
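Applied to the table from the question, the same test could go straight into the WHERE clause. This is only a sketch: it assumes dbo.fn_CountAscii has been created as above and that description is a varchar column (an nvarchar value would be converted to varchar on the way in, which may turn unrepresentable characters into '?'). Because the function counts only codes 32 to 127, tabs and line breaks count as non-ASCII here.

```sql
-- Sketch: keep rows where ASCII characters outnumber non-ASCII ones.
-- "More ASCII than non-ASCII" is the same as "ASCII count is more than half the length".
select description, productlocale, locale
from [DataExtraction].[dbo].[Feedback]
where 2 * dbo.fn_CountAscii(description) > len(description)
```

Rows with a NULL description are filtered out, since the comparison evaluates to unknown.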
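Answer 1: (score: 0)
The most efficient way I can think of is to use a Tally (aka Numbers) table. If you are not familiar with the concept, you can read about it here: http://www.sqlservercentral.com/articles/T-SQL/62867/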
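For orientation, a Tally table is just a table of sequential integers. One possible way to build a persistent one is sketched below; the name dbo.Tally and column N are chosen to match the query further down, while the row count of 10,000 and the use of sys.all_columns as a row source are assumptions for illustration.

```sql
-- Sketch: build a 10,000-row numbers table (1..10000).
-- sys.all_columns is cross joined with itself only to supply enough rows.
SELECT TOP (10000)
       N = IDENTITY(INT, 1, 1)
  INTO dbo.Tally
  FROM sys.all_columns a
 CROSS JOIN sys.all_columns b;

-- A clustered primary key on N keeps lookups by position cheap.
ALTER TABLE dbo.Tally
  ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N);
```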
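You can create and populate a Tally table in your database (I create one in a utility database because it is so useful), or you can build one on the fly with a CTE, although that means more code to write. For this post I will use the CTE approach, so you can simply copy and paste the solution to try it out: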
DECLARE @pString VARCHAR(8000);
SELECT @pString = 'Çüé the quick brown fox which gave me 50¢.';
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000... enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
TallyCTE(N) AS ( --=== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
--===== Return the position and value of every character whose code is above 127 (non-ASCII)
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N),
StartPosition = s.N,
TheCharacter = SUBSTRING(@pString, N, 1)
FROM TallyCTE s
WHERE N <= LEN(@pString)
AND ASCII(SUBSTRING(@pString, N, 1)) > 127
If you keep a persistent Tally table, your query simplifies to:
DECLARE @Parameter VARCHAR(8000);
SET @Parameter = 'Çüé,Element01,Element02,Element03,Element04,Element05,50¢';
SELECT N,
SUBSTRING(@Parameter, N, 1)
FROM dbo.Tally
WHERE N <= LEN(@Parameter)
AND ASCII(SUBSTRING(@Parameter, N, 1)) > 127
ORDER BY N;
The result of both is: