Question

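Table1 has an nvarchar column named umsg that contains Unicode text, and sometimes English text as well.

I want to find the rows whose umsg column contains English text.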

select * 
from table1 
where 
    RDate >='01/01/2014' and RDate < '09/26/2017' 
    and umsg = convert(varchar(max), umsg)

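The query above works fine for regional-language text, but it sometimes fails. Suppose the column contains text such as 'refnoÃ©tÃ©'. I consider that message to be Unicode, but with the query above SQL treats it as English rather than Unicode. How can I handle this?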

Table :
Id  Date                      Umsg
1   2017-09-12 00:00:00.000   The livers detoxification processes.
2   2017-09-11 00:00:00.000   Purposely added 1 
3   2017-09-10 00:00:00.000   फेंगशुई के छोटे-छोटे टिप्स से आप जीवन की विषमताओं से स्वयं को बचा सकते
4   2017-09-17 00:00:00.000   तनाव एक लाइलाज बीमारी कतई नहीं है। कुछ लोग तनाव को आसानी से झेल लेते ह
5   2017-09-17 00:00:00.000   ref no Ã©tÃ©

The above is the data in my table. But I want output like this:

    Id      Date                      Umsg
    1   2017-09-12 00:00:00.000   The livers detoxification processes.
    2   2017-09-11 00:00:00.000   Purposely added 1

Answer 1

Try the following:

;WITH CTE
 AS (
 SELECT ID,
        DATE,
        umsg,
        CASE
            WHEN(CAST(umsg AS VARCHAR(MAX)) COLLATE SQL_Latin1_General_Cp1251_CS_AS) = umsg
            THEN 0
            ELSE 1
        END HasSpecialChars
 FROM <table_name>)
 SELECT ID,
        DATE,
        umsg
 FROM CTE
 WHERE Date >= '01/01/2014'
       AND Date < '09/26/2017'
       AND HasSpecialChars = 0;

Expected output:

ID  DATE                     umsg
1   2017-09-12 00:00:00.000  The livers detoxification processes.                                                                     
2   2017-09-11 00:00:00.000  Purposely added 1

Hope this helps.
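As a hedged alternative, assuming "English" means every character falls in the printable ASCII range (space through tilde), a LIKE pattern under a binary collation can do the filtering without a cast. The table variable @demo and its rows are hypothetical stand-ins for the real table, and tabs or line breaks would count as non-English under this pattern.

```sql
-- Hypothetical sample data standing in for the real table.
DECLARE @demo TABLE (Id int, RDate datetime, umsg nvarchar(200));
INSERT INTO @demo (Id, RDate, umsg) VALUES
    (1, '20170912', N'The livers detoxification processes.'),
    (2, '20170911', N'Purposely added 1'),
    (3, '20170910', N'फेंगशुई के छोटे-छोटे टिप्स'),
    (5, '20170917', N'ref no Ã©tÃ©');

-- Keep rows with no character outside the range space (32) to tilde (126).
-- The binary collation makes the range compare by code point rather than
-- by linguistic sort order.
SELECT Id, RDate, umsg
FROM @demo
WHERE RDate >= '20140101'
  AND RDate < '20170926'
  AND umsg NOT LIKE N'%[^ -~]%' COLLATE Latin1_General_BIN2;
```

With this sample data, only rows 1 and 2 should come back, since Ã, © and the Devanagari characters all lie above code point 126.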

Answer 2

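You didn't say what you want when a string contains both Unicode and ASCII characters, so I'll give you one idea and one row-level solution in case you want to tell "pure English" rows apart from "mixed" ones.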

You need a table of natural numbers to do this. If you don't have one, you can generate it like this:

select top 1000000  row_number() over(order by getdate()) as n
into dbo.nums
from sys.messages m1 cross join sys.messages m2;

alter table dbo.nums alter column n int not null;

alter table dbo.nums add constraint PK_nums_n primary key(n);

Now that you have a numbers table, we'll break your strings into single characters to check whether ascii(character) = unicode(character):

declare @t table(col Nvarchar(200));
insert into @t values
(N'ref no Ã©tÃ©'), (N'The livers detoxification processes.'), (N'फेंगशुई के छोटे-छोटे टिप्स से आप जीवन की विषमताओं से')

select t.col, n, substring(t.col, n, 1) as nth_character,
       ascii(substring(t.col, n, 1)) as ascii,
       unicode(substring(t.col, n, 1)) as uni
from @t t join dbo.nums n
       on n.n <= len(t.col); -- this is to give you an idea how to see if it's unicode character or ascii

with cte as
(
select t.col, n, substring(t.col, n, 1) as nth_character,
       ascii(substring(t.col, n, 1)) as ascii,
       unicode(substring(t.col, n, 1)) as uni
from @t t join dbo.nums n
       on n.n <= len(t.col)
)
select col, 
       case
            when sum(case when ascii = uni then 1 else 0 end) = count(*) then 'English only'
            else 'Not only English'
       end as eng_or_not
from cte
group by col -- row level solution

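The first part of the code shows each string character by character, along with each character's ASCII and Unicode codes: the two are the same for ASCII characters.

The second part simply checks whether all characters are ASCII.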
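One caveat worth checking: depending on the database's default code page, ASCII() can return values above 127 for characters such as Ã or ©, in which case it equals UNICODE() and the string is reported as 'English only'. A stricter sketch, assuming "English" means every code point is below 128, compares UNICODE() against 128 directly. It reuses the dbo.nums table created above; @t2 is a hypothetical copy of the sample data.

```sql
-- Hypothetical sample rows; dbo.nums is the numbers table created earlier.
DECLARE @t2 TABLE (col nvarchar(200));
INSERT INTO @t2 VALUES
    (N'ref no Ã©tÃ©'),
    (N'The livers detoxification processes.');

SELECT t.col,
       CASE
           -- MAX over all characters: any code point of 128 or more
           -- marks the row as containing non-ASCII text.
           WHEN MAX(UNICODE(SUBSTRING(t.col, n.n, 1))) < 128 THEN 'English only'
           ELSE 'Not only English'
       END AS eng_or_not
FROM @t2 AS t
JOIN dbo.nums AS n
    ON n.n <= LEN(t.col)
GROUP BY t.col;
```

Because the test no longer depends on ASCII(), the result should not vary with the database collation. Empty strings produce no rows in the join, so they would need separate handling.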

How to identify Unicode text in SQL?
