Find rows which contain special characters other than underscore and space

Time: 2014-11-26 06:18:12

Tags: sql sql-server-2008 tsql

I have a table with many columns and 2.1M rows. Here are the columns relevant to my problem:

Column_name Type    Computed    Length  Prec    Scale   Nullable    TrimTrailingBlanks  FixedLenNullInSource    Collation
id          int     no          4       10      0       no          (n/a)               (n/a)                   NULL
val         varchar no          15                      yes         no                  yes                     SQL_Latin1_General_CP1_CI_AS

I want to return the rows in which the val column contains characters other than A-Za-z0-9, (space) and _. Sample data:

INSERT INTO tabl
            (id, val)
VALUES      (1, 'Extemporè'),
            (2, 'Aâkash'),
            (3, 'Driver 12'),
            (4, 'asd'),
            (5, '10'),
            (6, 'My_Car'),
            (7, 'Johnson & Sons'),
            (8, 'Johan''s Service'),
            (9, 'Indus Truck')

Expected output:

id  val
--  -----------
1   Extemporè
2   Aâkash
7   Johnson & Sons
8   Johan's Service

I found a similar question here, but the query from it does not give the expected result either:

SELECT *
FROM   tabl
WHERE  val LIKE '%[^A-Z0-9 _]%'

2 answers:

Answer 0: (score: 3)

I would do this with the help of a collation such as Latin1_General_BIN, like this:

SELECT *
FROM   tabl
WHERE  val COLLATE Latin1_General_BIN LIKE '%[^A-Za-z0-9 _]%'

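This way seems easier because a BIN collation is both case-sensitive and accent-sensitive, and in addition accented characters are collated separately from unaccented ones. The latter means that the unaccented letters are easy to specify as a range. (The case sensitivity, however, means you also have to specify the letters of both cases explicitly, as shown above.)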
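To try the collation approach in isolation, here is a minimal sketch that loads the question's sample rows into a table variable and applies the same filter. The table variable @tabl is a hypothetical stand-in for the real tabl table, and the sketch assumes the current database uses a code page 1252 collation so that the accented literals survive as varchar.

```sql
-- Hypothetical stand-in for the real table, declared with the
-- column collation reported in the question.
DECLARE @tabl TABLE (
    id  int         NOT NULL,
    val varchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
);

INSERT INTO @tabl (id, val)
VALUES (1, 'Extemporè'),
       (2, 'Aâkash'),
       (3, 'Driver 12'),
       (4, 'asd'),
       (5, '10'),
       (6, 'My_Car'),
       (7, 'Johnson & Sons'),
       (8, 'Johan''s Service'),
       (9, 'Indus Truck');

-- Under the binary collation the ranges A-Z, a-z and 0-9 cover only
-- the plain ASCII letters and digits, so any other character
-- (accented letters, &, ') makes the negated set match.
SELECT id, val
FROM   @tabl
WHERE  val COLLATE Latin1_General_BIN LIKE '%[^A-Za-z0-9 _]%'
ORDER  BY id;
```

With the sample data this should return ids 1, 2, 7 and 8, which is the expected output from the question. The COLLATE clause applies only to this comparison; the column's stored collation is left unchanged.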

Answer 1: (score: 1)

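Updated answer: the temporary table is used to exclude values such as "Driver" or "Indus Truck". The temporary table also forces a collation change on values such as "Aâkash", which ensures that correct values do not meet the exclusion condition in the join.
Note: special characters such as ' or & that are allowed in correct values must be added to the list manually (as shown below).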

create table #tabl(id int, val varchar(15))

insert #tabl(id, val)
select i.id, cast(i.val as varchar(200)) Collate SQL_Latin1_General_CP1253_CI_AI as val
from tabl i
where i.val <> upper(i.val) Collate SQL_Latin1_General_CP1_CS_AS
    and i.val <> lower(i.val) Collate SQL_Latin1_General_CP1_CS_AS
    and i.val not like '%[0-9]%'
    and i.val not like '%[_]%'
    and i.val not like '%[]%'
    and i.val not like '%[''&]%' -- add special characters (like ' or &) that are permitted in this list; 
                            -- this is the only "manual" requirement for this solution to work.

select t.id, t.val
from tabl t
left join #tabl tt on t.val = tt.val
where tt.val is null
    and t.val <> upper(t.val) Collate SQL_Latin1_General_CP1_CS_AS
    and t.val <> lower(t.val) Collate SQL_Latin1_General_CP1_CS_AS
    and t.val not like '%[0-9]%'
    and t.val not like '%[_]%'
    and t.val not like '%[]%'