I have a table with many columns and 2.1M rows. Here are the columns relevant to my problem:
Column_name  Type     Computed  Length  Prec  Scale  Nullable  TrimTrailingBlanks  FixedLenNullInSource  Collation
id           int      no        4       10    0      no        (n/a)               (n/a)                 NULL
val          varchar  no        15                   yes       no                  yes                   SQL_Latin1_General_CP1_CI_AS
I want to return the rows whose val column contains characters other than A-Z, a-z, 0-9, (whitespace) and _.
Sample data:
INSERT INTO tabl
(id, val)
VALUES (1, 'Extemporè'),
(2, 'Aâkash'),
(3, 'Driver 12'),
(4, 'asd'),
(5, '10'),
(6, 'My_Car'),
(7, 'Johnson & Sons'),
(8, 'Johan''s Service'),
(9, 'Indus Truck')
Expected output:
id val
-- ---------------
1  Extemporè
2  Aâkash
7  Johnson & Sons
8  Johan's Service
I found a similar question here, but the query from it does not give the expected result either:
SELECT *
FROM tabl
WHERE val LIKE '%[^A-Z0-9 _]%'
Answer 0 (score: 3)
I would do it with the help of a collation such as Latin1_General_BIN, like this:
SELECT *
FROM tabl
WHERE val COLLATE Latin1_General_BIN LIKE '%[^A-Za-z0-9 _]%'
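This way seems easier because a BIN collation is both case-sensitive and accent-sensitive, and it also collates accented characters separately from unaccented ones. The latter means the unaccented letters can easily be specified as a range. (The case sensitivity, however, means you have to list the letters of both cases explicitly, as shown above.)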
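The effect of the collation on a range can be checked on a single character. The following is a minimal sketch, not part of the original answer: the variable @c is a hypothetical test value, and the sketch assumes the database's default code page can represent 'è' in a varchar (for example code page 1252).

```sql
-- Sketch: classify one accented character under the column's collation
-- and under a binary collation. @c is a hypothetical test variable.
DECLARE @c varchar(1) = 'è';

SELECT
    -- Dictionary-order, case-insensitive collation: the range follows sort order.
    CASE WHEN @c COLLATE SQL_Latin1_General_CP1_CI_AS LIKE '[A-Z]'
         THEN 'inside range' ELSE 'outside range' END AS ci_as_result,
    -- Binary collation: the range follows code point order, so both cases are listed.
    CASE WHEN @c COLLATE Latin1_General_BIN LIKE '[A-Za-z]'
         THEN 'inside range' ELSE 'outside range' END AS bin_result;
```

Under those assumptions, the first column should report the character as inside the range, because the dictionary collation sorts 'è' among the plain letters, while the second should report it as outside, which is why the negated range in the query above can catch such values.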
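Answer 1 (score: 1)
Updated answer: the temp table is used to exclude values such as 'Driver' or 'Indus Truck'; it also forces a collation change for values such as 'Aâkash', to make sure that the correct values do not meet the exclusion condition in the join.
Note: special characters such as ' or & that are allowed in correct values have to be added to the list manually (as shown below).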
create table #tabl(id int, val varchar(15))
insert #tabl(id, val)
select i.id, cast(i.val as varchar(200)) Collate SQL_Latin1_General_CP1253_CI_AI as val
from tabl i
where i.val <> upper(i.val) Collate SQL_Latin1_General_CP1_CS_AS
and i.val <> lower(i.val) Collate SQL_Latin1_General_CP1_CS_AS
and i.val not like '%[0-9]%'
and i.val not like '%[_]%'
and i.val not like '%[]%'
and i.val not like '%[''&]%' -- add special characters (like ' or &) that are permitted in this list;
-- this is the only "manual" requirement for this solution to work.
select t.id, t.val
from tabl t
left join #tabl tt on t.val = tt.val
where tt.val is null
and t.val <> upper(t.val) Collate SQL_Latin1_General_CP1_CS_AS
and t.val <> lower(t.val) Collate SQL_Latin1_General_CP1_CS_AS
and t.val not like '%[0-9]%'
and t.val not like '%[_]%'
and t.val not like '%[]%'