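I want to identify seven-character chunks in text of arbitrary length. Each chunk starts with a letter, and the rest of it is a mix of digits and letters.
How do I express this kind of pattern with PATINDEX()? PATINDEX('%[A-Z]%',text) satisfies the first requirement but not the others. How can I set up the pattern so that, after the first character, digits and letters can be mixed in any order within the seven-character span?
I use this to print the chunk: SUBSTRING(MESSAGE_SUBJECT,PATINDEX('%[A-Z]%',MESSAGE_SUBJECT),7)
This seems to be impossible without CLR. To make it simpler, is it possible to find a seven-character group that starts with a letter and contains one digit?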
Answer 0 (score: 2)
Per my comment above...
declare @table table (a varchar(64))
insert into @table
values
('aaaaaA123A')
,('123A')
,('A123a')
,('A123')
,('A123ADD')
,('A1DD23A')
,('aAAA1DD23A')
,('aAAAAAAA')
,('hello there AA11BB2')
select a, 1
from @table
where
-- a seven-character run of letters and digits that starts with a letter exists
patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS) > 0
-- that run contains no lowercase letters (it equals its own uppercase form under a case-sensitive collation)
and substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7) collate Latin1_General_CS_AS = upper(substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7))
-- that run contains at least one digit
and patindex('%[0-9]%',substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7)) > 0
Or you can use CASE:
select
a
,MeetsPattern = case
when patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS) > 0
and substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7) collate Latin1_General_CS_AS = upper(substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7))
and patindex('%[0-9]%',substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7)) > 0
then 1
else 0
end
from @table
Or extract it:
select
a
,substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7)
from @table
where
patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS) > 0
and substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7) collate Latin1_General_CS_AS = upper(substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7))
and patindex('%[0-9]%',substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7)) > 0
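The queries above repeat the same PATINDEX expression several times. As a minimal sketch, not the answerer's own code, the position can be computed once with CROSS APPLY. It assumes a binary collation (Latin1_General_BIN2) is acceptable, under which [A-Z] matches only uppercase letters, so the separate UPPER() comparison is dropped. The sample rows are a subset of the ones used above.

```sql
DECLARE @table TABLE (a VARCHAR(64));
INSERT INTO @table (a)
VALUES ('A123ADD'), ('A1DD23A'), ('aAAAAAAA'), ('hello there AA11BB2');

SELECT t.a, c.chunk
FROM @table AS t
-- Position of the first seven-character run: an uppercase letter followed by
-- six uppercase letters or digits (binary collation makes the ranges case-sensitive).
CROSS APPLY (VALUES (PATINDEX('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',
                              t.a COLLATE Latin1_General_BIN2))) AS p(pos)
-- The seven characters starting at that position.
CROSS APPLY (VALUES (SUBSTRING(t.a, p.pos, 7))) AS c(chunk)
WHERE p.pos > 0               -- a candidate run exists
  AND c.chunk LIKE '%[0-9]%'; -- and it contains at least one digit
```

Like the original queries, this only inspects the first run that PATINDEX finds. If that run has no digit, a later qualifying run in the same string is missed.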
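Answer 1 (score: 0)
I don't believe PATINDEX() will give you what you need. The PATINDEX() function returns the position of the first occurrence of your pattern in the string. I think you would be better off using the LIKE operator.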
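A rough sketch of what a LIKE-based filter might look like, assuming a binary collation so that [A-Z] matches only uppercase letters. The table variable and its rows are hypothetical.

```sql
DECLARE @t TABLE (a VARCHAR(64));
INSERT INTO @t (a)
VALUES ('A123ADD'), ('aAAAAAAA'), ('A123'), ('hello there AA11BB2');

-- Keep rows that contain an uppercase letter followed by six uppercase letters or digits.
SELECT t.a
FROM @t AS t
WHERE t.a COLLATE Latin1_General_BIN2
      LIKE '%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%';
```

LIKE only reports whether such a run exists. It does not return the position of the run, and this single pattern cannot require that the run contain a digit, so 'aAAAAAAA' passes the filter here.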
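Answer 2 (score: 0)
Something like this doesn't require CLR or regular expressions. It is exactly the kind of problem NGrams8K was designed to solve. First, a crash course on NGrams8K.
This:
DECLARE @string VARCHAR(100) = 'ABC123XYZ';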
SELECT ng.position, ng.token
FROM dbo.NGrams8k(@string, 7) AS ng;
Returns:
position token
--------- -----------
1 ABC123X
2 BC123XY
3 C123XYZ
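NGrams8K is a user-defined function, not something built into SQL Server. To show what it computes, here is a minimal sketch that produces the same 7-grams from an ad hoc numbers list. This is not the NGrams8K implementation; the use of sys.all_columns as a row source and the @n variable are assumptions made for illustration.

```sql
DECLARE @string VARCHAR(100) = 'ABC123XYZ';
DECLARE @n INT = 7; -- token length

WITH tally AS
(
    -- One row per possible starting position: LEN(@string) - @n + 1 of them.
    -- LEN() ignores trailing spaces, which is fine for this sample value.
    SELECT TOP (CASE WHEN LEN(@string) >= @n THEN LEN(@string) - @n + 1 ELSE 0 END)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS position
    FROM sys.all_columns -- any sufficiently large row source would do
)
SELECT t.position,
       SUBSTRING(@string, t.position, @n) AS token
FROM tally AS t
ORDER BY t.position;
```

Each row is one window of @n characters, sliding one position at a time across the string.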
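To identify a seven-character substring (a 7-gram, in N-Grams terms) that (1) starts with a letter, (2) contains at least one digit, and (3) contains no lowercase letters, you can use NGrams8K like this: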
DECLARE @string VARCHAR(100) = 'x96AE0E33CFD5';
SELECT ng.position, ng.token
FROM dbo.ngrams8k(@string,7) AS ng
CROSS APPLY (VALUES(ng.token COLLATE latin1_general_bin2)) AS token(cs)
WHERE token.cs LIKE '[A-Z]%[0-9]%'
AND token.cs NOT LIKE '%[a-z]%';
Returns:
position token
---------- ---------------
4 AE0E33C
5 E0E33CF
7 E33CFD5
As you can see, we extracted every seven-character substring that meets your requirements. Alternatively, this will be more efficient:
SELECT ng.position, ng.token
FROM dbo.ngrams8k(@string,7) AS ng
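WHERE (ASCII(LEFT(ng.token,1)) - 65) & 0x7FFF < 26 -- first character is A-Z
AND PATINDEX('%[0-9]%',ng.token) > 0 -- at least one digit
AND PATINDEX('%[a-z]%',ng.token COLLATE latin1_general_bin2) = 0; -- no lowercase letters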
To better understand what is going on, consider the following query:
DECLARE @string VARCHAR(100) = 'x96AE0E33CFD5';
SELECT ng.position,
ng.token,
isMatch = CASE WHEN token.cs LIKE '[A-Z]%[0-9]%'
AND token.cs NOT LIKE '%[a-z]%' THEN 1 ELSE 0 END
FROM dbo.ngrams8k(@string,7) AS ng
CROSS APPLY (VALUES(ng.token COLLATE latin1_general_bin2)) AS token(cs);
Returns:
position token isMatch
---------- ---------- ---------
1 x96AE0E 0
2 96AE0E3 0
3 6AE0E33 0
4 AE0E33C 1
5 E0E33CF 1
6 0E33CFD 0
7 E33CFD5 1
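The ASCII arithmetic in the more efficient filter above is terse, so here is a small sketch that unpacks it for a few hypothetical sample characters. Subtracting 65 (the code for 'A') maps A through Z to 0 through 25. Characters below 'A' give a negative difference, which the bitwise AND turns into a large positive number, so one comparison covers both ends of the range. The literal 32767 is the decimal form of 0x7FFF.

```sql
SELECT v.c,
       ASCII(v.c) - 65           AS shifted, -- 0 to 25 for 'A' to 'Z'
       (ASCII(v.c) - 65) & 32767 AS masked,  -- negative values become large positives
       CASE WHEN ((ASCII(v.c) - 65) & 32767) < 26
            THEN 1 ELSE 0 END    AS isUpperAZ
FROM (VALUES ('A'), ('Z'), ('a'), ('0'), ('!')) AS v(c); -- hypothetical samples
```

Only 'A' and 'Z' should come back with isUpperAZ = 1: 'a' shifts to 32, which is above 25, and '0' and '!' shift to negative values that the mask pushes far above 25.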
Here is an example against a table, where you only want to return the rows that meet the criteria:
DECLARE @table TABLE (someId INT IDENTITY, string VARCHAR(100));
INSERT @table(string) VALUES ('!!!!AB1234567'),('c555'),('!!ABC1234ggg');
SELECT t.someId, t.string
FROM @table AS t
WHERE EXISTS
(
SELECT 1
FROM dbo.ngrams8k(t.string,7) AS ng
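WHERE (ASCII(LEFT(ng.token,1)) - 65) & 0x7FFF < 26 -- first character is A-Z
AND PATINDEX('%[0-9]%',ng.token) > 0 -- at least one digit
AND PATINDEX('%[a-z]%',ng.token COLLATE latin1_general_bin2) = 0 -- no lowercase letters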
);