T-SQL PATINDEX pattern for seven characters (at least one letter and one digit)

Posted: 2018-09-06 19:35:44

Tags: sql-server tsql patindex

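I want to identify seven-character chunks in text of arbitrary length that:

  • start with a letter
  • contain at least one digit (anywhere)
  • have all letters in uppercase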

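How can I express this kind of pattern with PATINDEX()? PATINDEX('%[A-Z]%',text) satisfies the first requirement but not the others. How can I set it up so that, after the first character, digits and letters can be mixed in any order within the seven-character span?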

I use this to print the chunk: SUBSTRING(MESSAGE_SUBJECT,PATINDEX('%[A-Z]%',MESSAGE_SUBJECT),7)

This seems impossible without CLR. To simplify: is it possible to find seven-character groups that start with a letter and contain a digit?

3 answers:

Answer 0 (score: 2)

As per my comment above...

declare @table table (a varchar(64))
insert into @table
values
('aaaaaA123A')
,('123A')
,('A123a')
,('A123')
,('A123ADD')
,('A1DD23A')
,('aAAA1DD23A')
,('aAAAAAAA')
,('hello there AA11BB2')


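-- 1) PATINDEX finds the first 7-character candidate that starts with a letter
-- 2) comparing it with UPPER() under a case-sensitive collation rejects lowercase
-- 3) the last PATINDEX requires at least one digit in the candidate
select a, 1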
from @table
where 
patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS) > 0
and substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7) collate Latin1_General_CS_AS = upper(substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7))
and patindex('%[0-9]%',substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7)) > 0

Or you can flag it with a CASE expression:
select
    a
    ,MeetsPattern = case 
                        when patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS) > 0
                        and substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7) collate Latin1_General_CS_AS = upper(substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7))
                        and patindex('%[0-9]%',substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7)) > 0
                        then 1
                        else 0
                    end
from @table

Or extract it:

select
    a
    ,substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7)
from @table
where
patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS) > 0
and substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7) collate Latin1_General_CS_AS = upper(substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7))
and patindex('%[0-9]%',substring(a,patindex('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',a collate Latin1_General_CS_AS),7)) > 0
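The three queries above repeat the same PATINDEX expression several times. A minimal sketch of the same logic with the position and the chunk each computed once per row via CROSS APPLY; the aliases p(pos) and c(chunk) and the @result table are illustrative names, not part of the original answer. Like the original, it only tests the first candidate PATINDEX finds, so a row whose first candidate fails is rejected even if a later chunk would qualify.

```sql
-- Sample data from the answer above
DECLARE @table TABLE (a VARCHAR(64));
INSERT INTO @table (a)
VALUES ('aaaaaA123A'), ('123A'), ('A123a'), ('A123'), ('A123ADD'),
       ('A1DD23A'), ('aAAA1DD23A'), ('aAAAAAAA'), ('hello there AA11BB2');

-- Illustrative result table (hypothetical name)
DECLARE @result TABLE (a VARCHAR(64), chunk VARCHAR(7), MeetsPattern BIT);

INSERT @result (a, chunk, MeetsPattern)
SELECT t.a,
       CASE WHEN p.pos > 0 THEN c.chunk END,                            -- NULL when no candidate
       CASE WHEN p.pos > 0
             AND c.chunk COLLATE Latin1_General_CS_AS = UPPER(c.chunk)  -- no lowercase
             AND PATINDEX('%[0-9]%', c.chunk) > 0                       -- has a digit
            THEN 1 ELSE 0 END
FROM @table AS t
-- p.pos: start of the first 7-character candidate (0 if none), computed once
CROSS APPLY (VALUES (PATINDEX('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',
                              t.a COLLATE Latin1_General_CS_AS))) AS p(pos)
-- c.chunk: the 7 characters starting at p.pos
CROSS APPLY (VALUES (SUBSTRING(t.a, p.pos, 7))) AS c(chunk);

SELECT a, chunk, MeetsPattern FROM @result;
```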

Answer 1 (score: 0)

I don't believe PATINDEX() will give you what you need. PATINDEX() returns the position of the first match in your string. I think you'd be better off using the LIKE operator.
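LIKE can do the yes/no test, but "at least one digit" cannot be written as a single fixed sequence of character classes. One workaround, sketched here under the assumption that a chunk may contain only uppercase ASCII letters and digits: split the requirement into six patterns, one for each position (2 to 7) where the digit could sit. A string contains a qualifying chunk if it is LIKE any of them, and the smallest non-zero PATINDEX across the six gives the start of the first such chunk. The Latin1_General_BIN2 collation makes [A-Z] mean exactly the 26 ASCII uppercase letters; @patterns, @t and @result are illustrative names.

```sql
-- Six patterns: an uppercase letter followed by six uppercase letters/digits,
-- with a digit forced into position 2, 3, 4, 5, 6 or 7 respectively.
DECLARE @patterns TABLE (pattern VARCHAR(60));
INSERT @patterns (pattern) VALUES
 ('%[A-Z][0-9][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%'),
 ('%[A-Z][0-9A-Z][0-9][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%'),
 ('%[A-Z][0-9A-Z][0-9A-Z][0-9][0-9A-Z][0-9A-Z][0-9A-Z]%'),
 ('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9][0-9A-Z][0-9A-Z]%'),
 ('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9][0-9A-Z]%'),
 ('%[A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9]%');

DECLARE @t TABLE (s VARCHAR(100));
INSERT @t (s) VALUES ('xx AB12345 zz'), ('ABCDEFG'), ('Ab12345'), ('hello there AA11BB2');

DECLARE @result TABLE (s VARCHAR(100), has_chunk BIT, first_pos INT, chunk VARCHAR(7));
INSERT @result (s, has_chunk, first_pos, chunk)
SELECT t.s,
       -- yes/no test with LIKE: does any of the six patterns match?
       CASE WHEN EXISTS (SELECT 1 FROM @patterns AS p
                         WHERE t.s COLLATE Latin1_General_BIN2 LIKE p.pattern)
            THEN 1 ELSE 0 END,
       f.pos,
       SUBSTRING(t.s, f.pos, 7)            -- NULL when f.pos is NULL
FROM @t AS t
CROSS APPLY (
    -- earliest start position at which any of the six patterns matches
    SELECT MIN(NULLIF(PATINDEX(p.pattern, t.s COLLATE Latin1_General_BIN2), 0))
    FROM @patterns AS p
) AS f(pos);

SELECT s, has_chunk, first_pos, chunk FROM @result;
```

With six patterns this stays manageable; the N-gram approach in the next answer avoids enumerating digit positions altogether.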

Answer 2 (score: 0)

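Something like this doesn't need CLR or regular expressions. It's exactly the kind of problem NGrams8K (a user-defined function, not built into SQL Server) was designed to solve. First, a crash course on NGrams8K: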

DECLARE @string VARCHAR(100) = 'ABC123XYZ'

SELECT ng.position, ng.token 
FROM   dbo.NGrams8k(@string, 7) AS ng;

Returns:

position  token
--------- -----------
1         ABC123X
2         BC123XY
3         C123XYZ
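dbo.NGrams8k has to be created in the database before it can be called. If it isn't available, a minimal stand-in for the 7-gram call above can be sketched with an inline tally (numbers) table. This is not the NGrams8K source; the CTE names d and tally and the @grams table are hypothetical, and the 10 x 10 tally only provides 100 start positions, enough for the VARCHAR(100) input used here.

```sql
DECLARE @string VARCHAR(100) = 'ABC123XYZ';
DECLARE @n INT = 7;                                   -- gram length
DECLARE @grams TABLE (position INT, token VARCHAR(100));

WITH d(x) AS (SELECT 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS v(x)),
     tally(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                  FROM d AS a CROSS JOIN d AS b)      -- numbers 1..100
INSERT @grams (position, token)
SELECT t.n, SUBSTRING(@string, t.n, @n)
FROM tally AS t
WHERE t.n <= LEN(@string) - @n + 1;                   -- only full-length grams

SELECT position, token FROM @grams ORDER BY position;
```

For 'ABC123XYZ' this yields the same three 7-grams as the NGrams8K call above.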

To identify 7-character substrings (7-grams, in N-gram terms) that (1) start with a letter, (2) contain at least one digit, and (3) contain no lowercase letters, you can use NGrams8K like this:

DECLARE @string VARCHAR(100) = 'x96AE0E33CFD5';

SELECT       ng.position, ng.token
FROM         dbo.ngrams8k(@string,7)                       AS ng
CROSS APPLY (VALUES(ng.token COLLATE latin1_general_bin2)) AS token(cs)
WHERE        token.cs LIKE '[A-Z]%[0-9]%' 
AND          token.cs NOT LIKE '%[a-z]%'; 

Returns:

position   token
---------- ---------------
4          AE0E33C
5          E0E33CF
7          E33CFD5

As you can see, we extracted every 7-character substring that meets your requirements. The following should also be more efficient:

SELECT ng.position, ng.token
FROM   dbo.ngrams8k(@string,7) AS ng
WHERE (ASCII(LEFT(ng.token,1)) - 65) & 0x7FFF < 26
AND    PATINDEX('%[a-z]%',ng.token COLLATE latin1_general_bin2) = 0
AND    PATINDEX('%[0-9]%',ng.token COLLATE latin1_general_bin2) > 0;
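The first condition in the query above is a bit-twiddling way of asking whether the first character is an uppercase A to Z. ASCII('A') is 65, so subtracting 65 maps 'A'..'Z' to 0..25; characters below 'A' give a negative INT, and ANDing with 0x7FFF keeps only the low 15 bits, turning that into a large positive value, so only 'A'..'Z' pass the < 26 test. A small sketch evaluating the same expression for a few sample characters (the @chars and @result tables are illustrative):

```sql
DECLARE @chars TABLE (c CHAR(1));
INSERT @chars (c) VALUES ('@'), ('A'), ('M'), ('Z'), ('['), ('a'), ('7');

DECLARE @result TABLE (c CHAR(1), isUpperAZ BIT);
INSERT @result (c, isUpperAZ)
SELECT c,
       -- same test as in the NGrams8K query: 1 only for 'A'..'Z'
       CASE WHEN (ASCII(c) - 65) & 0x7FFF < 26 THEN 1 ELSE 0 END
FROM @chars;

SELECT c, isUpperAZ FROM @result;
```

'@' (64) becomes -1, which masks to 32767; '[' (91) becomes 26; 'a' (97) becomes 32; all of these fail the test.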

To get a better idea of what's going on, consider the following query:

DECLARE @string VARCHAR(100) = 'x96AE0E33CFD5';

SELECT       ng.position, 
             ng.token, 
             isMatch = CASE WHEN token.cs LIKE '[A-Z]%[0-9]%' 
                             AND token.cs NOT LIKE '%[a-z]%' THEN 1 ELSE 0 END
FROM         dbo.ngrams8k(@string,7)                       AS ng
CROSS APPLY (VALUES(ng.token COLLATE latin1_general_bin2)) AS token(cs);

Returns:

position   token      isMatch
---------- ---------- ---------
1          x96AE0E    0
2          96AE0E3    0
3          6AE0E33    0
4          AE0E33C    1
5          E0E33CF    1
6          0E33CFD    0
7          E33CFD5    1

Here is an example against a table, where you only want to return the rows that meet the criteria:

DECLARE @table TABLE (someId INT IDENTITY, string VARCHAR(100));
INSERT @table(string) VALUES ('!!!!AB1234567'),('c555'),('!!ABC1234ggg')

SELECT t.someId, t.string
FROM   @table AS t
WHERE EXISTS
(
  SELECT  1
  FROM    dbo.ngrams8k(t.string,7) AS ng
  WHERE  (ASCII(LEFT(ng.token,1)) - 65) & 0x7FFF < 26
  AND     PATINDEX('%[a-z]%',ng.token COLLATE latin1_general_bin2) = 0
  AND     PATINDEX('%[0-9]%',ng.token COLLATE latin1_general_bin2) > 0
);
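If NGrams8K can't be installed, the table example can be sketched with an inline tally instead, returning the first qualifying chunk per row, which is what the question's SUBSTRING call was after. This version mirrors the LIKE tests from the isMatch query (starts with A-Z, contains a digit, no lowercase) under Latin1_General_BIN2. The CTE names, the @result table and the extra 'ABCDEFGH' row (letters only, so it should not match) are illustrative additions.

```sql
DECLARE @table TABLE (someId INT IDENTITY, string VARCHAR(100));
INSERT @table (string) VALUES ('!!!!AB1234567'), ('c555'), ('!!ABC1234ggg'), ('ABCDEFGH');

-- Hypothetical output table: one row per input row that has a qualifying chunk
DECLARE @result TABLE (someId INT, string VARCHAR(100), position INT, chunk VARCHAR(7));

WITH d(x) AS (SELECT 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS v(x)),
     tally(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                  FROM d AS a CROSS JOIN d AS b)        -- numbers 1..100
INSERT @result (someId, string, position, chunk)
SELECT t.someId, t.string, m.position, m.token
FROM @table AS t
CROSS APPLY (
    -- earliest 7-character window that satisfies all three requirements
    SELECT TOP (1) position = tl.n, token = g.token
    FROM tally AS tl
    CROSS APPLY (VALUES (SUBSTRING(t.string, tl.n, 7) COLLATE Latin1_General_BIN2)) AS g(token)
    WHERE tl.n <= LEN(t.string) - 6           -- only full-length windows
      AND g.token LIKE '[A-Z]%[0-9]%'          -- uppercase first, a digit later
      AND g.token NOT LIKE '%[a-z]%'           -- no lowercase letters
    ORDER BY tl.n
) AS m;

SELECT someId, string, position, chunk FROM @result ORDER BY someId;
```

To also reject spaces and punctuation inside the chunk, the NOT LIKE '%[a-z]%' test could be replaced with NOT LIKE '%[^0-9A-Z]%', which under a binary collation allows only uppercase ASCII letters and digits.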