Regular expression to match uppercase letters and digits

Time: 2019-05-07 22:43:04

Tags: python regex regex-lookarounds regex-group regex-greedy

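Hi, I have a large number of corpora, and I parse them to extract all occurrences of two kinds of patterns:

  1. Codes made of uppercase letters followed by digits, such as AP70, ML71, GR55, etc.
  2. Sequences of words that each start with an uppercase letter, such as Hello Little Monkey, How Are You, etc.

For the first case, I wrote this regex, but it does not return all the matches: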

>>> p = re.compile("[A-Z]+[0-9]+")
>>> res = p.search("aze azeaz GR55 AP1 PM89")
>>> res
<re.Match object; span=(10, 14), match='GR55'>

For the second case:

>>> s = re.compile(r"[A-Z]+[a-z]+\s[A-Z]+[a-z]+\s[A-Z]+[a-z]+")
>>> resu = s.search("this is a test string, Hello Little Monkey, How Are You ?")
>>> resu
<re.Match object; span=(23, 42), match='Hello Little Monkey'>
>>> resu.group()
'Hello Little Monkey'

This seems to work, but I want to get all the matches when parsing an entire "big" line.

2 answers:

Answer 0: (score: 3)

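Try the following two regexes, used with re.findall so that every match is returned rather than only the first.

(To be safe, they are wrapped in whitespace/comma boundaries.)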


>>> import re
>>> teststr = "aze azeaz GR55 AP1 PM89"
>>> res = re.findall(r"(?<![^\s,])[A-Z]+[0-9]+(?![^\s,])", teststr)
>>> print(res)
['GR55', 'AP1', 'PM89']
>>>

Readable regex

 (?<! [^\s,] )
 [A-Z]+ [0-9]+ 
 (?! [^\s,] )
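
The readable layout above can be compiled almost verbatim with the re.VERBOSE flag, which ignores whitespace outside character classes and allows comments. The following is only a minimal sketch of that idea; the names CODE_PATTERN and find_codes are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical name: the answer's first regex in verbose (commented) form.
CODE_PATTERN = re.compile(
    r"""
    (?<! [^\s,] )    # not preceded by anything other than whitespace or a comma
    [A-Z]+ [0-9]+    # one or more uppercase letters, then one or more digits
    (?! [^\s,] )     # not followed by anything other than whitespace or a comma
    """,
    re.VERBOSE,
)


def find_codes(text):
    """Return every non-overlapping code found in text (illustrative helper)."""
    return CODE_PATTERN.findall(text)


if __name__ == "__main__":
    print(find_codes("aze azeaz GR55 AP1 PM89"))
```

Because the boundaries only accept whitespace, a comma, or the start/end of the string, a token glued to other characters (for example xGR55) is skipped on purpose.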

>>> import re
>>> teststr = "this is a test string, ,Hello Little Monkey, How Are You ?"
>>> res = re.findall(r"(?<![^\s,])[A-Z]+[a-z]+(?:\s[A-Z]+[a-z]+){1,}(?![^\s,])", teststr)
>>> print(res)
['Hello Little Monkey', 'How Are You']
>>>

Readable regex

 (?<! [^\s,] )
 [A-Z]+ [a-z]+ 
 (?: \s [A-Z]+ [a-z]+ ){1,}
 (?! [^\s,] )
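
Since the question prints match objects with their spans, it may help to see the same pattern used with re.finditer, which yields one match object per hit. This is a sketch under assumptions: PHRASE_PATTERN and find_phrases are illustrative names, and `{1,}` is written as the equivalent `+`.

```python
import re

# Hypothetical name: the answer's second regex, with {1,} written as +.
PHRASE_PATTERN = re.compile(
    r"(?<![^\s,])[A-Z]+[a-z]+(?:\s[A-Z]+[a-z]+)+(?![^\s,])"
)


def find_phrases(text):
    """Return (start, end, phrase) for every capitalized word sequence in text."""
    return [(m.start(), m.end(), m.group()) for m in PHRASE_PATTERN.finditer(text)]


if __name__ == "__main__":
    line = "this is a test string, Hello Little Monkey, How Are You ?"
    for start, end, phrase in find_phrases(line):
        print(start, end, phrase)
```

Note that the pattern requires at least two capitalized words in a row, so a single capitalized word on its own is not reported.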

Answer 1: (score: 2)

This expression may help you do that, or help you design one of your own. It seems you want your expression to contain at least one [A-Z] and at least one [0-9]:

(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)

[Figure: diagram showing how the expression works. The original post also linked to an online tester for further experiments.]

Sample code:

This code shows how the expression works in Python:

# -*- coding: UTF-8 -*-
import re

string = "aze azeaz GR55 AP1 PM89"
expression = r'(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match  ")
else: 
    print(' Sorry! No matches! Something is not right! Call 911 ')

Sample output

YAAAY! "GR55" is a match  
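
The sample above uses re.search, so it reports only the first match. A minimal sketch of getting every match with the same expression is to switch to re.findall; the helper name find_tokens is an assumption for illustration. One caveat worth knowing: the lookahead (?=.+[0-9]) scans the rest of the string, not just the current token, so an all-letter token can still match when a digit appears later in the line.

```python
import re

# Same expression as in the answer; only the calling function changes.
EXPRESSION = r"(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)"


def find_tokens(text):
    """Return all captures of the expression in text (illustrative helper)."""
    # With exactly one capturing group, findall returns a list of that group.
    return re.findall(EXPRESSION, text)


if __name__ == "__main__":
    print(find_tokens("aze azeaz GR55 AP1 PM89"))
    # Caveat: "ABC" has no digit, but a digit occurs later in the string.
    print(find_tokens("ABC then X9"))
```

If each token must itself contain a digit, the boundary-based patterns from the other answer are the stricter choice.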

Performance

This JavaScript snippet shows the performance of the expression using a simple for loop that runs 1 million times.

const repeat = 1000000;
const start = Date.now();
let match;

// Run the replacement exactly `repeat` times.
for (let i = repeat; i > 0; i--) {
	const string = 'aze azeaz GR55 AP1 PM89';
	const regex = /(.*?)(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)/g;
	match = string.replace(regex, "$2 ");
}

const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match  ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test.  ");
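
For readers working in Python, as in the question, a rough equivalent of the benchmark above could look like the sketch below. It is an assumption-laden illustration, not part of the original answer: the function name benchmark and the default repeat count are made up, and the timings will vary by machine and Python version.

```python
import re
import time

# Same expression as the JavaScript snippet, precompiled once.
PATTERN = re.compile(r"(.*?)(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)")


def benchmark(repeat=100_000, text="aze azeaz GR55 AP1 PM89"):
    """Run the substitution `repeat` times; return (seconds, last result)."""
    result = ""
    start = time.perf_counter()
    for _ in range(repeat):
        # \2 is the second group, like "$2" in the JavaScript version.
        result = PATTERN.sub(r"\2 ", text)
    elapsed = time.perf_counter() - start
    return elapsed, result


if __name__ == "__main__":
    seconds, result = benchmark()
    print(f"{result!r} after {seconds:.3f} s")
```

time.perf_counter is used here because it is a monotonic, high-resolution clock intended for measuring short durations.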