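Hi, I have a lot of corpora that I parse to extract every occurrence of some patterns.
For the first pattern I wrote this regex, but it does not return all the matches: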
>>> import re
>>> p = re.compile(r"[A-Z]+[0-9]+")
>>> res = p.search("aze azeaz GR55 AP1 PM89")
>>> res
<re.Match object; span=(10, 14), match='GR55'>
The second one:
>>> s = re.compile(r"[A-Z]+[a-z]+\s[A-Z]+[a-z]+\s[A-Z]+[a-z]+")
>>> resu = s.search("this is a test string, Hello Little Monkey, How Are You ?")
>>> resu
<re.Match object; span=(23, 42), match='Hello Little Monkey'>
>>> resu.group()
'Hello Little Monkey'
This seems to work, but I want to get every match when parsing a whole "big" line.
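Both results stop after one hit because re.search returns only the first match in the string. A minimal sketch of the two standard-library alternatives that collect every non-overlapping match, re.findall (a list of strings) and re.finditer (match objects, so positions are kept), using the strings from the question:

```python
import re

line = "aze azeaz GR55 AP1 PM89"
pattern = re.compile(r"[A-Z]+[0-9]+")

# search() stops at the first match it finds
first = pattern.search(line)
print(first.group())          # GR55

# findall() returns every non-overlapping match as a list of strings
print(pattern.findall(line))  # ['GR55', 'AP1', 'PM89']

# finditer() yields match objects, so each match's position is available too
for m in pattern.finditer(line):
    print(m.group(), m.span())  # GR55 (10, 14), AP1 (15, 18), PM89 (19, 23), one per line

# The question's second pattern (exactly three capitalised words) also finds both runs
three_words = re.compile(r"[A-Z]+[a-z]+\s[A-Z]+[a-z]+\s[A-Z]+[a-z]+")
sentence = "this is a test string, Hello Little Monkey, How Are You ?"
print(three_words.findall(sentence))  # ['Hello Little Monkey', 'How Are You']
```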
Answer 0 (score: 3)
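Try the following two regexes.
(To be safe, each one is wrapped in whitespace/comma boundaries, so a match must start and end next to whitespace, a comma, or the edge of the string.)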
>>> import re
>>> teststr = "aze azeaz GR55 AP1 PM89"
>>> res = re.findall(r"(?<![^\s,])[A-Z]+[0-9]+(?![^\s,])", teststr)
>>> print(res)
['GR55', 'AP1', 'PM89']
>>>
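(?<! [^\s,] )      # preceded by whitespace, a comma, or the start of the string
[A-Z]+ [0-9]+      # one or more uppercase letters, then one or more digits
(?! [^\s,] )       # followed by whitespace, a comma, or the end of the string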
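The annotated breakdown above can also live in the code itself by compiling with re.VERBOSE, which ignores whitespace and # comments in the pattern outside character classes. A minimal sketch of the first pattern written that way; the sample strings are made up for illustration and show which neighbouring characters the boundaries accept or reject:

```python
import re

# First pattern from the answer, in verbose form so each part can be commented.
CODE = re.compile(r"""
    (?<![^\s,])     # left boundary: start of string, whitespace, or comma
    [A-Z]+[0-9]+    # one or more uppercase letters, then one or more digits
    (?![^\s,])      # right boundary: end of string, whitespace, or comma
""", re.VERBOSE)

samples = ["GR55", "xGR55", "GR55x", "AP1,PM89", "(GR55)"]
for s in samples:
    print(repr(s), CODE.findall(s))
# 'GR55' ['GR55']
# 'xGR55' []
# 'GR55x' []
# 'AP1,PM89' ['AP1', 'PM89']
# '(GR55)' []
```

Note that a parenthesised token is rejected; if the corpora contain other delimiters, they would have to be added to the [^\s,] classes.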
and
>>> import re
>>> teststr = "this is a test string, ,Hello Little Monkey, How Are You ?"
>>> res = re.findall(r"(?<![^\s,])[A-Z]+[a-z]+(?:\s[A-Z]+[a-z]+){1,}(?![^\s,])", teststr)
>>> print(res)
['Hello Little Monkey', 'How Are You']
>>>
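(?<! [^\s,] )                # preceded by whitespace, a comma, or the start of the string
[A-Z]+ [a-z]+                # a capitalised word
(?: \s [A-Z]+ [a-z]+ ){1,}   # one or more further capitalised words, each after a whitespace character
(?! [^\s,] )                 # followed by whitespace, a comma, or the end of the string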
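Unlike the question's pattern, which requires exactly three words, the {1,} quantifier (equivalent to +) accepts any run of two or more capitalised words, while a lone capitalised word is not matched. A minimal sketch of the second pattern in re.VERBOSE form; the last two sample strings are invented for illustration:

```python
import re

# Second pattern from the answer; "+" below is shorthand for "{1,}".
TITLE_RUN = re.compile(r"""
    (?<![^\s,])           # left boundary: start of string, whitespace, or comma
    [A-Z]+[a-z]+          # first capitalised word
    (?:\s[A-Z]+[a-z]+)+   # one or more further capitalised words
    (?![^\s,])            # right boundary: end of string, whitespace, or comma
""", re.VERBOSE)

samples = [
    "this is a test string, ,Hello Little Monkey, How Are You ?",
    "Hello there, Big Apple",  # a lone capitalised word is not enough
    "New York City is big",    # runs longer than two words are fine
]
for s in samples:
    print(TITLE_RUN.findall(s))
# ['Hello Little Monkey', 'How Are You']
# ['Big Apple']
# ['New York City']
```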
Answer 1 (score: 2)
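This expression might help you solve the problem, or help you design your own. It looks like you want each match to contain at least one [A-Z] and at least one [0-9]: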
(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)
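One caveat worth checking before relying on the expression above: the second lookahead, (?=.+[0-9]), only requires a digit somewhere later in the string, not inside the current token, so a letters-only token followed later by a digit is still matched. The sketch below compares it with a tightened variant that is not part of the original answer: both lookaheads are restricted to [A-Z0-9] and the token is anchored with \b. It assumes tokens are runs of uppercase letters and digits separated by non-word characters.

```python
import re

# Expression from the answer: the digit lookahead can look past the current token.
loose = re.compile(r"(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)")

# Hypothetical tightened variant: both lookaheads stay inside the token,
# and \b stops a match from starting or ending in the middle of a word.
strict = re.compile(r"\b(?=[A-Z0-9]*[A-Z])(?=[A-Z0-9]*[0-9])[A-Z0-9]+\b")

for text in ["aze azeaz GR55 AP1 PM89", "ABC 1"]:
    print(repr(text), loose.findall(text), strict.findall(text))
# 'aze azeaz GR55 AP1 PM89' ['GR55', 'AP1', 'PM89'] ['GR55', 'AP1', 'PM89']
# 'ABC 1' ['ABC'] []
```

On the question's line both agree; they only diverge when a token without digits is followed somewhere later by a digit.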
This diagram shows how the expression works, and you can test it further at this link:
This code shows how the expression works in Python:
# -*- coding: UTF-8 -*-
import re

string = "aze azeaz GR55 AP1 PM89"
expression = r'(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)'

# re.search returns only the first match; use re.findall to collect them all
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match ")
else:
    print(' Sorry! No matches! Something is not right! Call 911 ')
YAAAY! "GR55" is a match
This JavaScript snippet measures the performance of the expression with a simple for loop of one million iterations:
const repeat = 1000000;
const start = Date.now();
let match;
for (let i = 0; i < repeat; i++) {
    const string = 'aze azeaz GR55 AP1 PM89';
    const regex = /(.*?)(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)/g;
    // Drop the text before each token and keep the token (group 2) plus a space
    match = string.replace(regex, "$2 ");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. ");
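For readers working in Python, a rough counterpart of the JavaScript benchmark above can be written with the standard timeit module. This is a sketch, not a port of the JavaScript numbers: it times findall with the answer's expression, the iteration count is arbitrary, and the result depends on the machine, so no timing is shown here.

```python
import re
import timeit

TEXT = "aze azeaz GR55 AP1 PM89"
# Compile once, outside the timed call, as a real corpus parser would.
PATTERN = re.compile(r"(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)")

repeat = 100_000  # arbitrary; raise it for a steadier measurement
seconds = timeit.timeit(lambda: PATTERN.findall(TEXT), number=repeat)
print(PATTERN.findall(TEXT))
print(f"{seconds:.3f} s for {repeat} findall calls")
```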