I have a string like the one below:
line="record of Students Name Codes: AC1.123 XYZ12.67 the student is math major first hisory: XY12.34 good performer second history M12.78 N23.76 faculty Miss Cooper"
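I want to extract some codes from this line, and I am using the program below. However, I want to ignore the codes that appear in the history parts.
How can I skip the codes in the parts of the string that contain "history"?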
import re
regular_expression = re.compile(r'\b[A-Z]+\d{1,2}\.*\d{1,2}\w{0,2}\b', re.I)
matches = regular_expression.findall(line)
for match in matches:
    print(match)
Expected output:
AC1.123
XYZ12.67
Current output:
AC1.123
XYZ12.67
XY12.34
M12.78
N23.76
Answer 0 (score: 1)
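You can match all the values in the history parts that you do not want, and capture what you do want in a group:
\bhistory:? [A-Z]+\d+\.\d+(?: [A-Z]+\d+\.\d+)*|([A-Z]+\d+\.\d+)
Explanation
\bhistory:? followed by a space : a word boundary, the word history, an optional colon, and a space
[A-Z]+\d+\.\d+ : one or more letters A-Z, one or more digits, a literal dot, and one or more digits
(?: [A-Z]+\d+\.\d+)* : a non-capturing group that matches a space followed by the same pattern, repeated zero or more times
| : or
([A-Z]+\d+\.\d+) : capture group 1, which matches a single code using the same pattern
Because the history alternative has no capture group, findall returns an empty string for each of its matches, and those empty strings are filtered out afterwards.
I assume that hisory in your string is a typo and should be history.
For example:
import re
line = "record of Students Name Codes: AC1.123 XYZ12.67 the student is math major first history: XY12.34 good performer second history M12.78 N23.76 faculty Miss Cooper"
regular_expression = re.compile(r'\bhistory:? [A-Z]+[0-9]+\.[0-9]+(?: [A-Z]+[0-9]+\.[0-9]+)*|([A-Z]+[0-9]+\.[0-9]+)', re.I)
matches = regular_expression.findall(line)
print(list(filter(None, matches)))
Result
['AC1.123', 'XYZ12.67']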
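To make the role of the filter step visible, here is a minimal sketch that prints the raw findall output before the empty strings are dropped. The helper name extract_codes is hypothetical and not part of the answer, and the sketch assumes that the capture group holds exactly one code and that "history" is spelled correctly in the input.

```python
import re

# Alternative 1 consumes "history" plus the codes right after it (no capture group).
# Alternative 2 captures a single code that alternative 1 did not consume.
PATTERN = re.compile(
    r'\bhistory:? [A-Z]+[0-9]+\.[0-9]+(?: [A-Z]+[0-9]+\.[0-9]+)*'
    r'|([A-Z]+[0-9]+\.[0-9]+)',
    re.I,
)


def extract_codes(text):
    """Return the codes that do not directly follow the word 'history'."""
    # findall yields '' for every match of the history alternative,
    # because group 1 did not take part in those matches.
    return [code for code in PATTERN.findall(text) if code]


line = ("record of Students Name Codes: AC1.123 XYZ12.67 the student is math major "
        "first history: XY12.34 good performer second history M12.78 N23.76 "
        "faculty Miss Cooper")

raw = PATTERN.findall(line)
print(raw)                   # ['AC1.123', 'XYZ12.67', '', '']
print(extract_codes(line))   # ['AC1.123', 'XYZ12.67']
```

The two empty strings in the raw output correspond to the two history parts that the first alternative swallowed.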
Answer 1 (score: 0)
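I'm not sure exactly which rules you need, but this may help you design an expression:
(AC|XYZ)([0-9]+\.[0-9]+)
The following test shows how such an expression works:
# -*- coding: UTF-8 -*-
import re

string = "record of Students Name Codes: AC1.123 XYZ12.67 the student is math major first hisory: XY12.34 good performer second history M12.78 N23.76 faculty Miss Cooper"
# Group 1 is the whole code, group 2 the prefix, group 3 the number.
# The dot is escaped so that it matches only a literal period.
expression = r'((AC|XYZ)([0-9]+\.[0-9]+))'
# re.search returns only the first match in the string.
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match ")
else:
    print(' Sorry! No matches! Something is not right! Call 911 ')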
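Since re.search stops at the first match, a small variation with re.findall may be closer to what the question asks for. This is only a sketch, and it rests on an assumption the question does not state: that the wanted codes always start with AC or XYZ.

```python
import re

line = ("record of Students Name Codes: AC1.123 XYZ12.67 the student is math major "
        "first hisory: XY12.34 good performer second history M12.78 N23.76 "
        "faculty Miss Cooper")

# The prefix group is non-capturing, so findall returns the whole match.
# Assumption: only codes with the prefixes AC or XYZ are wanted.
codes = re.findall(r'\b(?:AC|XYZ)[0-9]+\.[0-9]+\b', line)
print(codes)  # ['AC1.123', 'XYZ12.67']
```

This version does not look at the word "history" at all, so it also works when that word is misspelled, but it will miss any wanted code with a different prefix.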