Question

I have a string like the one below:

line="record of Students Name Codes:  AC1.123  XYZ12.67  the student is math major first hisory: XY12.34 good performer second history M12.78 N23.76 faculty Miss Cooper"

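I want to extract some codes from this line. I am using the program below, but I want to ignore the codes that appear in the history sections.

How can I ignore the codes in the parts of the string that contain "history"?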

import re
regular_expression = re.compile(r'\b[A-Z]+\d{1,2}\.*\d{1,2}\w{0,2}\b', re.I)
matches = regular_expression.findall(line)
for match in matches:
    print (match)

Expected output:

AC1.123
XYZ12.67

Current output:

AC1.123
XYZ12.67
XY12.34
M12.78
N23.76

Answer 1

You can match all the values under history that you don't want, and then capture what you do want in a group:

\bhistory:? [A-Z]+\d+\.\d+(?: [A-Z]+\d+\.\d+)*|([A-Z]+\d+\.\d+(?: [A-Z]+\d+)*)

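Explanation

\bhistory:?  Word boundary, match history, an optional colon and a space
[A-Z]+\d+\.\d+ Match 1+ chars A-Z, 1+ digits, a literal dot and 1+ digits
(?: Non-capturing group
- [A-Z]+\d+\.\d+ Match a space followed by the preceding pattern
)* Close the non-capturing group and repeat it 0+ times
| Or
( Capturing group
- [A-Z]+\d+\.\d+ Match the same pattern as in the first alternative
- (?: [A-Z]+\d+)* Optionally repeat a space followed by 1+ chars A-Z and 1+ digits (note that this part has no \.\d+)
) Close the capturing group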
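To see which alternative is responsible for each match, the pattern can be run with re.finditer. This is only a sketch, using the question's string with "hisory" corrected to "history"; the variable names are illustrative. A match from the history alternative leaves group 1 as None, so it is consumed but not kept.

```python
import re

# The question's string, with "hisory" corrected to "history".
line = ("record of Students Name Codes:  AC1.123  XYZ12.67  the student is math "
        "major first history: XY12.34 good performer second history M12.78 N23.76 "
        "faculty Miss Cooper")

pattern = re.compile(
    r'\bhistory:? [A-Z]+\d+\.\d+(?: [A-Z]+\d+\.\d+)*'  # unwanted: matched, not captured
    r'|([A-Z]+\d+\.\d+(?: [A-Z]+\d+)*)',               # wanted: captured in group 1
    re.I,
)

wanted = []
for m in pattern.finditer(line):
    if m.group(1) is None:
        # The history alternative matched, so group 1 did not participate.
        print("skipped:", repr(m.group(0)))
    else:
        print("kept:   ", repr(m.group(1)))
        wanted.append(m.group(1))
```

The two codes before the first "history" are kept, and both history runs are skipped as whole matches.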


I assume hisory is a typo and should be history.

For example:

import re
line = "record of Students Name Codes:  AC1.123  XYZ12.67  the student is math major first history: XY12.34 good performer second history M12.78 N23.76 faculty Miss Cooper"
regular_expression = re.compile(r'\bhistory:? [A-Z]+[0-9]+\.[0-9]+(?: [A-Z]+[0-9]+\.[0-9]+)*|([A-Z]+[0-9]+\.[0-9]+(?: [A-Z]+[0-9]+)*)', re.I)
matches = regular_expression.findall(line)
print(list(filter(None, matches)))

Result

['AC1.123', 'XYZ12.67']
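If the misspelling in the input cannot be corrected at the source, one possible variation is to make the "t" optional, so that both "history" and "hisory" introduce a section to skip. The sketch below assumes this is the only misspelling that occurs. It also simplifies the capturing group to a single code per match, which is a change from the pattern above.

```python
import re

# The question's original string, with the "hisory" typo left in place.
line = ("record of Students Name Codes:  AC1.123  XYZ12.67  the student is math "
        "major first hisory: XY12.34 good performer second history M12.78 N23.76 "
        "faculty Miss Cooper")

# hist?ory accepts "history" and "hisory" (assumption: no other spellings occur).
pattern = re.compile(
    r'\bhist?ory:? [A-Z]+[0-9]+\.[0-9]+(?: [A-Z]+[0-9]+\.[0-9]+)*'  # skipped
    r'|([A-Z]+[0-9]+\.[0-9]+)',                                     # captured
    re.I,
)

# Keep only the matches where the capturing group took part.
codes = [m.group(1) for m in pattern.finditer(line) if m.group(1)]
print(codes)
```

Without the optional "t", the first alternative would not match "hisory: XY12.34", and XY12.34 would be captured as a wanted code.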

Answer 2

I'm not sure what rules you have in mind, but this might help you design an expression:

(AC|XYZ)([0-9]+\.[0-9]+)

Graph

[Image not available: graph showing how the expression works]

Example test

# -*- coding: UTF-8 -*-
import re

string = "record of Students Name Codes:  AC1.123  XYZ12.67  the student is math major first hisory: XY12.34 good performer second history M12.78 N23.76 faculty Miss Cooper"
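# Escape the dot so it matches a literal "." rather than any character.
expression = r'((AC|XYZ)([0-9]+\.[0-9]+))'
# re.search returns only the first match in the string.
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match")
else:
    print('Sorry! No matches! Something is not right! Call 911')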
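Since re.search stops at the first match, it reports only one code. A minimal sketch that collects every code with this approach could use re.findall instead. It assumes that only codes with the prefixes AC or XYZ are wanted, which is a guess about the rule rather than something stated in the question. The dot is escaped so it matches only a literal period.

```python
import re

string = ("record of Students Name Codes:  AC1.123  XYZ12.67  the student is math "
          "major first hisory: XY12.34 good performer second history M12.78 N23.76 "
          "faculty Miss Cooper")

# Assumption: wanted codes always start with AC or XYZ.
# There are no capturing groups, so findall returns the full matches.
expression = re.compile(r'\b(?:AC|XYZ)[0-9]+\.[0-9]+')

codes = expression.findall(string)
print(codes)
```

This does not look at the history sections at all, so it works only while the unwanted codes use different prefixes.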

Ignore data under a specific part of a string using Python
