Question

Suppose I have a data file:

# cat 1.txt
#$$!#@#VM - This is VM$^#^#$^$^
%#%$%^SAS - This is SAS&%^#$^$
!@#!@%^$^MD - This is MD!@$!@%$

Now I want to extract the text that starts with VM or SAS (but not MD).

Expected result:

VM - This is VM
SAS - This is SAS

I am using this code, but it prints all the lines.

import re

f = open("1.txt", "r")

for line in f:
    p = re.match(r'.+?((SAS|VM)[-a-zA-Z0-9 ]+).+?', line)
    if p:
        print (p.groups()[0])

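In a regular expression I can use (pattern1|pattern2) to match either pattern1 or pattern2, but in re.match() the parentheses are used to capture a group.

How do I specify an "OR" match (one pattern or the other) in the re.match() function?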
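The | alternation works inside re.match() exactly as in any other regex; the parentheses simply also create a capture group. A minimal sketch, assuming the three sample lines are inlined as a list and that the junk before the keyword never contains letters (true for the sample data). (?:SAS|VM) is a non-capturing group, so the only captured group is the full "SAS ..." or "VM ..." text.

```python
import re

# The three sample lines from 1.txt, inlined so the sketch runs without the file.
lines = [
    "#$$!#@#VM - This is VM$^#^#$^$^",
    "%#%$%^SAS - This is SAS&%^#$^$",
    "!@#!@%^$^MD - This is MD!@$!@%$",
]

# Assumption: the leading junk contains no letters, so [^A-Za-z]* skips all of it.
# (?:SAS|VM) still means "SAS or VM", but only groups; it does not capture.
pattern = re.compile(r"[^A-Za-z]*((?:SAS|VM)[-A-Za-z0-9 ]+)")

matches = []
for line in lines:
    m = pattern.match(line)  # re.match only tries a match at the start of the string
    if m:
        matches.append(m.group(1))  # group 1 is the single capturing group

for text in matches:
    print(text)
```

The MD line fails because, after the leading symbols, the next character is M, which is neither the start of SAS nor of VM.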

Answer 1

Here is one way.

Example:

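import re

with open("1.txt") as infile:
    for line in infile:
        # Keep only letters, hyphens and whitespace, dropping the junk symbols
        line = re.sub(r"[^A-Za-z\-\s]", "", line.strip())
        # str.startswith accepts a tuple: True if the line starts with any entry
        if line.startswith(("VM", "SAS")):
            print(line)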

Output:

VM - This is VM
SAS - This is SAS
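Because Answer 1 reads from a file, here is the same strip-then-startswith idea wrapped in a function over an in-memory list so it can run on its own. filter_prefixed and sample are hypothetical names introduced only for illustration.

```python
import re

def filter_prefixed(lines, prefixes=("VM", "SAS")):
    """Hypothetical helper: remove every character except letters, hyphens and
    whitespace, then keep the cleaned lines that start with one of `prefixes`."""
    kept = []
    for line in lines:
        cleaned = re.sub(r"[^A-Za-z\-\s]", "", line.strip())
        # str.startswith with a tuple checks each prefix in turn
        if cleaned.startswith(prefixes):
            kept.append(cleaned)
    return kept

sample = [
    "#$$!#@#VM - This is VM$^#^#$^$^",
    "%#%$%^SAS - This is SAS&%^#$^$",
    "!@#!@%^$^MD - This is MD!@$!@%$",
]
result = filter_prefixed(sample)
print(result)
```

One trade-off of this approach: startswith("VM") also accepts a line whose first word merely begins with those letters (for example "VMS - ..."), and the cleaning step also deletes digits. If either matters, a regex with word boundaries is stricter.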

Answer 2

Try it like this:

import re

with open('1.txt') as f:
    for line in f:
        # (SAS|VM) is an ordinary alternation; group(1) is the outer capture group
        extract = re.match(r'.+?((SAS|VM)[-a-zA-Z0-9 ]+).+?', line)
        if extract:
            print(extract.group(1))
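Answer 2's pattern needs at least one character before SAS/VM (the leading .+?) and at least one non-newline character after the captured run (the trailing .+?). So a line with no junk around it, such as "VM - This is VM", is skipped, and a line like "xVM - This is VM" followed only by a newline comes back as "VM - This is V". A minimal alternative sketch, swapping in re.search with \b word boundaries (a different technique from the answer's re.match), so no leading .+? is needed. PATTERN and extract are hypothetical names.

```python
import re

# re.search scans the whole line, so no leading ".+?" is needed to skip junk.
# \b makes SAS and VM match only as whole words; MD is simply not listed.
PATTERN = re.compile(r"\b(?:SAS|VM)\b[-A-Za-z0-9 ]*")

def extract(line):
    """Hypothetical helper: return the SAS/VM text in `line`, or None."""
    m = PATTERN.search(line)
    return m.group(0) if m else None

lines = [
    "#$$!#@#VM - This is VM$^#^#$^$^",
    "%#%$%^SAS - This is SAS&%^#$^$",
    "!@#!@%^$^MD - This is MD!@$!@%$",
    "VM - This is VM",  # no junk on either side
]
results = [extract(line) for line in lines]
print(results)
```

Here the text after the keyword is matched with * rather than +, so nothing is required to follow it.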

Python re.match: how to do a specific "OR" match
