Using patterns

Date: 2018-03-22 03:57:10

Tags: python regex

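I am trying to split a string around a specific phrase, where the phrase may or may not begin with a particular word. I am having trouble finding the right syntax for this.

This is the current version of the code: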

import re
from pprint import pprint

text = """Here is a list: Bob talked to Caleb, and Caleb talked to Derek, and Derek talked to Eric, and Eric talked to Fred, and Fred talked to Greg, and Greg talked to Henry, and Henry talked to Isaac, and Isaac talked to Jesse, and Jesse talked to Ken."""

pprint(re.split(r"(a?n?d? ?\w+ talked to)",text))

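In this example, I want to split on "Bob talked to" or "and Caleb talked to", so the "and " should be part of the separator when it is present and simply absent when it is not.

This code produces the following (almost correct):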

['Here is a list:',
 ' Bob talked to',
 ' Caleb, ',
 'and Caleb talked to',
 ' Derek, ',
 'and Derek talked to',
 ' Eric, ',
 'and Eric talked to',
 ' Fred, ',
 'and Fred talked to',
 ' Greg, ',
 'and Greg talked to',
 ' Henry, ',
 'and Henry talked to',
 ' Isaac, ',
 'and Isaac talked to',
 ' Jesse, ',
 'and Jesse talked to',
 ' Ken.']

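The only small error is the space in front of " Bob", which comes from the " ?" in the regex. I also don't want each letter to be optional on its own, as in "a?n?d? ?". I would rather have "(and )?".

Unfortunately, these are the results:

pprint(re.split(r"((and )?\w+ talked to)",text))

gives me: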

['Here is a list: ',
 'Bob talked to',
 None,
 ' Caleb, ',
 'and Caleb talked to',
 'and ',
 ' Derek, ',
 'and Derek talked to',
 'and ',
 ' Eric, ',
 'and Eric talked to',
 'and ',
 ' Fred, ',
 'and Fred talked to',
 'and ',
 ' Greg, ',
 'and Greg talked to',
 'and ',
 ' Henry, ',
 'and Henry talked to',
 'and ',
 ' Isaac, ',
 'and Isaac talked to',
 'and ',
 ' Jesse, ',
 'and Jesse talked to',
 'and ',
 ' Ken.']

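Here, each match produces two items, because the pattern has two capturing groups. I could probably work with that, but it would be better if each match were a single item.

Another option might be: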

pprint(re.split(r"([and ]?\w+ talked to)",text))

which gives:

['Here is a list:',
 ' Bob talked to',
 ' Caleb, and',
 ' Caleb talked to',
 ' Derek, and',
 ' Derek talked to',
 ' Eric, and',
 ' Eric talked to',
 ' Fred, and',
 ' Fred talked to',
 ' Greg, and',
 ' Greg talked to',
 ' Henry, and',
 ' Henry talked to',
 ' Isaac, and',
 ' Isaac talked to',
 ' Jesse, and',
 ' Jesse talked to',
 ' Ken.']

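In this case, the "and" is not included in the separator even when it is present. So how can I make "and " optional as a single unit? In other words, "and " should be either entirely in or entirely out, never partially in.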
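A minimal sketch of why the bracket version behaves this way, assuming only the standard `re` module: square brackets define a character class, so `[and ]` matches a single character out of "a", "n", "d", or space, while parentheses group the literal sequence "and ". The variable names are illustrative.

```python
import re

# [and ] is a character class: exactly ONE character from the set {a, n, d, space}.
class_single = re.fullmatch(r"[and ]", "d") is not None
class_phrase = re.fullmatch(r"[and ]", "and ") is not None

# (and ) is a group: the literal four-character sequence "and ".
group_phrase = re.fullmatch(r"(and )", "and ") is not None

print(class_single)  # True: a single 'd' is in the class
print(class_phrase)  # False: the class cannot match four characters
print(group_phrase)  # True: the group matches the whole phrase
```

This is why `[and ]?` only consumes the single space in front of each name and leaves "and" attached to the preceding item.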

1 Answer:

Answer 0: (score: 3)

I think this is what you want:

((?:and )?\w+ talked to)

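(?:and ) is a non-capturing group: it groups "and " so that the ? applies to the whole phrase, but it does not create a capture group of its own. The text it matches is still part of the outer group, so re.split returns one item per separator instead of two.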
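As a quick check, here is the pattern applied with `re.split`. The text is a shortened version of the sample from the question, and the names `pattern` and `parts` are only illustrative.

```python
import re
from pprint import pprint

# Shortened version of the sample text from the question.
text = ("Here is a list: Bob talked to Caleb, and Caleb talked to Derek, "
        "and Derek talked to Eric.")

# (?:and )? makes the whole phrase "and " optional without adding
# a second capture group, so only the outer group appears in the result.
pattern = r"((?:and )?\w+ talked to)"

parts = re.split(pattern, text)
pprint(parts)
# ['Here is a list: ',
#  'Bob talked to',
#  ' Caleb, ',
#  'and Caleb talked to',
#  ' Derek, ',
#  'and Derek talked to',
#  ' Eric.']
```

The leading space before "Bob" now stays with the preceding item, and there are no extra `None` or `'and '` entries, because the pattern contains only one capturing group.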