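I have a string made up of a list of items, which may be delimited by enumerators (a comma or 'and') or by articles ("a" / "an" / "the"). Note that when an enumerator is present the article may be omitted, and vice versa.
For example, take this input: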
a paper, leaf the clock and an angel
This must be split into:
a paper
leaf
the clock
an angel
The first example only has single-word items, so here is another one:
a paper with some letters, a torn leaf and clock and an angel doll
This must be split into:
a paper with some letters
a torn leaf
clock
an angel doll
I have tried several regular expressions for this; the most recent one is:
(?:\b(?P<article>the|an|a)\b)\s(?P<object>\b.+?\b(?=\b(?:the|an|a|$)\b))
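Of course, it does not handle splitting on ',' / 'and', because I could not figure that part out.
Finally, as you can see, I use named groups to separate the article from the object. It would be great if that could be kept. Any suggestions?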
Answer 0 (score: 0)
Just use re.split():
import re
a = "a paper with some letters, a torn leaf and clock and an angel doll"
### put every separator you want to remove after a |
re.split(', |and |a ',a)
# result:
['', 'paper with some letters', '', 'torn leaf ', 'clock ', 'an angel doll']
If you need to keep the separators, wrap the pattern in parentheses (a capturing group):
[i for i in re.split('(, |and |a )',a) if i]
# result:
['a ', 'paper with some letters', ', ', 'a ', 'torn leaf ', 'and ', 'clock ', 'and ', 'an angel doll']
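One caveat with the pattern above: the alternative 'a ' has no word boundary, so it also matches the last letter of any word that ends in "a". The sketch below shows the effect on a made-up input and one possible fix with \b. The sample string and the boundary-aware pattern are assumptions for illustration, not part of the answer, and like the answer this version discards the articles.

```python
import re

# Hypothetical input chosen to show the problem: "pizza" ends in "a".
a = "a pizza and a sofa"

# Pattern from the answer: 'a ' also matches the "a " at the end of "pizza ".
naive = [p for p in re.split(r', |and |a ', a) if p]
print(naive)   # ['pizz', 'sofa']

# Assumed alternative: \b makes only whole words act as separators.
pattern = r',\s*|\b(?:and|the|an|a)\b\s*'
items = [p.strip() for p in re.split(pattern, a) if p.strip()]
print(items)   # ['pizza', 'sofa']
```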
Answer 1 (score: 0)
Enumerate all the small cases as alternatives in the re.split() pattern, ordered from the longest match to the shortest:
import re
s = "a paper with some letters, a torn leaf and clock and an angel doll"
re.split(r'^an |^a |^the |, and a |, and an |, and the |, and |, an |, the |, a | and an | and | an | the ', s)
# ['', 'paper with some letters', 'torn leaf', 'clock', 'angel doll']
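The rest is just cleanup: dropping the empty strings ('') and so on.
To keep the matched separators, wrap the regex in parentheses, as described in the documentation: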
re.split(r'(^an |^a |^the |, and a |, and an |, and the |, and |, an |, the |, a | and an | and | an | the )', s)
# ['', 'a ', 'paper with some letters', ', a ', 'torn leaf', ' and ', 'clock', ' and an ', 'angel doll']
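Writing every combination by hand is error-prone, so here is a minimal sketch that builds the same kind of alternation programmatically, sorts it longest-first, and drops the empty strings. The variable names and the exact list of combinations are assumptions for illustration; the list is somewhat broader than the hand-written pattern above.

```python
import re

s = "a paper with some letters, a torn leaf and clock and an angel doll"

# Illustrative building blocks (not taken from the answer).
articles = ["a", "an", "the"]
separators = [f"^{art} " for art in articles]          # article at the start
for enum in [", and ", ", ", " and "]:
    separators.append(enum)                             # enumerator alone
    separators += [f"{enum}{art} " for art in articles] # enumerator + article
separators += [f" {art} " for art in articles]          # article alone

# Longest alternatives first, so ", and an " is tried before ", and ".
separators.sort(key=len, reverse=True)
pattern = "|".join(separators)   # no re.escape: "^" must stay an anchor

# Drop the empty strings that re.split() produces at the edges.
items = [part for part in re.split(pattern, s) if part]
print(items)
# ['paper with some letters', 'torn leaf', 'clock', 'angel doll']
```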
Answer 2 (score: 0)
For the specific task I am trying to solve, I came up with another idea. The steps are:
"( and|,) (?!the|an|a)|^(?!the|an|a)" # replace with " the "
"( and|,) " # replace with " "
"(?P<article>the|an|a) (?P<object>.+?(?= (?:the|an|a)\b)|[^$]*)"
PS: If anyone knows an alternative to the last regex, feel free to post it! :)
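The three steps above can be sketched in Python roughly as follows. Steps 1 and 2 use the regexes exactly as posted; step 3 uses an assumed alternative to the last regex (a lazy object group closed by a lookahead for the next article or the end of the string), so treat it as one possible variant rather than the author's final version.

```python
import re

s = "a paper with some letters, a torn leaf and clock and an angel doll"

# Step 1: give every article-less item a placeholder article ("the").
step1 = re.sub(r"( and|,) (?!the|an|a)|^(?!the|an|a)", " the ", s)

# Step 2: remove the remaining enumerators.
# strip() covers the leading space step 1 adds when the string
# starts without an article.
step2 = re.sub(r"( and|,) ", " ", step1).strip()
print(step2)
# a paper with some letters a torn leaf the clock an angel doll

# Step 3 (assumed alternative): each item runs from an article up to
# the next whole-word article or the end of the string.
item_re = re.compile(
    r"\b(?P<article>the|an|a) (?P<object>.+?)(?= (?:the|an|a)\b|$)"
)
pairs = [(m.group("article"), m.group("object"))
         for m in item_re.finditer(step2)]
print(pairs)
# [('a', 'paper with some letters'), ('a', 'torn leaf'),
#  ('the', 'clock'), ('an', 'angel doll')]
```

Two limitations are worth keeping in mind. The placeholder "the" cannot be told apart from a real "the" afterwards, and the lookahead (?!the|an|a) in step 1 has no word boundary, so an article-less item that merely starts with "a" (for example "apple") gets no placeholder and is merged into the previous item.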
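Answer 3 (score: 0)
With re.sub() we can replace specific substrings with a newline. The pattern passed to re.sub() lists the separators (here the comma and 'and') that should be replaced by a line break.
Sample code: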
import re
s = 'a paper with some letters, a torn leaf and clock and an angel doll'
# Replace each ", " or " and " separator with a newline
print(re.sub(r'\s*(?:,|\band)\s+', '\n', s))
Output:
a paper with some letters
a torn leaf
clock
an angel doll
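The re.sub() approach above only breaks on enumerators, so an input such as "leaf the clock", where an article alone marks the next item, stays on one line. As a hedged sketch, a lookahead can split on the whitespace before a whole-word article without consuming the article, and a second regex can then separate the optional article from the object. The names SPLIT_RE, ITEM_RE and split_items are hypothetical, and the input is assumed to be a single line.

```python
import re

# Split on ", " (optionally followed by "and"), on " and ", or on the
# whitespace directly before a whole-word article (article not consumed).
SPLIT_RE = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+|\s+(?=(?:the|an|a)\b)")

# Optional leading article, then the rest of the item as the object.
ITEM_RE = re.compile(r"(?:(?P<article>the|an|a)\s+)?(?P<object>.+)")

def split_items(text):
    """Return (article, object) pairs; article is None when omitted."""
    pairs = []
    for part in SPLIT_RE.split(text.strip()):
        if not part:
            continue  # skip empty pieces at the edges
        m = ITEM_RE.fullmatch(part)
        pairs.append((m.group("article"), m.group("object")))
    return pairs

print(split_items("a paper, leaf the clock and an angel"))
# [('a', 'paper'), (None, 'leaf'), ('the', 'clock'), ('an', 'angel')]
print(split_items("a paper with some letters, a torn leaf and clock and an angel doll"))
# [('a', 'paper with some letters'), ('a', 'torn leaf'),
#  (None, 'clock'), ('an', 'angel doll')]
```

Because any whole-word article starts a new item, a phrase such as "a paper with a stamp" would be split in two; that ambiguity comes from the input format itself rather than from this particular pattern.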