Question

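I am trying to find instances of Roman numerals in a text that are followed by a full stop and a space, e.g. IV. (with a trailing space). These mark the beginning of a verse. However, some verses do not start with a Roman numeral, so I inserted a [NV] tag at the start of those verses. I have a regex that finds the numerals and a regex that finds the [NV] tag, but I cannot combine them into a single regex that looks for one or the other.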

My regex for finding the numerals is:

numeralpat = re.compile(r'[IVX]{1,4}\. ')

I thought I could put it into a set together with the other regex, to find either a numeral or the [NV] tag:

numeralpat = re.compile(r'[(\[NV\])([IVX]{1,4}\. )]')

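This causes problems between brackets of the same type, so I tried escaping various characters to make it work. None of that worked for me. Can this be done with a regex?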

Edited to add sample text:

Text:

I. this is some text with a verse numeral
II. this is some text with a verse numeral
III. this is some text with a verse numeral
[NV]this is text with no verse numeral
IV. this is some text with a verse numeral
V. this is some text with a verse numeral

Expected matches:

'I. '
'II. '
'III. '
'[NV]'
'IV. '
'V. '

Answer 1

You can combine the two regexes with alternation, like this:

(?:\[NV\]|[IVX]{1,4}\. )

This matches either a literal [NV], or one to four of the characters I, V, X followed by a . and a space.
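As a quick check, here is a minimal sketch that applies this pattern to the sample text from the question. The variable names `text` and `matches` are just illustrative; only `numeralpat` comes from the question.

```python
import re

# Alternation: a literal "[NV]" tag, or 1-4 of I/V/X followed by ". "
numeralpat = re.compile(r'(?:\[NV\]|[IVX]{1,4}\. )')

# Sample text copied from the question
text = (
    "I. this is some text with a verse numeral\n"
    "II. this is some text with a verse numeral\n"
    "III. this is some text with a verse numeral\n"
    "[NV]this is text with no verse numeral\n"
    "IV. this is some text with a verse numeral\n"
    "V. this is some text with a verse numeral\n"
)

# With no capturing groups, findall returns the full text of each match
matches = numeralpat.findall(text)
print(matches)
```

Because the group is non-capturing, `findall` returns whole matches, which should line up with the expected matches listed in the question.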


Answer 2

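You can specify alternatives with |, e.g. r'(abc|def)' finds 'abc' or 'def'. You should also escape the square brackets so that \[NV\] is matched literally, rather than being read as a character class that matches 'N' or 'V':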

import re

regex = r"(\[NV\]|[IVX]{1,4}\.)"

test_str = ("I. Some text\n"
    "some Text\n"
    "II. some text\n"
    "[NV] more text\n")

matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    print(f"Match {match_num} was found at {match.start()}-{match.end()}: {match.group()}")

    # Report each capture group (group numbers start at 1)
    for group_num in range(1, len(match.groups()) + 1):
        print(f"Group {group_num} found at {match.start(group_num)}-{match.end(group_num)}: {match.group(group_num)}")

Output:

Match 1 was found at 0-2: I.
Group 1 found at 0-2: I.
Match 2 was found at 23-26: II.
Group 1 found at 23-26: II.
Match 3 was found at 37-41: [NV]
Group 1 found at 37-41: [NV]

See https://regex101.com/r/MpMxcP/1

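It looks for either a literal '[NV]', or one to four of the characters in '[IVX]' followed by a literal '.'. Unlike the pattern in the question, it does not require a space after the '.'.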
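To illustrate why the bracketed attempt from the question does not work, here is a small sketch. It assumes current CPython `re` behaviour, where an unescaped [ opens a character class that ends at the first unescaped ], so the rest of that pattern is left with an unbalanced ). The names `attempt`, `compiled`, `fixed` and `found` are made up for this example.

```python
import re

# The attempt from the question: "[" opens a character class that ends at the
# first unescaped "]", so the ")" left over later in the pattern is unbalanced.
attempt = r'[(\[NV\])([IVX]{1,4}\. )]'

try:
    re.compile(attempt)
    compiled = True
except re.error as exc:
    compiled = False
    print(f"re.error: {exc}")

# Alternation with "|" expresses "one whole pattern or the other".
fixed = re.compile(r'\[NV\]|[IVX]{1,4}\. ')
found = fixed.findall("III. numbered verse\n[NV]unnumbered verse\n")
print(found)
```

A character class only ever matches a single character from a set, so it cannot hold two sub-patterns; alternation is the construct for choosing between whole patterns.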

How do I write a regex so that the whole regex is a set containing two possible groups?
