Question

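I'm trying to write a Python regular expression that works as follows: a vowel (a, e, i, o, u, á, é, í, ó, ú) may be followed by a colon (:), then by another vowel (a, e, i, o, u, á, é, í, ó, ú), which is followed by another colon. So the colon between the vowels is optional, but if it is there it must be present in the output. That is why I tried using (:)?. If the pattern matches, the last colon is removed. Vowels with an acute accent are considered completely different vowels, so a is treated as a different vowel from á. Here is another way of representing it.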

V = a,e,i,o,u,á,é,í,ó,ú
V:V: will become V:V
VV: will become VV

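Note that in both patterns the colon after the second vowel is always removed. But if a colon is present between the two vowels, it appears in the output.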

Here are some patterns it should apply to, and what each should become.

a:é: will become a:é // colon between the vowels is present in the output, colon after the two vowels is dropped from output
ia: will become ia // colon after the two vowels is dropped from output
ó:a: will become óa // colon between the vowels is present in the output, colon after the two vowels is dropped from output

Here is what I have been trying, but it doesn't work:

word = re.sub(ur"([a|e|i|o|u|á|é|í|ó|ú])(:)?([a|e|i|o|u|á|é|í|ó|ú]):", ur'\1\3', word)

Answer 1

Your example patterns are inconsistent with your description. Here are some example patterns and a regex that match your description.

Code:

import re
V = u'aeiouáéíóú'
RE = re.compile('([%s])(:?)([%s]):' % (V, V))

word = RE.sub(r'\1\2\3', word)
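For comparison, here is a minimal sketch of why the attempt in the question loses the middle colon. It is written in Python 3 syntax, and the names `original_attempt`, `fixed`, and `VOWELS` are mine, not from the question. Two things appear to go wrong in the original: the replacement `\1\3` leaves out group 2, and the `|` characters inside `[...]` are literal members of the character class rather than alternation.

```python
import re

VOWELS = 'aeiouáéíóú'

# The question's pattern: '|' inside [...] is a literal character, not "or".
original_attempt = re.compile(
    r'([a|e|i|o|u|á|é|í|ó|ú])(:)?([a|e|i|o|u|á|é|í|ó|ú]):')
# The pattern from this answer: group 2 always participates, possibly empty.
fixed = re.compile('([%s])(:?)([%s]):' % (VOWELS, VOWELS))

# Replacement \1\3 skips group 2, so the middle colon is lost.
dropped = original_attempt.sub(r'\1\3', 'a:é:')
# Replacement \1\2\3 puts the optional colon back.
kept = fixed.sub(r'\1\2\3', 'a:é:')
# The original class also accepts a literal '|' as if it were a vowel.
pipe_matches = original_attempt.search('|a:') is not None

print(dropped, kept, pipe_matches)
```

Writing the group as `(:?)` rather than `(:)?` means group 2 always matches (as an empty string when there is no colon), so referring to `\2` in the replacement is safe.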

Test code:

data = (
    (u'a:é:', u'a:é'),
    (u'ia:', u'ia'),
    (u'ó:a', u'ó:a'),
)

for w1, w2 in data:
    print(w2, RE.sub(r'\1\2\3', w1))
    assert w2 == RE.sub(r'\1\2\3', w1)

Result:

(u'a:\xe9', u'a:\xe9')
(u'ia', u'ia')
(u'\xf3:a', u'\xf3:a')
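The output above is what Python 2 prints for a tuple of unicode strings. As a rough Python 3 equivalent, the same check could be sketched as below. The function name `drop_trailing_colon` and the extra `ó:a:` case are my additions, and the sketch assumes the accented vowels are stored as single precomposed code points (for example `é` as U+00E9, not `e` plus a combining accent).

```python
import re

VOWELS = 'aeiouáéíóú'
PATTERN = re.compile('([%s])(:?)([%s]):' % (VOWELS, VOWELS))


def drop_trailing_colon(word):
    """Remove the colon after a vowel pair, keeping any colon between them."""
    return PATTERN.sub(r'\1\2\3', word)


cases = [
    ('a:é:', 'a:é'),
    ('ia:', 'ia'),
    ('ó:a', 'ó:a'),    # no trailing colon, so nothing changes
    ('ó:a:', 'ó:a'),   # hypothetical extra case, per the description
]

for source, expected in cases:
    result = drop_trailing_colon(source)
    print(source, '->', result)
    assert result == expected
```

In Python 3 all strings are Unicode, so the `u` prefixes are no longer needed.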
