Question

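Can anyone tell me why I don't seem to get the right result with this regular expression in this Python code? For example, I would have thought that the initial vowel in the word "about" should not disappear. Thanks.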

>>> import re
>>> sentence = "But the third reason Americans should care about Europe is more important even than the risk of a renewed financial crisis."
>>> regexp = r'^[AEIOUaeiou]+|[AEIOUaeiou]+$|[^AEIOUaeiou]'
>>> def compress(word):
...     pieces = re.findall(regexp, word)
...     return ''.join(pieces)
>>> compress(sentence)
'Bt th thrd rsn mrcns shld cr bt rp s mr mprtnt vn thn th rsk f  rnwd fnncl crss.'

Answer 1

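^ and $ anchor to the start and end of the whole string, so you are not anchoring to the start and end of each word but to the start and end of the entire sentence. When the sentence is just the word "about", it works as you expect. I think you want to anchor to word boundaries (\b) instead.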

http://www.regular-expressions.info/wordboundaries.html
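To see the anchoring behaviour described above, here is a minimal sketch that runs the question's pattern on a single word and on a two-word phrase. The helper name `compress` comes from the question; the test strings `"about"` and `"care about"` are chosen here purely for illustration.

```python
import re

# The pattern from the question, anchored with ^ and $.
regexp = r'^[AEIOUaeiou]+|[AEIOUaeiou]+$|[^AEIOUaeiou]'

def compress(text):
    return ''.join(re.findall(regexp, text))

# A single word: ^ sits right before the leading "a", so it is kept.
single = compress("about")

# Two words: ^ only matches before "care", so the "a" of "about" is dropped.
phrase = compress("care about")

print(single)  # abt
print(phrase)  # cr bt
```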

This might do what you want:

regexp = r'\b[AEIOUaeiou]+|[AEIOUaeiou]+\b|[^AEIOUaeiou]'
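As a quick check, the word-boundary pattern can be run against the sentence from the question. This is only a sketch: it reuses the question's `compress` helper and splits the sentence literal across two lines for readability.

```python
import re

# Word-boundary version: \b matches at the start and end of every word.
regexp = r'\b[AEIOUaeiou]+|[AEIOUaeiou]+\b|[^AEIOUaeiou]'

def compress(text):
    return ''.join(re.findall(regexp, text))

sentence = ("But the third reason Americans should care about Europe is more "
            "important even than the risk of a renewed financial crisis.")
result = compress(sentence)
print(result)
# Bt the thrd rsn Amrcns shld cre abt Eurpe is mre imprtnt evn thn the rsk of a rnwd fnncl crss.
```

Leading and trailing vowel runs are now kept per word ("abt", "Eurpe", "the"), while vowels inside a word are still dropped.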

Answer 2

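'^[AEIOUaeiou]+' only matches a run of vowels at the start of the string

'[AEIOUaeiou]+$' only matches a run of vowels at the end of the string

'[^AEIOUaeiou]' only matches a single character that is not a vowel

If it were '[^AEIOUaeiou]+', it would match any run of consecutive non-vowel characters

As it stands, with your pattern you capture only one non-vowel character at a time in the sentence you used.

Your comment explains what you want to do. There is no need for a regex to do that; I think solving the problem with a regex is harder, or at least more complicated.

Does this meet your needs?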
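The difference between the single-character class and its `+` form can be seen with `re.findall`. The word `"strength"` is just an illustrative input, not something from the question.

```python
import re

word = "strength"

# Without +, each non-vowel character is returned as its own piece.
one_at_a_time = re.findall(r'[^AEIOUaeiou]', word)

# With +, consecutive non-vowel characters are returned as one piece.
runs = re.findall(r'[^AEIOUaeiou]+', word)

print(one_at_a_time)  # ['s', 't', 'r', 'n', 'g', 't', 'h']
print(runs)           # ['str', 'ngth']
```

After `''.join(...)` both lists produce the same string, so the choice only affects how many pieces are captured, not the compressed result.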

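def compress(word):
    # Words shorter than 3 characters have no interior; keep them whole.
    if len(word)<3:
        yield word
    else:
        # Always keep the first and last character of the word.
        yield word[0]
        # Drop vowels from the interior only.
        for c in word[1:-1]:
            if c not in 'AEIOUaeiou':
                yield c
        yield word[-1]


# `sentence` is the string defined in the question.
print(' '.join(''.join(compress(word)) for word in sentence.split()))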
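The two approaches in this thread are not quite equivalent, which a small side-by-side sketch makes visible. The names `compress_word`, `compress_loop`, and `compress_regex` and the sample text are made up here for the comparison; the logic is the generator above and the word-boundary pattern from Answer 1.

```python
import re

def compress_word(word):
    """Generator version: keep the first and last character of the token."""
    if len(word) < 3:
        yield word
    else:
        yield word[0]
        for c in word[1:-1]:
            if c not in 'AEIOUaeiou':
                yield c
        yield word[-1]

def compress_loop(sentence):
    # Splits on whitespace, so punctuation counts as part of the token.
    return ' '.join(''.join(compress_word(w)) for w in sentence.split())

def compress_regex(sentence):
    # Keeps whole vowel runs at word boundaries.
    pattern = r'\b[AEIOUaeiou]+|[AEIOUaeiou]+\b|[^AEIOUaeiou]'
    return ''.join(re.findall(pattern, sentence))

text = "Europe about idea."
print(compress_loop(text))   # Erpe abt id.
print(compress_regex(text))  # Eurpe abt idea.
```

The loop keeps exactly one leading and one trailing character per whitespace-separated token, so trailing punctuation takes the place of the last letter. The regex keeps entire vowel runs at word boundaries. Which behaviour is wanted depends on the intended compression rule.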

Word compression function using a regular expression in Python
