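Python RegEx: using re.sub with multiple patterns

Date: 2015-11-15 19:39:12

Tags: python regex

I am trying to use Python RegEx re.sub to remove a colon that appears before the third-from-last vowel [aeiou] of a word, provided that another vowel comes right before the colon.

So the colon must sit between the 3rd and 4th vowels, counting from the end of the word.

Numbering the vowels from the end, the first example below breaks down as w4:32ny1h.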

we:aanyoh > weaanyoh    # w4:32ny1h
hiru:atghigu > hiruatghigu
yo:ubeki > youbeki
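
The numbering used in the comment on the first example can be reproduced with a small helper. This is only an illustrative sketch written for Python 3; the function name number_vowels_from_end is made up here and does not come from the question.

```python
def number_vowels_from_end(word, vowels="aeiou"):
    """Replace each vowel with its position counted from the end of the word."""
    remaining = sum(ch in vowels for ch in word)  # total number of vowels
    out = []
    for ch in word:
        if ch in vowels:
            out.append(str(remaining))  # the first vowel gets the highest number
            remaining -= 1
        else:
            out.append(ch)              # consonants and the colon are kept as-is
    return "".join(out)


print(number_vowels_from_end("we:aanyoh"))
```

The colon qualifies for removal when it ends up directly between the digits 4 and 3.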

Below is the RegEx statement I am trying to use, but I can't get it to work.

word = re.sub(ur"([aeiou]):([aeiou])(([^aeiou])*([aeiou])*([aeiou])([^aeiou])*([aeiou]))$", ur'\1\2\3\4', word)

6 answers:

Answer 0 (score: 1)

Isn't it just that you have too many parentheses (and other extra stuff)?

word = re.sub(ur"([aeiou]):(([aeiou][^aeiou]*){3})$", ur'\1\2', word)
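
To make the point concrete, here is a small check that runs the pattern from the question next to the simplified one. It is a sketch written for Python 3 (so r"" instead of ur""), and the constant and variable names are made up for illustration. As far as I can tell, the original goes wrong in two ways: it leaves no room for consonants after the last vowel, and its \4 refers to a repeated inner group, so the last consonant that group captured is appended a second time.

```python
import re

# Pattern and replacement exactly as posted in the question (minus the u prefix).
ORIGINAL = r"([aeiou]):([aeiou])(([^aeiou])*([aeiou])*([aeiou])([^aeiou])*([aeiou]))$"
# Simplified pattern from this answer.
SIMPLIFIED = r"([aeiou]):(([aeiou][^aeiou]*){3})$"

words = ["we:aanyoh", "hiru:atghigu", "yo:ubeki"]

original_results = [re.sub(ORIGINAL, r"\1\2\3\4", w) for w in words]
simplified_results = [re.sub(SIMPLIFIED, r"\1\2", w) for w in words]

for w, o, s in zip(words, original_results, simplified_results):
    print("{:>14}  original: {:<14} simplified: {}".format(w, o, s))
```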

Answer 1 (score: 1)

Not sure whether you want to ignore consonants completely; this regex does. Otherwise it is similar to Jeff's.

import re

tests = [
    'we:aanyoh',
    'hiru:atghigu',
    'yo:ubeki',
    'yo:ubekiki',
    'yo:ubek'
]

for word in tests:
    s = re.sub(r'([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$', r'\1\2', word)
    print('{} > {}'.format(word, s))
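
What "ignoring consonants" means in practice can be seen by feeding both this pattern and Jeff's a word where a consonant sits next to the colon. The two extra inputs below ('yox:ubeki' and 'yo:xubeki') are made up for this sketch, as are the constant names; the code assumes Python 3.

```python
import re

# Pattern from this answer: consonants may sit between the vowels and the colon.
IGNORES_CONSONANTS = r'([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$'
# Jeff's pattern: the colon must be directly between two vowels.
ADJACENT_VOWELS = r'([aeiou]):(([aeiou][^aeiou]*){3})$'

samples = ['yo:ubeki', 'yox:ubeki', 'yo:xubeki']

loose = [re.sub(IGNORES_CONSONANTS, r'\1\2', w) for w in samples]
strict = [re.sub(ADJACENT_VOWELS, r'\1\2', w) for w in samples]

for w, a, b in zip(samples, loose, strict):
    print('{:>10}  ignoring consonants: {:<10} adjacent vowels only: {}'.format(w, a, b))
```

Which behaviour is wanted depends on how strictly "preceded by another vowel" in the question should be read.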

Answer 2 (score: 1)

You state that your target is a word, not a line, so first set up anchors so that only whole words are processed:

\b[regex will go here]\b
^                      ^     assert a word boundary

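Next, the colon must be immediately preceded and followed by [aeiou], with two more [aeiou] in the part after the colon. I am assuming case-insensitive matching?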

(?i)(\b\w+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)
                                   ^  match a character that is NOT vowel, space or not a 
                                         ^   \W=[^a-zA-Z0-9_]

Demo

(Note that [^aeiou\W] matches consonant letters, digits, and _, but no other characters. Demo)
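
The claim about the character class can be checked one character at a time. This is a minimal sketch for Python 3; the helper name matches_class and the list of sample characters are made up for illustration.

```python
import re

# Same negated class as in the note, with the case-insensitive flag.
consonant_like = re.compile(r"(?i)[^aeiou\W]")


def matches_class(ch):
    """True if the single character ch is accepted by [^aeiou\\W]."""
    return consonant_like.fullmatch(ch) is not None


for ch in ["b", "Z", "7", "_", "a", "E", "-", ":", " "]:
    print(repr(ch), matches_class(ch))
```

Consonants, digits and the underscore are accepted; vowels in either case, punctuation and whitespace are rejected.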

Python demo:

import re

tests={
    'matches':[
        'we:aanyoh',
        'hiru:atghigu',
        'yo:ubeki'
        ],
    'no match':[
        'wz:ubeki',
        'we:a anyoh',
        'yo:ubek',
        'hiru:atghiguu'
    ]    
}

for k, v in tests.items():
    print(k)
    for e in v:
        s=re.sub(r'(?i)(\b\w+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)', r'\1\2', e)
        print('\t{} > {}'.format(e, s))

Prints:

matches
    we:aanyoh > weaanyoh
    hiru:atghigu > hiruatghigu
    yo:ubeki > youbeki
no match
    wz:ubeki > wz:ubeki
    we:a anyoh > we:a anyoh
    yo:ubek > yo:ubek
    hiru:atghiguu > hiru:atghiguu

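This only handles words with a single colon. If you want to match words that have multiple colons but otherwise the same pattern, change the left-hand part of the pattern to a character class that includes the colon, and use an anchor other than \b.

Example: (?i)(^[\w:]+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)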
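
The difference between the two variants shows up on a word with more than one colon. The input 'x:e:ubeki' below is invented for this sketch, and the constant names are not from the answer; the code assumes Python 3. With \b, each colon starts a new "word", so the single letter e cannot satisfy \w+[aeiou]; with ^[\w:]+ the earlier colon is absorbed into the left-hand group.

```python
import re

WORD_BOUNDED = r'(?i)(\b\w+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)'
MULTI_COLON = r'(?i)(^[\w:]+[aeiou]):((?:[aeiou][^aeiou\s\W]*){3}\b)'

samples = ['yo:ubeki', 'x:e:ubeki']  # the second word is a made-up multi-colon case

bounded = [re.sub(WORD_BOUNDED, r'\1\2', w) for w in samples]
multi = [re.sub(MULTI_COLON, r'\1\2', w) for w in samples]

for w, a, b in zip(samples, bounded, multi):
    print('{:>10}  \\b version: {:<10} ^[\\w:]+ version: {}'.format(w, a, b))
```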

Answer 3 (score: 0)

This should work:

word = re.sub(ur"(?<=[aeiou]):(?=[aeiou]([^aeiou]*[aeiou]){2}[^aeiou]*$)", ur'', word)

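See the example here: https://regex101.com/r/kA8xH3/2

Note that I match only the colon and replace it with an empty string, instead of capturing groups and concatenating them.

It checks the vowel-colon-vowel combination, then uses a lookahead to check that there are 2 additional vowels (possibly with consonants in between). It also allows extra consonants at the end, but the $ makes sure that there are no more vowels.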
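
A short check of the lookaround approach, assuming Python 3 (hence r"" rather than ur""). The constant name COLON_ONLY and the extra negative inputs are chosen for this sketch. It shows that the match itself covers nothing but the colon, so the replacement can simply be the empty string.

```python
import re

COLON_ONLY = r"(?<=[aeiou]):(?=[aeiou]([^aeiou]*[aeiou]){2}[^aeiou]*$)"

# The lookbehind and lookahead consume no characters, so the match is just ':'.
m = re.search(COLON_ONLY, "we:aanyoh")
print(repr(m.group(0)), m.span())

samples = ["we:aanyoh", "hiru:atghigu", "yo:ubeki", "yo:ubek", "yo:ubekiki"]
results = [re.sub(COLON_ONLY, "", w) for w in samples]

for w, r in zip(samples, results):
    print("{} > {}".format(w, r))
```

The last two inputs have two and four vowels after the colon, so the lookahead fails and they are left unchanged.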

Answer 4 (score: 0)

This does it:

    word = re.sub(ur"([aeiou]):([aeiou])([^\Waeiou]*[aeiou][^\Waeiou]*[aeiou][^\Waeiou]*)$", ur'\1\2\3', word)

http://www.phpliveregex.com/p/dCa

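Answer 5 (score: -1)

A roundup (I used an uppercase vowel to mark where in the word the substitution should occur). Let me know if you would like me to add other test strings.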

import re

strings = [
    'wE:aanyoh',
    'hirU:atghigu',
    'yO:ubeki',

    'xE:aaa',
    'xx:aaa',
    'xa:aaaxA:aaa',
    'xa:aaaxA:aaaxx',
    'xa:aaaxA:aaxax',
    'a:aaaxA:aaxax',
    'e:aeixA:aexix',
]


pattern = r"""
    (
        .*
        [aeiou]
    )
    :
    (
        [aeiou]
        .*?
        [aeiou]
        .*?
        [aeiou]
    )
"""

template = "{:>15}: {}"
for string in strings:
    print(
        template.format('original', string)
    )

    print(template.format('Alexander:',
        re.sub(r"(?<=[aeiou]):(?=[aeiou]([^aeiou]*[aeiou]){2}[^aeiou]*$)", r'', string, flags=re.I)
    ))

    print(template.format('lonut:',
        re.sub(r"([aeiou]):([aeiou])([^\Waeiou]*[aeiou][^\Waeiou]*[aeiou][^\Waeiou]*)$", r'\1\2\3', string, flags=re.I)
    ))

    print(template.format('Tom Zych:',
        re.sub(r'([^aeiou]*[aeiou][^aeiou]*):((?:[^aeiou]*[aeiou]){3}[^aeiou]*)$', r'\1\2', string, flags=re.I)
    ))

    print(template.format('Jeff Y:',
        re.sub(r"([aeiou]):(([aeiou][^aeiou]*){3})$", r'\1\2', string, flags=re.I)
    ))

    print(template.format('7stud:', 
        re.sub(pattern, r'\1\2', string, count=1, flags=re.X|re.I)
    ))

    print("\n")

Output:

       original: wE:aanyoh
     Alexander:: wEaanyoh
         lonut:: wEaanyoh
      Tom Zych:: wEaanyoh
        Jeff Y:: wEaanyoh
         7stud:: wEaanyoh


       original: hirU:atghigu
     Alexander:: hirUatghigu
         lonut:: hirUatghigu
      Tom Zych:: hirUatghigu
        Jeff Y:: hirUatghigu
         7stud:: hirUatghigu


       original: yO:ubeki
     Alexander:: yOubeki
         lonut:: yOubeki
      Tom Zych:: yOubeki
        Jeff Y:: yOubeki
         7stud:: yOubeki


       original: xE:aaa
     Alexander:: xEaaa
         lonut:: xEaaa
      Tom Zych:: xEaaa
        Jeff Y:: xEaaa
         7stud:: xEaaa


       original: xx:aaa
     Alexander:: xx:aaa
         lonut:: xx:aaa
      Tom Zych:: xx:aaa
        Jeff Y:: xx:aaa
         7stud:: xx:aaa


       original: xa:aaaxA:aaa
     Alexander:: xa:aaaxAaaa
         lonut:: xa:aaaxAaaa
      Tom Zych:: xa:aaaxAaaa
        Jeff Y:: xa:aaaxAaaa
         7stud:: xa:aaaxAaaa


       original: xa:aaaxA:aaaxx
     Alexander:: xa:aaaxAaaaxx
         lonut:: xa:aaaxAaaaxx
      Tom Zych:: xa:aaaxAaaaxx
        Jeff Y:: xa:aaaxAaaaxx
         7stud:: xa:aaaxAaaaxx


       original: xa:aaaxA:aaxax
     Alexander:: xa:aaaxAaaxax
         lonut:: xa:aaaxAaaxax
      Tom Zych:: xa:aaaxAaaxax
        Jeff Y:: xa:aaaxAaaxax
         7stud:: xa:aaaxAaaxax


       original: a:aaaxA:aaxax
     Alexander:: a:aaaxAaaxax
         lonut:: a:aaaxAaaxax
      Tom Zych:: a:aaaxAaaxax
        Jeff Y:: a:aaaxAaaxax
         7stud:: a:aaaxAaaxax


       original: e:aeixA:aexix
     Alexander:: e:aeixAaexix
         lonut:: e:aeixAaexix
      Tom Zych:: e:aeixAaexix
        Jeff Y:: e:aeixAaexix
         7stud:: e:aeixAaexix