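For example, given the word 'Happy', I only want 'H' and 'y'.
Given 'accomplished', I only want 'm', 'p', 'l', 's', 'h', 'd'.
I know that (\w)\1 will find repeated characters, and that (?i)[b-df-hj-np-tv-z] will find all consonants, but how do I combine them?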
Answer 0 (score: 2)
You can use
(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)
which expands to
(?=[b-df-hj-np-tv-xz]) # Match only if the next character is a consonant
(.) # Match the consonant and capture it for subsequent usage
(?!\1) # Don't match if the next character is the same as the one we captured (avoid matching all but the last character of a cluster)
(?<!\1\1) # Don't match if the penultimate character was the same as the one we captured (to avoid matching the last character of a cluster)
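Unfortunately, the last line is not allowed in re, because lookbehinds must have a fixed length. (Python 3.5 added support for fixed-length group references in lookbehinds, so newer versions of re may accept it.) The regex module¹ does support it: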
In [1]: import regex
In [2]: s=r'(?=[b-df-hj-np-tv-xz])(.)(?!\1)(?<!\1\1)'
In [3]: regex.findall(s, 'happy')
Out[3]: ['h']
In [4]: regex.findall(s, 'accomplished')
Out[4]: ['m', 'p', 'l', 's', 'h', 'd']
¹ "Intended eventually to replace Python's current re module implementation."
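For readers who only have the standard `re` module, one lookbehind-free way to get a similar result is to match each run of a repeated consonant and keep only the runs of length one. This is a minimal sketch, not the answer's own method. The name `single_consonants` is hypothetical, and it uses the character class from the question, so `y` counts as a consonant here.

```python
import re

# Each match is a maximal run of one repeated consonant (case-insensitive).
# The class is the one from the question, so 'y' is treated as a consonant.
RUN = re.compile(r'([b-df-hj-np-tv-z])\1*', re.IGNORECASE)

def single_consonants(text):
    """Return the consonants that are not part of a repeated cluster."""
    return [m.group(0) for m in RUN.finditer(text) if len(m.group(0)) == 1]

print(single_consonants('Happy'))         # ['H', 'y']
print(single_consonants('accomplished'))  # ['m', 'p', 'l', 's', 'h', 'd']
```

Filtering on the run length in Python keeps the pattern itself simple, at the cost of doing part of the work outside the regex.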
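Answer 1 (score: 0)
from re import findall

string = "Happy you!"
res = []
# Collect every non-vowel character, keeping only its first occurrence
for c in findall('[^aeiou]', string):
    if c not in res:
        res.append(c)
This filters out duplicates and uses the re module, as you requested.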
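The same order-preserving de-duplication can be sketched more compactly with `dict.fromkeys`. The function name `unique_non_vowels` is hypothetical. Note that this approach keeps the first occurrence of every non-vowel character, including spaces and punctuation, so for "Happy" it still returns `p`, which differs from what the question asks for.

```python
import re

def unique_non_vowels(text):
    """Return each non-vowel character once, in first-seen order."""
    # dict.fromkeys drops later duplicates; dicts preserve insertion order
    # (guaranteed since Python 3.7).
    return list(dict.fromkeys(re.findall('[^aeiou]', text)))

print(unique_non_vowels("Happy you!"))  # ['H', 'p', 'y', ' ', '!']
```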
Answer 2 (score: 0)
Here is a regex that can be used:
([^aeiou])\1+|([^aeiou\s])
Then you can grab captured group #2.
Explanation:
[^aeiou] # matches a consonant
([^aeiou]) # puts a consonant in captured group #1
([^aeiou])\1+ # matches repetitions of group #1
| # regex alternation (OR)
([^aeiou\s]) # matches a consonant and grabs it in captured group #2
Code:
>>> import re
>>> for m in re.finditer(r'([^aeiou])\1+|([^aeiou\s])', "accomplished"):
...     print(m.group(2))
...
None
m
p
l
s
h
d
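The `None` in the output comes from the first alternative matching the repeated "cc", which leaves group #2 empty. A minimal sketch that wraps the same pattern and drops those entries might look like this (the function name `unrepeated` is hypothetical):

```python
import re

# Alternative 1 consumes a run of a repeated non-vowel (group 2 stays unset);
# alternative 2 captures a single non-vowel, non-whitespace character.
PATTERN = re.compile(r'([^aeiou])\1+|([^aeiou\s])')

def unrepeated(text):
    """Return only the characters captured by group #2."""
    return [m.group(2) for m in PATTERN.finditer(text) if m.group(2) is not None]

print(unrepeated("accomplished"))  # ['m', 'p', 'l', 's', 'h', 'd']
```

Because `[^aeiou]` is a negated class, it also matches digits, punctuation, and uppercase vowels, so input with such characters would need a stricter class.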
Answer 3 (score: 0)
A brute-force (very slow) solution:
import re
expr = '(?<!b)b(?!b)|(?<!c)c(?!c)|(?<!d)d(?!d)|(?<!f)f(?!f)|(?<!g)g(?!g)|(?<!h)h(?!h)|(?<!j)j(?!j)|(?<!k)k(?!k)|(?<!l)l(?!l)|(?<!m)m(?!m)|(?<!n)n(?!n)|(?<!p)p(?!p)|(?<!q)q(?!q)|(?<!r)r(?!r)|(?<!s)s(?!s)|(?<!t)t(?!t)|(?<!v)v(?!v)|(?<!w)w(?!w)|(?<!x)x(?!x)|(?<!y)y(?!y)|(?<!z)z(?!z)'
print(re.findall(expr, 'happy'))
print(re.findall(expr, 'accomplished'))
print(re.findall(expr, 'happy accomplished'))
print(re.findall(expr, 'happy accccccompliiiiiiishedd'))
# Readable form of expr
# (?<!b)b(?!b)|
# (?<!c)c(?!c)|
# (?<!d)d(?!d)|
# (?<!f)f(?!f)|
# (?<!g)g(?!g)|
# (?<!h)h(?!h)|
# (?<!j)j(?!j)|
# (?<!k)k(?!k)|
# (?<!l)l(?!l)|
# (?<!m)m(?!m)|
# (?<!n)n(?!n)|
# (?<!p)p(?!p)|
# (?<!q)q(?!q)|
# (?<!r)r(?!r)|
# (?<!s)s(?!s)|
# (?<!t)t(?!t)|
# (?<!v)v(?!v)|
# (?<!w)w(?!w)|
# (?<!x)x(?!x)|
# (?<!y)y(?!y)|
# (?<!z)z(?!z)
Output:
['h', 'y']
['m', 'p', 'l', 's', 'h', 'd']
['h', 'y', 'm', 'p', 'l', 's', 'h', 'd']
['h', 'y', 'm', 'p', 'l', 's', 'h']
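Rather than typing the long alternation by hand, the same expression could be generated from a list of consonants. This is a sketch under the assumption that the 21 lowercase letters used above are the intended set; the name `CONSONANTS` is introduced here for illustration.

```python
import re

# The 21 lowercase consonants used in the hand-written expression above.
CONSONANTS = 'bcdfghjklmnpqrstvwxyz'

# One alternative per consonant: neither preceded nor followed by the same letter.
expr = '|'.join('(?<!{0}){0}(?!{0})'.format(c) for c in CONSONANTS)

print(re.findall(expr, 'happy'))
print(re.findall(expr, 'happy accccccompliiiiiiishedd'))
```

Each lookbehind here tests a single literal character, so it has a fixed length and works with the standard `re` module.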