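I'm trying to find words that start and end with a consonant. Below is my attempt, but it doesn't do what I want. I'm really stuck and need your help/advice.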
import re
a = "Still, the conflicting reports only further served to worsen tensions in the Ukraine crisis, which has grown drastically \
in the past few weeks to a new confrontation between Russia and the West reminiscent of low points in the Cold War."
b = re.findall(" ([b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z, ',', '.'].+?[b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z, ',', '.']) ", a.lower())
print(b)
The output is:
['the conflicting', 'further', 'to worsen', 'the ukraine crisis,', 'has', 'drastically', 'the past', 'weeks', 'new', 'between', 'the west', 'low', 'the cold']
But the output is not correct. I have to use regular expressions; without them, I think it would be too hard.
Thank you very much!
Answer 0 (score: 4)
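Here is a very explicit solution using startswith() and endswith(). To reach your goal, you have to remove the special characters yourself and convert the string into a list of words (named s in the code):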
vowels = ('a', 'e', 'i', 'o', 'u')
[w for w in s if not w.lower().startswith(vowels) and not w.lower().endswith(vowels)]
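A minimal runnable sketch of this approach follows. The answer leaves building s to you; here it is assumed that s is produced by pulling out runs of ASCII letters with re.findall, which also drops the commas and periods. Note that "does not start with a vowel" would also accept digits or other non-letters, which this letter-only tokenization happens to rule out.

```python
import re

a = ("Still, the conflicting reports only further served to worsen tensions in the "
     "Ukraine crisis, which has grown drastically in the past few weeks to a new "
     "confrontation between Russia and the West reminiscent of low points in the Cold War.")

vowels = ('a', 'e', 'i', 'o', 'u')

# Assumed tokenization: keep only runs of letters, so "Still," becomes "Still"
s = re.findall(r"[A-Za-z]+", a)

# startswith/endswith accept a tuple, so one call checks all five vowels
words = [w for w in s
         if not w.lower().startswith(vowels) and not w.lower().endswith(vowels)]
print(words)
```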
Answer 1 (score: 2)
Try this:
vowels = ['a', 'e', 'i', 'o', 'u']
# lower() so that capitalized words such as "Ukraine" are checked correctly
words = [w for w in a.split() if w[0].lower() not in vowels and w[-1].lower() not in vowels]
However, this doesn't handle words that end with a period or a comma.
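One way to cover the trailing period or comma is to strip punctuation from each token before looking at its first and last letters. This sketch assumes string.punctuation is an acceptable definition of "punctuation" (any character set you prefer would do) and compares letters case-insensitively.

```python
import string

a = ("Still, the conflicting reports only further served to worsen tensions in the "
     "Ukraine crisis, which has grown drastically in the past few weeks to a new "
     "confrontation between Russia and the West reminiscent of low points in the Cold War.")

vowels = ['a', 'e', 'i', 'o', 'u']

words = []
for token in a.split():
    # Remove leading/trailing punctuation such as ',' and '.'
    w = token.strip(string.punctuation)
    # Skip tokens that were pure punctuation, then compare case-insensitively
    if w and w[0].lower() not in vowels and w[-1].lower() not in vowels:
        words.append(w)
print(words)
```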
Edit: if you have to find the pattern with a regular expression:
ending_in_vowel = r'(\b\w*[AaEeIiOoUu]\b)?' #matches all words ending with a vowel (\w* so one-letter words like "a" count too)
begin_in_vowel = r'(\b[AaEeIiOoUu]\w*\b)?' #matches all words beginning with a vowel
Then we need to find all the words that neither begin nor end with a vowel, so first collect the ones to ignore:
ignore = [b for b in re.findall(begin_in_vowel, a) if b]
ignore.extend([b for b in re.findall(ending_in_vowel, a) if b])
Then your result is simply:
result = [word for word in a.split() if word not in ignore]
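Put together, the ignore-list approach can be run end to end as below. This sketch makes two assumptions beyond the answer: it uses \w* so one-letter words such as "a" are also ignored, and it strips trailing commas and periods from each token so that "Still," is compared as "Still". Because both patterns wrap everything in an optional group, findall also returns empty strings at non-matching positions, which the if m filters discard.

```python
import re

a = ("Still, the conflicting reports only further served to worsen tensions in the "
     "Ukraine crisis, which has grown drastically in the past few weeks to a new "
     "confrontation between Russia and the West reminiscent of low points in the Cold War.")

ending_in_vowel = r'(\b\w*[AaEeIiOoUu]\b)?'
begin_in_vowel = r'(\b[AaEeIiOoUu]\w*\b)?'

# Drop the empty matches produced by the optional group; a set makes lookups cheap
ignore = {m for m in re.findall(begin_in_vowel, a) if m}
ignore.update(m for m in re.findall(ending_in_vowel, a) if m)

# Assumption: strip trailing ',' and '.' so punctuated tokens match the ignore set
result = [w for w in (t.strip(",.") for t in a.split()) if w not in ignore]
print(result)
```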
Answer 2 (score: 1)
First, split() a to get the individual words. Then check whether the first and the last letter of each word are in the list consonants. If they are, append the word to result, and finally print the contents of result.
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
a = "Still, the conflicting reports only further served to worsen tensions in the Ukraine crisis, which has grown drastically \
in the past few weeks to a new confrontation between Russia and the West reminiscent of low points in the Cold War."
result = []
for word in a.split():
    # lower() so that capitalized words such as "Cold" are not rejected
    if word[0].lower() in consonants and word[-1].lower() in consonants:
        result.append(word)
print(result)
Answer 3 (score: 1)
If you're fine with dropping punctuation, this regex will work:
>>> re.findall(r'\b[bcdfghj-np-tv-z][a-z]*[bcdfghj-np-tv-z]\b', a.lower())
['still', 'conflicting', 'reports', 'further', 'served', 'worsen', 'tensions', 'crisis', 'which', 'has', 'grown', 'drastically', 'past', 'few', 'weeks', 'new', 'confrontation', 'between', 'west', 'reminiscent', 'low', 'points', 'cold', 'war']
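If you would rather keep the original capitalization instead of lowercasing the whole string first, the same character classes can be combined with the re.IGNORECASE flag, which makes [a-z] and the consonant ranges match uppercase letters as well. A sketch:

```python
import re

a = ("Still, the conflicting reports only further served to worsen tensions in the "
     "Ukraine crisis, which has grown drastically in the past few weeks to a new "
     "confrontation between Russia and the West reminiscent of low points in the Cold War.")

consonant = r'[bcdfghj-np-tv-z]'
# re.IGNORECASE lets the lowercase ranges also match 'S', 'W', 'C', ...
pattern = re.compile(r'\b' + consonant + r'[a-z]*' + consonant + r'\b', re.IGNORECASE)
print(pattern.findall(a))
```

The matches are the same words as above, but "Still", "West", "Cold" and "War" keep their capital letters.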
However, your original attempt looks like it was trying to keep commas and periods, so if that's your goal, you can use this:
>>> re.findall(r'\b[bcdfghj-np-tv-z][a-z]*[bcdfghj-np-tv-z][,.]?(?![a-z])', a.lower())
['still,', 'conflicting', 'reports', 'further', 'served', 'worsen', 'tensions', 'crisis,', 'which', 'has', 'grown', 'drastically', 'past', 'few', 'weeks', 'new', 'confrontation', 'between', 'west', 'reminiscent', 'low', 'points', 'cold', 'war.']
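The \b from my first example can't be used after the trailing punctuation: \b only matches between a word character and a non-word character, and after a comma or period that is followed by a space (or by the end of the string) there is no such boundary. That's why the second pattern ends with the lookahead (?![a-z]) instead. Either way, both of these work.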
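To see how \b behaves around trailing punctuation, here is a small illustration on a made-up string (not taken from the question). With \b, the optional [,.]? simply backs off and the punctuation is silently dropped; with the negative lookahead, it is kept.

```python
import re

text = "still, war."

# \b cannot sit between ',' and ' ' (both non-word), so [,.]? matches nothing
with_boundary = re.findall(r'\b[a-z]+[,.]?\b', text)
# The lookahead only requires that no letter follows, so the punctuation stays
with_lookahead = re.findall(r'\b[a-z]+[,.]?(?![a-z])', text)

print(with_boundary)   # ['still', 'war']
print(with_lookahead)  # ['still,', 'war.']
```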
If you want to account for contractions, the expression becomes:
r"\b[bcdfghj-np-tv-z][a-z']*[bcdfghj-np-tv-z][,.]?(?![a-z])"
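For example, on a short made-up sentence (not from the question), the apostrophe in the middle character class lets contractions such as can't match as single words, while the final consonant class still has to match a letter:

```python
import re

pattern = r"\b[bcdfghj-np-tv-z][a-z']*[bcdfghj-np-tv-z][,.]?(?![a-z])"
sentence = "I can't say they didn't know."

# Lowercase first, as in the answer's other examples
print(re.findall(pattern, sentence.lower()))
# ["can't", 'say', 'they', "didn't", 'know.']
```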