Finding words that start and end with a consonant

Time: 2014-03-04 03:21:54

Tags: python python-3.x

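I am trying to find words that start and end with a consonant. My attempt is below, but it does not give me what I want. I'm really stuck and need your help or advice.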

import re

a = "Still, the conflicting reports only further served to worsen tensions in the Ukraine crisis, which has grown drastically \
in the past few weeks to a new confrontation between Russia and the West reminiscent of low points in the Cold War." 

b = re.findall(" ([b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z, ',', '.'].+?[b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z, ',', '.']) ", a.lower())
print(b)

The output is:

['the conflicting', 'further', 'to worsen', 'the ukraine crisis,', 'has', 'drastically', 'the past', 'weeks', 'new', 'between', 'the west', 'low', 'the cold']

But the output is incorrect. I have to use a regular expression; without one, I think it would be too hard.

Thanks a lot!

4 answers:

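Answer 0: (score: 4)

Here is a very explicit solution using startswith() and endswith(). To reach your goal, you have to strip the special characters yourself and turn the string into a list of words (named s in the code):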

vowels = ('a', 'e', 'i', 'o', 'u')
[w for w in s if not w.lower().startswith(vowels) and not w.lower().endswith(vowels)]
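
The answer leaves building s to the reader. A minimal sketch of one way to do it, assuming it is acceptable to trim punctuation from both ends of each token with string.punctuation (the helper name consonant_words is hypothetical, not part of the answer):

```python
import string

VOWELS = ('a', 'e', 'i', 'o', 'u')

def consonant_words(text):
    """Return the words in text that neither start nor end with a vowel."""
    # Split on whitespace, then trim punctuation from both ends of each token.
    s = [w.strip(string.punctuation) for w in text.split()]
    # Tokens that were pure punctuation are now empty; drop them.
    s = [w for w in s if w]
    # str.startswith / str.endswith accept a tuple of candidate strings.
    return [w for w in s
            if not w.lower().startswith(VOWELS)
            and not w.lower().endswith(VOWELS)]

print(consonant_words("Still, the Ukraine crisis has grown."))
# ['Still', 'crisis', 'has', 'grown']
```

Note that "not a vowel" is not quite the same as "consonant": a token that starts with a digit would also pass this filter.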

Answer 1: (score: 2)

Try this:

vowels = ['a', 'e', 'i', 'o', 'u']
words = [w for w in a.split() if w[0] not in vowels and w[-1] not in vowels]

However, this will not handle words that end with . or ,
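
One possible way around that limitation, sketched under the assumption that only a trailing or leading . or , needs to be ignored and that the original token (punctuation included) should be kept in the result. The helper name has_consonant_ends and the sample sentence are made up for illustration:

```python
VOWELS = set('aeiou')

def has_consonant_ends(token):
    """Check the first and last letters after ignoring surrounding . and ,"""
    core = token.strip('.,')
    if not core:
        return False
    # Lowercase so that capitalized words such as "Only" are handled too.
    return core[0].lower() not in VOWELS and core[-1].lower() not in VOWELS

sample = "Still, the Ukraine crisis, has grown."
words = [w for w in sample.split() if has_consonant_ends(w)]
print(words)
# ['Still,', 'crisis,', 'has', 'grown.']
```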

Edit: if you have to use regular expressions to find the pattern:

ending_in_vowel = r'(\b\w*[AaEeIiOoUu]\b)?' #matches all words ending with a vowel (\w* so one-letter words like "a" count)
begin_in_vowel = r'(\b[AaEeIiOoUu]\w*\b)?' #matches all words beginning with a vowel

Then we need to find all the words that neither begin nor end with a vowel:

ignore = [b for b in re.findall(begin_in_vowel, a) if b]
ignore.extend([b for b in re.findall(ending_in_vowel, a) if b])

Then your result is simply:

result = [word for word in a.split() if word not in ignore]

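Answer 2: (score: 1)

First, you should split() a so that you get each word. Then check whether the first and the last letter are in the list consonants. If they are, append the word to all, and finally print the contents of all.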

consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']

a = "Still, the conflicting reports only further served to worsen tensions in the Ukraine crisis, which has grown drastically \
in the past few weeks to a new confrontation between Russia and the West reminiscent of low points in the Cold War."

all = []

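for word in a.split():
    # Lowercase both ends so capitalized words such as "West" are not skipped.
    if word[0].lower() in consonants and word[-1].lower() in consonants:
        all.append(word)

print(all)  # print() is a function in Python 3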

Answer 3: (score: 1)

If you want to drop the punctuation, this regular expression will work:

>>> re.findall(r'\b[bcdfghj-np-tv-z][a-z]*[bcdfghj-np-tv-z]\b', a.lower())
['still', 'conflicting', 'reports', 'further', 'served', 'worsen', 'tensions', 'crisis', 'which', 'has', 'grown', 'drastically', 'past', 'few', 'weeks', 'new', 'confrontation', 'between', 'west', 'reminiscent', 'low', 'points', 'cold', 'war']

However, your original attempt looks like it was trying to keep the commas and periods, so if that is your goal you can use this:

>>> re.findall(r'\b[bcdfghj-np-tv-z][a-z]*[bcdfghj-np-tv-z][,.]?(?![a-z])', a.lower())
['still,', 'conflicting', 'reports', 'further', 'served', 'worsen', 'tensions', 'crisis,', 'which', 'has', 'grown', 'drastically', 'past', 'few', 'weeks', 'new', 'confrontation', 'between', 'west', 'reminiscent', 'low', 'points', 'cold', 'war.']

I'm not sure why the \b in my first example doesn't match the trailing punctuation (the documentation says it would), but either way these work.
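
On the \b question: the re documentation describes \b as matching the empty string at the beginning or end of a word, so it asserts a position and never consumes the punctuation next to it. A small check, using a made-up sample string:

```python
import re

text = "crisis, which"

# \b matches the empty position between "s" and ","; it consumes no characters.
m = re.search(r'crisis\b', text)
print(m.group())  # crisis
print(m.end())    # 6, so the comma at index 6 is not part of the match

# To include the punctuation, it has to be matched explicitly.
with_comma = re.search(r'crisis\b[,.]?', text)
print(with_comma.group())  # crisis,
```

That is why the second pattern needs the explicit [,.]? after the final consonant.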

If you want to account for contractions, the expression would be:

r"\b[bcdfghj-np-tv-z][a-z']*[bcdfghj-np-tv-z][,.]?(?![a-z])"
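
A quick way to try the contraction-aware pattern, with a sample sentence invented for this illustration. Keep in mind that the character class treats y as a consonant:

```python
import re

pattern = r"\b[bcdfghj-np-tv-z][a-z']*[bcdfghj-np-tv-z][,.]?(?![a-z])"
sample = "I can't say it wasn't cold."

# Lowercase first, because the character classes only list lowercase letters.
matches = re.findall(pattern, sample.lower())
print(matches)
# ["can't", 'say', "wasn't", 'cold.']
```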