Regular expression to find words that start or end with a particular letter

Date: 2017-02-19 22:30:39

Tags: python regex

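Write a function called getWords(sentence, letter) that takes a sentence and a letter and returns a list of the words that start or end with that letter, but not both, regardless of letter case.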

For example:

>>> s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> getWords(s, "t")
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']

My attempt:

regex = (r'[\w]*'+letter+r'[\w]*')
return (re.findall(regex,sentence,re.I))

My output:

['The', 'TART', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'until', 'next']

5 answers:

Answer 0 (score: 4)

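\b matches a word boundary. Verbose mode allows a multi-line regular expression with comments. Note that [^\W] is the same as \w, but to match any \w character except a certain letter you need [^\W{letter}].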

import re

def getWords(s,t):
    pattern = r'''(?ix)           # ignore case, verbose mode
                  \b{letter}      # start with letter
                  \w*             # zero or more additional word characters
                  [^{letter}\W]\b # ends with a word character that isn't letter
                  |               #    OR
                  \b[^{letter}\W] # starts with a word character that isn't letter
                  \w*             # zero or more additional word characters
                  {letter}\b      # ends with letter
                  '''.format(letter=t)
    return re.findall(pattern,s)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(s,'t'))

Output:

['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
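
The character-class remark in this answer can be checked directly. The following is a minimal sketch that is not part of the original answer; the sample string and variable names are made up for illustration.

```python
import re

sample = "Tart_9!"  # hypothetical sample string

# [^\W] is a double negative ("not a non-word character"), i.e. the same as \w
assert re.findall(r'[^\W]', sample) == re.findall(r'\w', sample)

# Adding a letter to the negated class excludes that letter as well;
# with re.I both 't' and 'T' are excluded
word_chars_except_t = re.findall(r'[^\Wt]', sample, re.I)
print(word_chars_except_t)  # ['a', 'r', '_', '9']
```

The underscore and the digit survive because \w covers letters, digits, and the underscore.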

Answer 1 (score: 1)

This can be done easily with the startswith() and endswith() methods.

def getWords(s, letter):
    letter = letter.lower()
    # keep words that start or end with the letter, but not both
    return ([word for word in s.split() if (word.lower().startswith(letter) or
                word.lower().endswith(letter)) and not
                    (word.lower().startswith(letter) and word.lower().endswith(letter))])

mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(mystring, 't'))

Output:

['The', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
  

Update (using regular expressions)

import re
result1 = re.findall(r'\b[t]\w+|\w+[t]\b', mystring, re.I)
result2 = re.findall(r'\b[t]\w+[t]\b', mystring, re.I)
print([x for x in result1 if x not in result2])

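Explanation

The regular expression \b[t]\w+|\w+[t]\b finds words that start or end with the letter t, and \b[t]\w+[t]\b finds words that both start and end with the letter t.

After generating the two lists, keep only the words from the first list that do not appear in the second (the difference of the two lists, not their intersection).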
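
The two-pass idea can be wrapped in a function that takes the letter as a parameter. This is a minimal sketch under assumptions: the name get_words_two_pass is hypothetical, and the re.escape call is an addition meant to keep a metacharacter passed as the letter from being read as regex syntax.

```python
import re

def get_words_two_pass(sentence, letter):
    """Words that start or end with `letter`, minus those that do both."""
    l = re.escape(letter)
    # Pass 1: words that start with the letter OR end with it
    starts_or_ends = re.findall(rf'\b{l}\w+|\w+{l}\b', sentence, re.I)
    # Pass 2: words that start AND end with the letter
    both = set(re.findall(rf'\b{l}\w+{l}\b', sentence, re.I))
    # Difference: keep pass-1 words that are not in pass 2
    return [w for w in starts_or_ends if w not in both]

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_two_pass(s, "t"))
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```

Because both patterns use \w+, a one-letter word is never matched, which is consistent with the "not both" rule.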

Answer 2 (score: 1)

Why use a regular expression? Just check the first and last characters.

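def getWords(s, letter):
    words = s.split()
    # b is the set of a word's first and last characters (lowercased);
    # "or 1" avoids a zero slice step for one-character words
    return [a for a, b in ((word, set(word.lower()[::len(word)-1 or 1])) for word in words)
            if letter.lower() in b and len(b) == 2]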
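
The slicing trick in this one-liner is easier to follow in isolation. Here is a small sketch, assuming a hypothetical helper named first_and_last; it is not part of the original answer.

```python
def first_and_last(word):
    """Return the set of the first and last characters of `word`, lowercased."""
    # A step of len(word) - 1 picks index 0 and the final index only.
    # For a one-character word the step would be 0 (a ValueError), so fall back to 1.
    step = len(word) - 1 or 1
    return set(word.lower()[::step])

print(first_and_last("Tuesdays") == {"t", "s"})  # True: two different end characters
print(first_and_last("TART") == {"t"})           # True: both ends collapse into one element
```

A word qualifies when the letter is in this set and the set has two elements, meaning the two ends differ.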

Answer 3 (score: 1)

If you want a regular expression, then use:

regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)

The replace is done to avoid the repetitive, verbose +letter+ concatenation.

So the code looks like this:

import re

def getWords(sentence, letter):
    regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
    return re.findall(regex, sentence, re.I)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
result = getWords(s, "t")
print(result)

Output:

['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']

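Explanation

I use # as a placeholder for the actual letter; it is replaced in the regex before the regex is used.

  • \b: word boundary
  • \w*: zero or more word characters (letters, digits, or underscore)
  • [^#\W]: a word character that is not # (the given letter)
  • |: logical OR. The left side matches words that start with the letter but do not end with it, and the right side matches the opposite case.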
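
To see what the placeholder substitution produces, the expanded pattern can be printed before use. This is a minimal sketch; the variable names are illustrative, and re.escape is an addition (not in the original answer) that would matter only if the letter were a regex metacharacter.

```python
import re

template = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'
letter = "t"  # example letter

# Substitute the placeholder; re.escape("t") is just "t"
regex = template.replace('#', re.escape(letter))
print(regex)  # \b(t\w*[^t\W]|[^t\W]\w*t)\b

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
# The pattern has a single capturing group, so findall returns that group,
# which here spans the whole matched word
matches = re.findall(regex, s, re.I)
print(matches)
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```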

Answer 4 (score: 0)

You can try the built-in startswith and endswith string methods.

>>> string = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> [i for i in string.split() if i.lower().startswith('t') or i.lower().endswith('t')]
['The', 'TART', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
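
Note that a plain "or" keeps TART, which both starts and ends with the letter, and that split() leaves the comma attached to "Thursdays,". One possible adjustment is sketched below, assuming a hypothetical function name get_words_xor: comparing the two booleans with != acts as an exclusive or, and string.punctuation is used to strip surrounding punctuation.

```python
import string

def get_words_xor(sentence, letter):
    """Words that start or end with `letter` (case-insensitive), but not both."""
    letter = letter.lower()
    result = []
    for raw in sentence.split():
        word = raw.strip(string.punctuation)  # "Thursdays," -> "Thursdays"
        if not word:
            continue  # the token was punctuation only
        starts = word[0].lower() == letter
        ends = word[-1].lower() == letter
        if starts != ends:  # exactly one of the two is true
            result.append(word)
    return result

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_xor(s, "t"))
# ['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
```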