Question

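I have a string of text that I'm searching, and I want to find only the 4-letter words. It works, except that it also matches words with more than 4 letters.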

import re
test ="hello, how are you doing tonight?"
total = len(re.findall(r'[a-zA-Z]{3}', text))
print (total)

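It finds 15, although I'm not sure how it found that many. I thought I might have to use \b to pick out the beginning and end of each word, but that didn't seem to work for me.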

Answer 1

Try this:

re.findall(r'\b\w{4}\b', test)

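The regular expression matches:

\b, which is a word boundary. It matches the beginning or end of a word.

\w{4}, which matches four word characters (a-z, A-Z, 0-9, or _).

\b, which is another word boundary.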
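To see why the boundaries matter, here is a minimal sketch comparing the pattern with and without \b. The sample sentence is my own assumption, picked because it actually contains four-letter words; it is not from the question.

```python
import re

# Sample text is an assumption for illustration; it contains 4-letter words.
sample = "hello, what time does the train leave tonight?"

# Without boundaries, {4} also matches 4-letter chunks inside longer words.
unanchored = re.findall(r'[a-zA-Z]{4}', sample)

# With \b on both sides, only whole words of exactly four characters match.
anchored = re.findall(r'\b\w{4}\b', sample)

print(unanchored)  # ['hell', 'what', 'time', 'does', 'trai', 'leav', 'toni']
print(anchored)    # ['what', 'time', 'does']
```

The unanchored pattern counts pieces of "hello", "train", "leave", and "tonight", which is likely the kind of over-counting described in the question.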

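As a side note, your code contains a typo: the second argument to re.findall should be the name of your string variable, i.e. test. Also, your string doesn't contain any 4-letter words, so the suggested code will return an empty list and the count will be 0.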

Answer 2

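Here is an approach without regular expressions. Note that, as written, the filter keeps words longer than four characters; change the condition to len(i) == 4 to keep only four-letter words: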

from string import punctuation

s = "hello, how are you doing tonight?"

[i for i in s.translate(str.maketrans('', '', punctuation)).split(' ') if len(i) > 4]

# ['hello', 'doing', 'tonight']
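For exactly four letters, a variant of the same idea might look like the sketch below. The sample sentence and the names cleaned and four_letter are assumptions for illustration, since the original string has no four-letter words to show.

```python
from string import punctuation

# Sample sentence is an assumption; the original string has no 4-letter words.
s = "hello, what time does the train leave tonight?"

# Remove punctuation so "hello," is treated as "hello".
cleaned = s.translate(str.maketrans('', '', punctuation))

# Split on whitespace and keep words of exactly four characters.
four_letter = [w for w in cleaned.split() if len(w) == 4]

print(four_letter)  # ['what', 'time', 'does']
```

Using split() with no argument also collapses repeated spaces, which split(' ') would turn into empty strings.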

Answer 3

You can use re.findall to find every run of letters, then filter by length:

import re
test = "hello, how are you doing tonight?"
# Find every run of letters, then keep only the runs that are exactly four letters long.
final_words = list(filter(lambda x: len(x) == 4, re.findall(r'[a-zA-Z]+', test)))
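One thing worth checking is how this treats digits and underscores compared with a \w-based pattern. The sketch below uses a made-up sample string (an assumption, not from the question) to show that the two approaches can disagree.

```python
import re

# Sample string is an assumption, chosen to include digits and an underscore.
sample = "read file_ ab12 data from disk2 now"

# \w counts digits and underscores as word characters,
# so "ab12" matches while "file_" and "disk2" are five characters long.
with_w = re.findall(r'\b\w{4}\b', sample)

# Extract runs of letters first, then keep the runs of length four.
# Here "file" (from "file_") and "disk" (from "disk2") are kept.
letters_only = [w for w in re.findall(r'[a-zA-Z]+', sample) if len(w) == 4]

print(with_w)        # ['read', 'ab12', 'data', 'from']
print(letters_only)  # ['read', 'file', 'data', 'from', 'disk']
```

Which result is right depends on what you consider a "word"; for plain prose without digits or underscores the two give the same answer.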

Python - Regex - How to find only four-letter words?
