I'm trying to find words in a text file that are 7 letters long and contain the letters a, b, c, e and r. So far I have this:
import re
file = open("dictionary.txt", "r")
text = file.readlines()
file.close()
keyword = re.compile(r'\w{7}')
for line in text:
    result = keyword.search(line)
    if result:
        print(result.group())
Can anyone help me?
Answer 0 (score: 2)
You need to match not only word characters but also word boundaries:
keyword = re.compile(r'\b\w{7}\b')
The \b anchor matches at the start or end of a word, which limits the match to words of exactly 7 characters.
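To see what the boundary anchors change, here is a minimal sketch comparing the two patterns. The sample line is made up for illustration and is not taken from dictionary.txt:

```python
import re

# Hypothetical sample line (an assumption, not from the original file)
line = "abc bracket embraced"

without_boundary = re.compile(r'\w{7}')
with_boundary = re.compile(r'\b\w{7}\b')

# Without \b, the first 7 characters of the 8-letter word "embraced" also match
loose = without_boundary.findall(line)
# With \b, only whole words of exactly 7 characters match
strict = with_boundary.findall(line)

print(loose)
print(strict)
```

The unanchored pattern happily matches inside longer words, which is why the original code printed more than just 7-letter words.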
It is also more efficient to loop over the file line by line instead of reading it all into memory at once:
import re
keyword = re.compile(r'\b\w{7}\b')
with open("dictionary.txt", "r") as dictionary:
    for line in dictionary:
        for result in keyword.findall(line):
            print(result)
Using keyword.findall() gives us a list of all the matches on the line.
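As a quick illustration of the difference between search() and findall(), here is a small sketch. The sample line is hypothetical; a real dictionary file may well hold only one word per line, in which case the two behave much the same:

```python
import re

keyword = re.compile(r'\b\w{7}\b')
line = "bracket cabbage tea"  # hypothetical line holding several words

first = keyword.search(line)    # a match object for the first hit, or None
every = keyword.findall(line)   # a list with every matching string

print(first.group())
print(every)
```

search() stops at the first hit, while findall() keeps scanning to the end of the line.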
To check whether a match contains at least one of the required characters, I would personally just use a set intersection test:
import re
keyword = re.compile(r'\b\w{7}\b')
required = set('abcer')
with open("dictionary.txt", "r") as dictionary:
    for line in dictionary:
        # Keep only the words that share at least one letter with `required`
        results = [word for word in keyword.findall(line) if required.intersection(word)]
        for result in results:
            print(result)
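The question can also be read as asking for words that contain all of a, b, c, e and r rather than at least one. If that is what you need, a subset test is one option. This is only a sketch; the helper names has_any and has_all and the word list are made up for illustration:

```python
required = set('abcer')

def has_any(word):
    """True if the word contains at least one required letter."""
    return bool(required.intersection(word))

def has_all(word):
    """True if the word contains every required letter."""
    return required.issubset(word)

# Hypothetical 7-letter words, not taken from dictionary.txt
words = ["bracket", "cabbage", "jumping"]

any_matches = [w for w in words if has_any(w)]
all_matches = [w for w in words if has_all(w)]

print(any_matches)
print(all_matches)
```

Swapping has_any for has_all in the filter is the only change needed to move from "at least one" to "every" required letter.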
Answer 1 (score: 1)
\b(?=\w{0,6}?[abcer])\w{7}\b
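This is the regular expression you want. It works by taking the basic form for a word of exactly seven letters (\b\w{7}\b) and adding a lookahead: a zero-width assertion that looks forward and tries to find one of the letters you need. A breakdown: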
\b        A word boundary
(?=       Look ahead and find...
\w        A word character (A-Za-z0-9_; in Python 3 also other Unicode word characters unless re.ASCII is set)
{0,6}     Repeated 0 to 6 times
?         Lazily (not necessary, but marginally more efficient)
[abcer]   Followed by one of a, b, c, e, or r
)         Go back to where we were before (just after the word boundary)
\w        And match a word character
{7}       Exactly seven times
\b        Then one more word boundary
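Here is a minimal sketch of the pattern in use. The all_pattern variant, with one lookahead per letter, is my own extension for the case where every one of the letters must be present; it is not part of the regex above, and the sample text is made up:

```python
import re

# The pattern from this answer: 7-character words with at least one of a, b, c, e, r
any_pattern = re.compile(r'\b(?=\w{0,6}?[abcer])\w{7}\b')

# Assumed variant: one lookahead per letter, so all five letters must appear
all_pattern = re.compile(
    r'\b(?=\w*a)(?=\w*b)(?=\w*c)(?=\w*e)(?=\w*r)\w{7}\b'
)

# Hypothetical sample text, not from dictionary.txt
text = "bracket cabbage jumping embraced"

any_hits = any_pattern.findall(text)
all_hits = all_pattern.findall(text)

print(any_hits)
print(all_hits)
```

"embraced" contains all five letters but is rejected by both patterns because it is 8 characters long, so the trailing \b cannot match after 7 characters.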