Question

First, I started by trying to search a file for a single word with the following code:

import re

shakes = open("tt.txt", "r")

for line in shakes:
    if re.match("(.*)(H|h)appy(.*)", line):
        print line,

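But what if I need to check for multiple words? I was thinking that maybe something like a for loop could work, searching the file for a different word from a list on each pass.

Do you think this would be convenient?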

Answer 1

Just join word_list using | as the separator. The (?i) modifier makes the match case-insensitive.

for line in shakes:
    if re.search(r"(?i)"+'|'.join(word_list), line):
        print line,

Example:

>>> f = ['hello','foo','bar']
>>> s = '''hello
... hai
... Foo
... Bar'''.splitlines()
>>> for line in s:
...     if re.search(r"(?i)"+'|'.join(f), line):
...         print(line)
...
hello
Foo
Bar
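One caveat with joining the words directly: a word containing regex metacharacters (such as . or +) would be interpreted as a pattern, and foo would also match inside food. Below is a minimal sketch of a stricter variant, written for Python 3 and assuming whole-word matches are wanted. The helper name build_pattern and the sample data are hypothetical, not part of the answer above.

```python
import re

def build_pattern(word_list):
    # Escape each word so characters like "." or "+" are matched literally,
    # and wrap the alternation in \b so only whole words match.
    alternation = "|".join(re.escape(word) for word in word_list)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

word_list = ["hello", "foo", "bar"]  # illustrative data
lines = ["hello", "hai", "Foo", "Bar", "food"]

pattern = build_pattern(word_list)
matching = [line for line in lines if pattern.search(line)]
print(matching)
```

Note that \b only behaves as a word boundary when the search words start and end with letters, digits, or underscores, so this sketch would need adjusting for words such as c++.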

Without a regex:

>>> f = ['hello','foo','bar']
>>> s = '''hello
... hai
... Foo
... Bar'''.splitlines()
>>> for line in s:
...     if any(i.lower() in line.lower() for i in f):
...         print(line)
...
hello
Foo
Bar

Answer 2

I don't think using a regular expression here is Pythonic, since a regex is somewhat implicit. So, if speed doesn't matter, I'd use a loop:

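def find_word(word_list, line):
    # Return the line if it contains any of the words, otherwise None.
    for word in word_list:
        if word in line:
            return line

with open('/path/to/file.txt') as f:
    # Lines are lowercased before matching, so word_list should be lowercase too.
    matches = (find_word(word_list, line.lower()) for line in f)
    result = [line for line in matches if line is not None]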
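The same loop can also be written as a generator, so that only matching lines are produced and each line keeps its original casing. This is a sketch for Python 3, assuming plain substring matching is acceptable. The name matching_lines and the sample input are hypothetical.

```python
def matching_lines(lines, word_list):
    # Lowercase the search words once rather than on every line.
    lowered = [word.lower() for word in word_list]
    for line in lines:
        haystack = line.lower()
        # Plain substring test, so "bar" also matches inside "barn".
        if any(word in haystack for word in lowered):
            yield line

sample = ["Hello there\n", "nothing here\n", "a BARN door\n"]
found = list(matching_lines(sample, ["hello", "Bar"]))
print(found)
```

An open file object can be passed as lines, since iterating over it yields one line at a time.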

Answer 3

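Another idea is to use a set.

The code below assumes that all words in the file are separated by spaces and that word_list is the list of words to look for.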

shakes = open("tt.txt", "r")
words = set(word_list)
for line in shakes:
    if words & set(line.split()):
        print line,

If you want a case-insensitive search, you can convert each string to lowercase:

shakes = open("tt.txt", "r")
words = set(w.lower() for w in word_list)
for line in shakes:
    if words & set(line.lower().split()):
        print line,
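Because line.split() only splits on whitespace, a word followed by punctuation (for example happy,) would not be found by the set intersection. Below is a hedged sketch for Python 3 that extracts word tokens with re.findall instead. The helper name words_in_line and the sample lines are assumptions made for illustration.

```python
import re

def words_in_line(line):
    # \w+ keeps runs of letters, digits, and underscores, dropping punctuation.
    return set(re.findall(r"\w+", line.lower()))

word_list = ["Happy", "sad"]  # illustrative data
wanted = {w.lower() for w in word_list}

sample = ["I am happy, truly.\n", "Unhappy days\n", "So SAD!\n"]
hits = [line for line in sample if wanted & words_in_line(line)]
print(hits)
```

With this tokenization, apostrophes and hyphens also split words (don't becomes don and t), which may or may not be what a particular search needs.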

Python: searching a file for words from a list
