Question

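I have a word list file named Words.txt that contains hundreds of words, and a number of subtitle files (.srt). I want to go through all the subtitle files and search them for every word in the word list. If a word is found, I want to change its color to green. Here is the code: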

import os

wordsPath = 'C:/Users/John/Desktop/Subs/Words.txt'
subsPath = 'C:/Users/John/Desktop/Subs/Season1'

# Read one word per line, skipping blank lines
with open(wordsPath, 'r') as wordFile:
    wordList = [line.strip() for line in wordFile if line.strip()]

for word in wordList:
    for root, dirs, files in os.walk(subsPath, topdown=False):
        for fileName in files:
            if fileName.endswith(".srt"):
                # Join with root so the file is found regardless of the working directory
                filePath = os.path.join(root, fileName)
                with open(filePath, 'r') as file:
                    filedata = file.read()
                # Only matches the word with a space on each side and in exactly this casing
                filedata = filedata.replace(' ' + word + ' ', ' <font color="Green">' + word + '</font> ')
                with open(filePath, 'w') as file:
                    file.write(filedata)

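Say the word "book" is in the list and is found in one of the subtitle files. My code works fine as long as the word is used in a sentence such as "This book is amazing". However, when the word appears as, for example, "BOOK" or "Book", or at the beginning or end of a sentence, the code fails. How can I fix this?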

Answer 1

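You are using str.replace. From the documentation:

Return a copy of the string with all occurrences of substring old replaced by new.

"Occurrences" here means exact matches of the substring old, so the function only replaces the word when it is surrounded by spaces exactly as written, e.g. ' book ', which is not the same as ' BOOK ', ' Book ' or ' book'. Here are some of the cases that do not match: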

" book " == " BOOK "  # False
" book " == " book"  # False
" book " == " Book "  # False
" book " == " bOok " # False
" book " == "   book " # False
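The mismatches above can be reproduced by running the question's replacement on a few sample lines (the lines and the helper name `highlight_exact` are made up for illustration). Only the lowercase, space-delimited occurrence changes:

```python
def highlight_exact(line, word):
    """Mimic the question's approach: replace only ' word ' (space, word, space)."""
    return line.replace(" " + word + " ", ' <font color="Green">' + word + "</font> ")


# Hypothetical sample lines: normal case, sentence start, all caps, trailing punctuation
samples = ["This book is amazing", "Book of the year", "A great BOOK", "Just book."]
for line in samples:
    result = highlight_exact(line, "book")
    print("changed  " if result != line else "unchanged", result)
```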

One option is to use a regular expression like this:

import re

words = ["book", "rule"]
sentences = ["This book is amazing", "The not so good book", "OMG what a great BOOK", "One Book to rule them all",
             "Just book."]

# One case-insensitive, whole-word pattern per word; re.escape guards against regex metacharacters in a word
patterns = [re.compile(r"\b({})\b".format(re.escape(word)), re.IGNORECASE) for word in words]

for sentence in sentences:
    result = sentence
    for pattern in patterns:
        # \1 re-inserts the text exactly as matched, so the original casing (BOOK, Book) is kept
        result = pattern.sub(r'<font color="Green">\1</font>', result)
    print(result)

Output:

This <font color="Green">book</font> is amazing
The not so good <font color="Green">book</font>
OMG what a great <font color="Green">BOOK</font>
One <font color="Green">Book</font> to <font color="Green">rule</font> them all
Just <font color="Green">book</font>.
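One caveat with the loop above: each pattern runs over text that earlier patterns have already modified, so if the word list contained `font`, `color` or `green`, the inserted tags themselves would be matched and the markup corrupted. A possible alternative, sketched here with hypothetical helper names `build_pattern` and `highlight`, is to join all escaped words into a single alternation and substitute in one pass, since `re.sub` does not rescan its own replacement text:

```python
import re


def build_pattern(words):
    """Compile one case-insensitive, whole-word pattern matching any word in the list."""
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def highlight(text, pattern):
    # \1 is the text exactly as matched, so 'BOOK' and 'Book' keep their casing
    return pattern.sub(r'<font color="Green">\1</font>', text)


pattern = build_pattern(["book", "rule", "green"])
print(highlight("One Book to rule them all", pattern))
print(highlight("A green BOOK.", pattern))
```

With the per-word loop, the `green` entry would also match the `Green` inside `color="Green"`; in the single pass it only matches the word in the sentence.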

In special cases like these, plain text replacement with str.replace does not work.
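Applied to the question's setup, a minimal sketch might read Words.txt once, build one combined pattern, and rewrite each .srt file a single time instead of once per word. The function name `highlight_subtitles` is hypothetical, and the UTF-8 default is an assumption (subtitle files are often in other encodings, so adjust `encoding` as needed):

```python
import os
import re


def highlight_subtitles(words_path, subs_path, encoding="utf-8"):
    """Wrap every listed word found in .srt files under subs_path in a green <font> tag.

    Hypothetical helper; assumes one word per line in words_path and that all
    files use the given encoding.
    """
    with open(words_path, encoding=encoding) as f:
        words = [line.strip() for line in f if line.strip()]  # skip blank lines
    if not words:
        return  # an empty alternation would match at every word boundary
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b", re.IGNORECASE)

    for root, _dirs, files in os.walk(subs_path):
        for file_name in files:
            if not file_name.lower().endswith(".srt"):
                continue
            path = os.path.join(root, file_name)  # full path, so subfolders work too
            with open(path, encoding=encoding) as f:
                data = f.read()
            with open(path, "w", encoding=encoding) as f:
                f.write(pattern.sub(r'<font color="Green">\1</font>', data))
```

For the paths in the question this would be called as `highlight_subtitles('C:/Users/John/Desktop/Subs/Words.txt', 'C:/Users/John/Desktop/Subs/Season1')`. Running it a second time wraps already highlighted words again, so it is worth keeping a copy of the original files.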
