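I have a word list file named Words.txt that contains hundreds of words, and some subtitle files (.srt). I want to go through all the subtitle files and search them for every word in the word list file. If a word is found, I want to change its color to green. Here is the code: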
import fileinput
import os
import re

wordsPath = 'C:/Users/John/Desktop/Subs/Words.txt'
subsPath = 'C:/Users/John/Desktop/Subs/Season1'
wordList = []
wordFile = open(wordsPath, 'r')
for line in wordFile:
    line = line.strip()
    wordList.append(line)
for word in wordList:
    for root, dirs, files in os.walk(subsPath, topdown=False):
        for fileName in files:
            if fileName.endswith(".srt"):
                with open(fileName, 'r') as file :
                    filedata = file.read()
                filedata = filedata.replace(' ' +word+ ' ', ' ' + '<font color="Green">' +word+'</font>' + ' ')
                with open(fileName, 'w') as file:
                    file.write(filedata)
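Say the word "book" is in the list and is found in one of the subtitle files. As long as the word is used in a sentence like "This book is amazing", my code works fine. However, when the word appears as "BOOK" or "Book", or at the beginning or end of a sentence, the code fails. How can I solve this?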
Answer 0 (score: 1)
You are using str.replace. From the documentation:
Return a copy of the string with all occurrences of substring old replaced by new
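Here, "occurrences" means places where the string old matches exactly, so the function only replaces the word when it is surrounded by spaces, as in ' book '. That is different from ' BOOK ', ' Book ' and ' book'. Let's look at some cases that do not match:
" book " == " BOOK " # False
" book " == " book" # False
" book " == " Book " # False
" book " == " bOok " # False
" book " == "book " # False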
One option is to use a regular expression like this:
import re

words = ["book", "rule"]
sentences = ["This book is amazing", "The not so good book", "OMG what a great BOOK", "One Book to rule them all",
             "Just book."]

# \b matches a word boundary, so the word is also found at the start or end of
# a sentence and next to punctuation; re.IGNORECASE ignores upper/lower case.
# re.escape guards against words that contain regex metacharacters.
patterns = [re.compile(r"\b({})\b".format(re.escape(word)), re.IGNORECASE | re.UNICODE) for word in words]

for sentence in sentences:
    result = sentence
    for pattern in patterns:
        # \1 puts back the text that was actually matched, keeping its casing
        result = pattern.sub(r'<font color="Green">\1</font>', result)
    print(result)
Output
This <font color="Green">book</font> is amazing
The not so good <font color="Green">book</font>
OMG what a great <font color="Green">BOOK</font>
One <font color="Green">Book</font> to <font color="Green">rule</font> them all
Just <font color="Green">book</font>.
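To connect this back to the question, here is a minimal sketch of how the same idea might be applied to the .srt files themselves. The function names (load_words, build_pattern, highlight_text, highlight_subtitles) are hypothetical, and UTF-8 encoding of the files is an assumption; adjust it if your subtitles use another encoding. Two things differ from the question's code: the file is opened via os.path.join(root, fileName), since os.walk yields bare file names, and all words go into one combined pattern, so each file is rewritten once and text inside an already inserted font tag is not matched again.

```python
import os
import re


def load_words(words_path):
    # One word per line; blank lines are skipped. UTF-8 is an assumption.
    with open(words_path, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


def build_pattern(words):
    # An empty alternation would match at every word boundary, so refuse it.
    if not words:
        raise ValueError("word list is empty")
    # Longest entries first, so a longer entry wins over a shorter prefix.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = '|'.join(re.escape(w) for w in ordered)
    return re.compile(r'\b(' + alternatives + r')\b', re.IGNORECASE)


def highlight_text(text, pattern):
    # \1 re-inserts the matched text, so BOOK stays BOOK and Book stays Book.
    return pattern.sub(r'<font color="Green">\1</font>', text)


def highlight_subtitles(subs_path, pattern):
    # Rewrites every .srt file under subs_path in place.
    for root, dirs, files in os.walk(subs_path):
        for file_name in files:
            if file_name.endswith('.srt'):
                full_path = os.path.join(root, file_name)
                with open(full_path, 'r', encoding='utf-8') as f:
                    data = f.read()
                with open(full_path, 'w', encoding='utf-8') as f:
                    f.write(highlight_text(data, pattern))


# In-memory check that does not touch any files.
demo = highlight_text("Book one. The book", build_pattern(["book"]))
print(demo)
```

With the paths from the question, the call would look like highlight_subtitles(subsPath, build_pattern(load_words(wordsPath))). Since the files are overwritten in place, trying it on a copy of the folder first is probably wise.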