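I am trying to print text while highlighting certain words and word bigrams. This would be fairly straightforward if I did not also have to print the other tokens, such as punctuation.
I have a list of words to highlight and another list of word bigrams to highlight.
Highlighting individual words is fairly easy, for example: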
import re
import string
regex_pattern = re.compile("([%s \n])" % string.punctuation)
def highlighter(content, terms_to_hightlight):
    tokens = regex_pattern.split(content)
    for token in tokens:
        if token.lower() in terms_to_hightlight:
            print('\x1b[6;30;42m' + token + '\x1b[0m', end="")
        else:
            print(token, end="")
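Highlighting only words that appear in sequence is more complicated. I have been playing with iterators, but I have not been able to come up with anything that is not overly complicated.
Answer 0 (score: 0)
If I understand the question correctly, one solution is to look ahead to the next word token and check whether the resulting bigram is in the list.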
import re
import string
regex_pattern = re.compile("([%s \n])" % string.punctuation)
def find_next_word(tokens, idx):
    # Separators are single characters; the empty strings produced by
    # split() also count as "in nonword", so both are skipped here.
    nonword = string.punctuation + " \n"
    for i in range(idx+1, len(tokens)):
        if tokens[i] not in nonword:
            return (tokens[i], i)
    return (None, -1)
def highlighter(content, terms, bigrams):
    tokens = regex_pattern.split(content)
    idx = 0
    while idx < len(tokens):
        token = tokens[idx]
        (next_word, nw_idx) = find_next_word(tokens, idx)
        if token.lower() in terms:
            # Single-term match: checked before any bigram.
            print('*' + token + '*', end="")
            idx += 1
        elif next_word and (token.lower(), next_word.lower()) in bigrams:
            # Bigram match: print everything up to and including the next word.
            concat = "".join(tokens[idx:nw_idx+1])
            print('-' + concat + '-', end="")
            idx = nw_idx + 1
        else:
            print(token, end="")
            idx += 1
terms = ['man', 'the']
bigrams = [('once', 'upon'), ('i','was')]
text = 'Once upon a time, as I was walking to the city, I met a man. As I was tired, I did not look once... upon this man.'
highlighter(text, terms, bigrams)
When called, this gives:
-Once upon- a time, as -I was- walking to *the* city, I met a *man*. As -I was- tired, I did not look -once... upon- this *man*.
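To see why the look-ahead has to skip over more than one token, it may help to inspect what the split actually returns. The snippet below is only an illustrative sketch that reuses the same pattern as above on a short sample string of my own choosing; because the pattern has a capturing group, the separators are kept, and consecutive separators leave empty strings between them.

```python
import re
import string

# Same pattern as in the answer; the capturing group keeps the separators.
regex_pattern = re.compile("([%s \n])" % string.punctuation)

# Sample input chosen for illustration (not from the original post).
sample = "look once... upon"
tokens = regex_pattern.split(sample)
print(tokens)
# ['look', ' ', 'once', '.', '', '.', '', '.', '', ' ', 'upon']

# Joining the tokens gives back the input, which is why the highlighter
# can print every token and still reproduce the text faithfully.
print("".join(tokens) == sample)
# True
```

Between "once" and "upon" there are eight tokens that are not words, so the next word has to be searched for rather than assumed to sit at a fixed offset.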
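Please note that the matching is greedy: if both "yellow banana" and "banana boat" are in the bigram list, "yellow banana boat" is always highlighted as "-yellow banana- boat". If you want different behavior, you should update the test logic. Likewise, a word that is both in terms and the first part of a bigram is highlighted as a single term, because that check runs first. Hope this helps.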
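The two caveats can be checked with a small variant that returns the highlighted string instead of printing it. This is a sketch under assumptions: the name highlight_to_string and the NONWORD set are hypothetical additions for illustration, while the matching logic follows the answer's code.

```python
import re
import string

regex_pattern = re.compile("([%s \n])" % string.punctuation)
# Hypothetical helper constant: the single-character separators as a set.
NONWORD = set(string.punctuation + " \n")


def find_next_word(tokens, idx):
    """Return the next word token after idx and its index, or (None, -1)."""
    for i in range(idx + 1, len(tokens)):
        # Skip empty strings left by split() as well as separators.
        if tokens[i] and tokens[i] not in NONWORD:
            return tokens[i], i
    return None, -1


def highlight_to_string(content, terms, bigrams):
    """Hypothetical variant of highlighter() that builds and returns a string."""
    tokens = regex_pattern.split(content)
    out = []
    idx = 0
    while idx < len(tokens):
        token = tokens[idx]
        next_word, nw_idx = find_next_word(tokens, idx)
        if token.lower() in terms:
            out.append('*' + token + '*')
            idx += 1
        elif next_word and (token.lower(), next_word.lower()) in bigrams:
            out.append('-' + "".join(tokens[idx:nw_idx + 1]) + '-')
            idx = nw_idx + 1
        else:
            out.append(token)
            idx += 1
    return "".join(out)


# Greedy matching: the first bigram consumes "banana".
greedy = highlight_to_string('yellow banana boat', [],
                             [('yellow', 'banana'), ('banana', 'boat')])
print(greedy)
# -yellow banana- boat

# A term wins over a bigram that starts with the same word.
term_first = highlight_to_string('to the city', ['the'], [('the', 'city')])
print(term_first)
# to *the* city
```

Returning a string rather than printing makes the behavior easy to assert in tests, which is useful if you change the order of the checks.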