Suppose I have some text, for example:
text = "Ophelia is a character in William Shakespeare's drama Hamlet. She is a young noblewoman of Denmark, the daughter of Polonius, sister of Laertes, and potential wife of Prince Hamlet."
and a parallel list of False values, one per word:
wantedWords = [False]*len(text.split())
and a list of phrases and single words, for example:
phrases = ['Ophelia', 'Hamlet', 'daughter of Polonius', 'Prince Hamlet']
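For every occurrence in the text of an entry from phrases, I want to set the corresponding entries of wantedWords to True, one entry per word of the matched phrase.
So the wantedWords list should become:
wantedWords = [True, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, True, True, True, False, False, False, False, False, False, False, True, True]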
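The requested mapping can be sketched by comparing each phrase word by word against a sliding window over the words of the text, so a phrase only ever matches whole words. This is a minimal sketch, not the asker's or answerer's code; it assumes words are separated by whitespace, that punctuation at the edges of a word (such as the period in "Hamlet.") should be ignored, and that matching is case-sensitive. The function name mark_phrases is hypothetical.

```python
import string


def mark_phrases(text, phrases):
    """Return one bool per whitespace-separated word of `text`, True where
    the word is part of an occurrence of any phrase in `phrases`."""
    # Strip punctuation from both ends of each word, so that "Hamlet."
    # compares equal to "Hamlet" (inner apostrophes are kept).
    words = [w.strip(string.punctuation) for w in text.split()]
    wanted = [False] * len(words)
    for phrase in phrases:
        target = phrase.split()
        n = len(target)
        # Slide a window of n words over the text and compare word by word.
        for start in range(len(words) - n + 1):
            if words[start:start + n] == target:
                for k in range(start, start + n):
                    wanted[k] = True
    return wanted


text = "Ophelia is a character in William Shakespeare's drama Hamlet. She is a young noblewoman of Denmark, the daughter of Polonius, sister of Laertes, and potential wife of Prince Hamlet."
phrases = ['Ophelia', 'Hamlet', 'daughter of Polonius', 'Prince Hamlet']
wantedWords = mark_phrases(text, phrases)
print(wantedWords)
```

Because the result is built from the same text.split() tokens as the False list above, it always has one entry per word, and overlapping phrases such as 'Hamlet' and 'Prince Hamlet' simply mark the same positions twice.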
Answer (score: 2)
This might help:
text = "Ophelia is a character in William Shakespeare's drama Hamlet. She is a young noblewoman of Denmark, the daughter of Polonius, sister of Laertes, and potential wife of Prince Hamlet."
wantedWords = []
phrases = ['Ophelia', 'Hamlet', 'daughter of Polonius', 'Prince Hamlet']
# Sort phrases longest first, so 'Prince Hamlet' is replaced before 'Hamlet'.
for i in sorted(phrases, key=len, reverse=True):
    if i in text:
        # Replace each found phrase with one * per word it contains.
        text = text.replace(i, "*" * len(i.split()))
for i in text.split():
    if "*" in i:
        # One True per * in the token, i.e. per word of the replaced phrase.
        for k in range(i.count("*")):
            wantedWords.append(True)
    else:
        wantedWords.append(False)
print(wantedWords)
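Note that str.replace matches substrings, so with phrases = ['Hamlet'] a word such as "Hamlets" would also become "*s" and be marked True; a literal "*" already present in the text would be miscounted too. Below is a sketch of a variant that still locates phrases in the raw text but only accepts whole-word matches, using re.finditer with \b word boundaries and mapping each match back to word positions through their character spans. It leaves the text unmodified. The function name mark_phrases_regex is hypothetical, and the sketch assumes each phrase begins and ends with a letter, digit or underscore (otherwise \b does not act as a word boundary there).

```python
import re


def mark_phrases_regex(text, phrases):
    """Return one bool per whitespace-separated word of `text`, True where the
    word overlaps a whole-word match of any phrase in `phrases`."""
    # Character span (start, end) of every whitespace-separated word.
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    wanted = [False] * len(spans)
    for phrase in phrases:
        # \b on both sides keeps 'Hamlet' from matching inside 'Hamlets';
        # re.escape makes the phrase match literally. Assumes each phrase
        # begins and ends with a word character.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text):
            for i, (start, end) in enumerate(spans):
                # Mark every word whose span overlaps the match.
                if start < m.end() and m.start() < end:
                    wanted[i] = True
    return wanted


text = "Ophelia is a character in William Shakespeare's drama Hamlet. She is a young noblewoman of Denmark, the daughter of Polonius, sister of Laertes, and potential wife of Prince Hamlet."
phrases = ['Ophelia', 'Hamlet', 'daughter of Polonius', 'Prince Hamlet']
print(mark_phrases_regex(text, phrases))
print(mark_phrases_regex("Hamlets and Hamlet", ["Hamlet"]))  # [False, False, True]
```

Since no sorting or replacement is involved, the order of phrases does not matter here; the nested loop over spans is quadratic in the worst case but fine for sentence-sized input.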