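How can I find the word that follows a match, provided that word is in a list? For example, I want to get the word that comes right after match1 if that word is in this list: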
r = ["word1", "word2", "word3"]
If it is found, return that word. If not, return Unknown.
Toy example:
Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"
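I want to look at the word after match1 and check whether it is in the list r = ["word1", "word2", "word3"]. Expected result: Unknown for Text1, word1 for Text2, and Unknown for Text3.
So far I have managed to extract "word1" as long as it follows one of the first two occurrences of match1. With Text4 (below) I can't capture it, because I only go as far as the second time I see the match. I don't think nesting more if-else statements to go deeper is the way to go, since word1 might not be present at all.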
Text4 = "example match1 example match1 example match1 word1"
def get_labels(text):
    q = ["match1"]  # Here the idea is to have several, but it's the same logic
    r = ["word1", "word2", "word3"]
    labels = []
    for i, item in enumerate(q):
        label = text[text.find(q[i])+len(q[i]):].split()[0]
        if label in r:
            labels.append(label)
        else:
            texto_temp = text[text.find(q[i])+len(q[i]):]
            label2 = texto_temp[texto_temp.find(q[i])+len(q[i]):].split()[0]
            labels.append(label2)
    return labels
Any ideas would be appreciated.
Answer 0: (score: 1)
If I understand correctly, this should work:
def get_labels(text):
    q = ['match1']
    r = ['word1', 'word2', 'word3']
    labels = []
    terms = text.split()
    for i, term in enumerate(terms[:-1]):
        if term in q and terms[i+1] in r:
            labels.append(terms[i+1])
    return labels if labels else 'Unknown'
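To sanity-check this against the four texts from the question, here is a small self-contained sketch. It is a variant of the loop above rather than the exact code: passing q and r as parameters and pairing each word with its successor via zip are my own choices.

```python
def get_labels(text, q=("match1",), r=("word1", "word2", "word3")):
    """Collect every word from r that directly follows a word from q."""
    terms = text.split()
    # zip(terms, terms[1:]) yields each word together with its successor
    labels = [nxt for cur, nxt in zip(terms, terms[1:]) if cur in q and nxt in r]
    return labels if labels else "Unknown"


texts = [
    "This is a match1 for example match1 random text match1 anotherword",
    "This is a match1 word1 example",
    "This is an example without word of interest",
    "example match1 example match1 example match1 word1",
]
for text in texts:
    print(get_labels(text))
# prints, one per line: Unknown, ['word1'], Unknown, ['word1']
```

Because every adjacent pair of words is inspected, it does not matter how many times match1 occurs before the one that is followed by a listed word.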
Answer 1: (score: 1)
You can use regular expressions to find the matches.
Code
from __future__ import print_function
import re
def get_labels(text, match, words):
    # Lookbehind for the match term, then capture a listed word after whitespace
    tmp = re.findall(r'(?<={})\s+({})'.format(match, '|'.join(words)), text)
    return tmp if tmp else "Unknown"
Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"
Text4 = "example match1 example match1 example match1 word1"
match = "match1"
words = ["word1", "word2", "word3"]
print(get_labels(Text1, match, words))
print(get_labels(Text2, match, words))
print(get_labels(Text3, match, words))
print(get_labels(Text4, match, words))
Console output
Unknown
['word1']
Unknown
['word1']
Ask if you need more details...
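One caveat with building a pattern from plain strings: the alternation has no word boundaries, so a text like "match1 word10" would still yield word1, and any regex metacharacter in match or words would be interpreted as pattern syntax. A hedged variant is sketched below. The name get_labels_strict is hypothetical, and it swaps the lookbehind for an ordinary capture group with \b anchors and re.escape, which is my own substitution rather than part of the answer above.

```python
import re


def get_labels_strict(text, match, words):
    """Like get_labels, but matches whole words and escapes the inputs."""
    # re.escape keeps characters such as '.' or '+' from acting as regex syntax
    alternatives = "|".join(re.escape(w) for w in words)
    # \b on both sides rejects partial hits such as 'xmatch1' or 'word10'
    pattern = r"\b{}\s+({})\b".format(re.escape(match), alternatives)
    found = re.findall(pattern, text)
    return found if found else "Unknown"


words = ["word1", "word2", "word3"]
print(get_labels_strict("example match1 example match1 word1", "match1", words))
print(get_labels_strict("This is a match1 word10 example", "match1", words))
# prints ['word1'] and then Unknown
```

This assumes the match term and the listed words begin and end with word characters, since \b only marks a transition between word and non-word characters.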
Answer 2: (score: 0)
You can try a positive lookbehind, (?<=match1\s):
import re
pattern=r'(?<=match1\s)[a-zA-Z0-9]+'
Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"
Text4 = "example match1 example match1 example match1 word1"
r = ["word1", "word2", "word3"]
def word_checker(list_):
    data=re.findall(pattern,list_)
    list_data=[i for i in data if i in r]
    if list_data:
        return list_data[0]
    else:
        return 'Unknown'
Test:
print(word_checker(Text1))
print(word_checker(Text2))
print(word_checker(Text3))
print(word_checker(Text4))
Output:
Unknown
word1
Unknown
word1
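The question mentions that q is meant to hold several match terms. The same lookbehind idea could be extended to that case roughly as follows. This is only a sketch: labels_per_match, match2 and match3 are hypothetical names used for illustration, and returning a dict with one label per match term is my assumption about the desired result shape.

```python
import re


def labels_per_match(text, matches, words):
    """Map each match term to the first listed word that directly follows it."""
    allowed = set(words)
    result = {}
    for m in matches:
        # Fixed-width lookbehind: the match term plus exactly one whitespace character
        followers = re.findall(r"(?<={}\s)\w+".format(re.escape(m)), text)
        hits = [w for w in followers if w in allowed]
        result[m] = hits[0] if hits else "Unknown"
    return result


text = "example match1 example match1 word1 and match2 word3"
print(labels_per_match(text, ["match1", "match2", "match3"], ["word1", "word2", "word3"]))
# prints {'match1': 'word1', 'match2': 'word3', 'match3': 'Unknown'}
```

Because the lookbehind does not consume the match term, every occurrence is checked, including back-to-back ones such as "match1 match1 word1".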