Question

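How can I find the word that follows a match, but only if that word is in a list? For example, I want to get the word after match1 if that word is in this list: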

r = ["word1", "word2", "word3"]

If it is found, return that word. If not, return Unknown.

Toy example:

Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"

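I want to look at the word after match1 and check whether it is in the list r = ["word1", "word2", "word3"]. Expected results: Unknown for Text1, word1 for Text2, and Unknown for Text3.

So far I have managed to extract "word1" as long as it follows one of the first two occurrences of match1. With Text4 (below) I cannot detect it, because I only look as far as the second match. Going deeper with more if-else statements does not seem like the way to go, because word1 might not be there at all.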

Text4 = "example match1 example match1 example match1 word1"

def get_labels(text):
    q = ["match1"]  # The idea is to have several, but it's the same logic
    r = ["word1", "word2", "word3"]
    labels = []
    for i,item in enumerate(q):
        label = text[text.find(q[i])+len(q[i]):].split()[0]
        if label in r:
            labels.append(label)
        else:
            texto_temp = text[text.find(q[i])+len(q[i]):]
            label2 = texto_temp[texto_temp.find(q[i])+len(q[i]):].split()[0]
            labels.append(label2)
    return labels

Any ideas would be appreciated.

Answer 1

If I understand correctly, this should work:

def get_labels(text):
    q = ['match1']
    r = ['word1', 'word2', 'word3']
    labels = []
    terms = text.split()
    for i, term in enumerate(terms[:-1]):
        if term in q and terms[i+1] in r:
            labels.append(terms[i+1])
    return labels if labels else 'Unknown'
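
As a quick check of the token-based idea, here is a minimal sketch that takes the match terms and the word list as parameters and runs against the four sample texts from the question. The name labels_after_matches and the use of zip to pair each token with its successor are illustrative choices, not part of the original answer.

```python
def labels_after_matches(text, matches, words):
    """Return every token from `words` that directly follows a token from `matches`."""
    matches, words = set(matches), set(words)  # sets give fast membership tests
    terms = text.split()
    # zip pairs each token with the one after it, so no index arithmetic is needed
    labels = [nxt for cur, nxt in zip(terms, terms[1:])
              if cur in matches and nxt in words]
    return labels if labels else "Unknown"


samples = [
    "This is a match1 for example match1 random text match1 anotherword",
    "This is a match1 word1 example",
    "This is an example without word of interest",
    "example match1 example match1 example match1 word1",
]

for sample in samples:
    print(labels_after_matches(sample, ["match1"], ["word1", "word2", "word3"]))
```

Because the text is split on whitespace only, a token such as "word1," (with trailing punctuation) would not be recognised; that case would need extra cleaning.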

Answer 2

You can use regular expressions to find the matches.

Code

from __future__ import print_function
import re

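def get_labels(text, match, words):
    # Capture any entry of `words` that directly follows `match`, separated by whitespace.
    # re.escape keeps special characters literal; \b stops word1 from matching inside word10.
    pattern = r'(?<={})\s+({})\b'.format(re.escape(match), '|'.join(map(re.escape, words)))
    tmp = re.findall(pattern, text)

    return tmp if tmp else "Unknown"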

Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"
Text4 = "example match1 example match1 example match1 word1"

match = "match1"
words = ["word1", "word2", "word3"]

print(get_labels(Text1, match, words))
print(get_labels(Text2, match, words))
print(get_labels(Text3, match, words))
print(get_labels(Text4, match, words))

Console output

Unknown
['word1']
Unknown
['word1']

Ask if you need more details.
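
The question mentions that there may eventually be several match terms. A possible extension is sketched below, assuming the match terms can have different lengths. Python's re module only accepts fixed-width lookbehinds, so this version uses a plain non-capturing group in front of the captured word instead. The function name get_labels_multi and the extra term "match2" are hypothetical.

```python
import re


def get_labels_multi(text, matches, words):
    """Return all entries of `words` that directly follow any entry of `matches`."""
    match_alt = "|".join(map(re.escape, matches))
    word_alt = "|".join(map(re.escape, words))
    # (?:...) groups the match terms without capturing them;
    # only the word that follows is captured and returned by findall.
    pattern = r"\b(?:{})\s+({})\b".format(match_alt, word_alt)
    found = re.findall(pattern, text)
    return found if found else "Unknown"


words = ["word1", "word2", "word3"]
print(get_labels_multi("a match2 word3 b match1 word1", ["match1", "match2"], words))
print(get_labels_multi("example match1 word10", ["match1", "match2"], words))
```

The second call shows the effect of the trailing \b: word10 is not reported as word1.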

Answer 3

You can try a positive lookbehind, (?<=match1\s):

import re
pattern=r'(?<=match1\s)[a-zA-Z0-9]+'

Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"
Text4 = "example match1 example match1 example match1 word1"

r = ["word1", "word2", "word3"]

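def word_checker(text):
    # Every word that comes right after "match1" plus one whitespace character
    data = re.findall(pattern, text)
    # Keep only the words of interest and return the first one, if any
    list_data = [i for i in data if i in r]
    if list_data:
        return list_data[0]
    else:
        return 'Unknown'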

Test:

print(word_checker(Text1))
print(word_checker(Text2))
print(word_checker(Text3))
print(word_checker(Text4))

Output:

Unknown
word1
Unknown
word1
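
One caveat with the lookbehind approach: Python's re module requires lookbehinds to have a fixed width, so (?<=match1\s) only works when exactly one whitespace character separates match1 from the next word. The sketch below illustrates this and shows a capture-group alternative that tolerates any amount of whitespace. The helper name words_after is hypothetical.

```python
import re

spaced = "example match1   word1"  # three spaces after match1

# Fixed-width lookbehind: finds nothing here, because more than one space follows match1
print(re.findall(r"(?<=match1\s)\w+", spaced))

# A variable-width lookbehind is rejected when the pattern is compiled
try:
    re.compile(r"(?<=match1\s+)\w+")
except re.error as exc:
    print("rejected:", exc)


def words_after(text, match="match1"):
    """Return each word that follows `match`, allowing any run of whitespace."""
    return re.findall(r"\b{}\s+(\w+)".format(re.escape(match)), text)


print(words_after(spaced))
```

The result of words_after can then be filtered against the list r in the same way as word_checker does.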
