Question

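I am trying to iterate over multiple text articles and check whether each article contains keywords from two entirely different lists. If an article has keywords from both lists, it should return True. If an article has keywords from only one list, it should return False.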

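Note: I am breaking a larger for loop into smaller pieces to see whether each piece works, which is why I have not split this into two for loops, each checking one list and returning 1 per variable, and then dropping anything less than 2... Would that still be a feasible approach, even for a large data set?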
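One way to read the counting idea in the note above, as a minimal sketch: score each list 1 if any of its keywords appears in the text, add the scores, and keep only texts that score 2. The names `hits` and `has_both` are hypothetical, and plain substring matching is assumed. Each text is scanned once per keyword, so the cost grows linearly with the number of rows.

```python
words = ['candy', 'chocolate', 'mints', 'figs', 'avocado']
words2 = ['eat', 'slurp', 'chew', 'digest']

def hits(text, keywords):
    """Return 1 if any keyword occurs in text as a substring, else 0."""
    return int(any(keyword in text for keyword in keywords))

def has_both(text):
    """True only when both lists score a hit (a total of 2)."""
    return hits(text, words) + hits(text, words2) == 2

print(has_both("Westies will eat avocado, even figs."))  # True
print(has_both("The co-worker ate all of the candy."))   # False
```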

Sample data:

Data:
Text                                  result                
The co-worker ate all of the candy.    False
Bluejays love peanuts.                 False
Westies will eat avocado, even figs.   True

This is my code, but I am struggling with the for loop.

def z(etext):
    words = ['candy', 'chocolate', 'mints', 'figs', 'avocado']
    words2 = ['eat', 'slurp', 'chew', 'digest']
    for keywords in words and words2:
        return True

df['result'] = df['Keyterm'].apply(z)

This code returns True for every row of my DataFrame, which is incorrect. Each row contains a list of text.
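The behaviour described above follows from how Python evaluates `words and words2`: for two non-empty lists, `and` yields the second list, so the loop iterates over `words2` only and returns True on its first pass without ever looking at `etext`. A small sketch that shows this (the name `z_original` is hypothetical and simply reproduces the function from the question):

```python
words = ['candy', 'chocolate', 'mints', 'figs', 'avocado']
words2 = ['eat', 'slurp', 'chew', 'digest']

# `and` returns its second operand when the first is truthy (a non-empty list).
print((words and words2) is words2)  # True

def z_original(etext):
    for keywords in words and words2:
        return True  # reached on the first iteration, whatever etext is

print(z_original("Bluejays love peanuts."))  # True
```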

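Edit: Solution:

def z(etext):
    words = ['candy', 'chocolate', 'mints', 'figs', 'avocado']
    words2 = ['eat', 'slurp', 'chew', 'digest']
    for keyword in words:
        index = etext.find(keyword)
        if index != -1:
            # A keyword from the first list was found; now look for one from the second.
            for anotherword in words2:
                index2 = etext.find(anotherword)
                if index2 != -1:
                    return True
    # No pair of keywords found; return False explicitly instead of None.
    return False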

df['result'] = df['Text'].apply(z)
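Because `str.find` matches substrings, the solution above would also count 'eat' inside a word such as 'create'. If whole-word matching is wanted, one hedged alternative is to compile a regular expression per list with `\b` word boundaries. The names `pattern1`, `pattern2`, and `z_whole_words` are hypothetical, and matching is assumed to be case-sensitive, as in the original.

```python
import re

words = ['candy', 'chocolate', 'mints', 'figs', 'avocado']
words2 = ['eat', 'slurp', 'chew', 'digest']

# One alternation per list, wrapped in word boundaries; re.escape guards
# against keywords that contain regex metacharacters.
pattern1 = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')
pattern2 = re.compile(r'\b(?:' + '|'.join(map(re.escape, words2)) + r')\b')

def z_whole_words(etext):
    """True only if etext contains a whole word from each list."""
    return bool(pattern1.search(etext)) and bool(pattern2.search(etext))

print(z_whole_words("Westies will eat avocado, even figs."))  # True
print(z_whole_words("They create candy."))                    # False
```

With a DataFrame this could be applied in the same way as before, for example `df['Text'].apply(z_whole_words)`.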

Answer 1

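Regarding

Westies will eat avocado, even figs. [eat, avocado, figs]

which has multiple keywords: do you want to check each keyword? I mean, return True when each key term is present in both lists.

Does this solution work for you?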

import pandas as pd

Text = ["The co-worker ate all of the candy.", "Bluejays love peanuts.", "Westies will eat avocado, even figs."]
Keyterm = [["candy"], [], ["eat", "avocado", "figs"]]
data = pd.DataFrame({'Text': Text, 'Keyterm': Keyterm})

words = ['candy', 'chocolate', 'mints', 'figs', 'avocado']
words2 = ['eat', 'slurp', 'chew', 'digest', 'candy', 'figs']

def checkList(word, lists):
    # True if the word is present in the given list
    return word in lists

def z(etext):
    res = []
    for keyword in etext:
        ############# Using function checkList here ##############
        if checkList(keyword, words) and checkList(keyword, words2):
            res.append(True)
        else:
            res.append(False)
    return res
data['result'] = data['Keyterm'].apply(z)
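Note that `z` above returns one boolean per key term rather than a single value per row. The sketch below uses plain lists (no DataFrame) to show what that logic yields for the sample `Keyterm` values, and one possible way to collapse each list to a single boolean with `any`. The name `in_both` is hypothetical.

```python
words = ['candy', 'chocolate', 'mints', 'figs', 'avocado']
words2 = ['eat', 'slurp', 'chew', 'digest', 'candy', 'figs']
Keyterm = [["candy"], [], ["eat", "avocado", "figs"]]

def in_both(keyterms):
    """One boolean per key term: True if the term is in both lists."""
    return [term in words and term in words2 for term in keyterms]

per_term = [in_both(terms) for terms in Keyterm]
print(per_term)  # [[True], [], [False, False, True]]

# Collapse to one boolean per row: True if any term is in both lists.
print([any(flags) for flags in per_term])  # [True, False, True]
```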

Return a boolean in a for loop that evaluates multiple lists
