Question

I am still working with Python and pandas, and I am trying to improve my keyword evaluation. My DataFrame looks like this:

Name  Description 
Dog   Dogs are in the house
Cat   Cats are in the shed
Cat   Categories of cats are concatenated

I am using a keyword list like this ['house', 'shed', 'in']

My lambda function looks like this:

keyword_agg = lambda x: ' ,'.join(x) if x is not 'skip me' else None

I am using a function to identify and score keyword matches in each row:

import numpy as np

def foo(df, words):
    col_list = []
    key_list= []
    for w in words:
        pattern = w
        df[w] = np.where(df.Description.str.contains(pattern), 1, 0)
        df[w +'keyword'] = np.where(df.Description.str.contains(pattern), w, 
                          'skip me')
        col_list.append(w)
        key_list.append(w + 'keyword')
    df['score'] = df[col_list].sum(axis=1)
    df['keywords'] = df[key_list].apply(keyword_agg, axis=1)

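For each word, the function adds a column named after the keyword and fills it with 1 or 0 depending on whether the description matches. It also creates a 'word + keyword' column that holds either the word or 'skip me' for each row.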

I was hoping the apply would work like this

df['keywords'] = df[key_list].apply(keyword_agg, axis=1)

and return

Keywords
in, house
in, shed
None

Instead, I am getting

Keywords
in, 'skip me' , house
in, 'skip me', shed
'skip me', 'skip me' , 'skip me'

Can someone help me understand why the 'skip me' strings show up when I am trying to exclude them?

Answer 1

The is operator (and is not) checks reference equality, that is, whether both operands are the very same object.
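
A small sketch of the difference, using plain Python only. The names sentinel, built, and row are hypothetical, and the list is just a stand-in for the row that apply passes in when axis=1:

```python
sentinel = 'skip me'

# Build an equal string at runtime so it is a separate object from the literal.
built = ' '.join(['skip', 'me'])

same_value = (built == sentinel)   # value equality: True
same_object = (built is sentinel)  # identity: not guaranteed for equal strings

# A whole row is never the same object as the string, so `is not` is
# always True and the `else None` branch is never taken.
row = ['skip me', 'skip me', 'in']
row_is_not_sentinel = row is not sentinel
```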

You should use the equality operators instead, which check value equality for most primitive types:

lambda x: ' ,'.join(x) if x != 'skip me' else None
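
One caveat: with axis=1, apply passes the whole row to the function, so x is a row of cell values rather than a single string, and the comparison likely needs to happen per element. A minimal sketch of that idea, assuming the 'skip me' placeholder from the question (the function body below is an illustration, not the original poster's code):

```python
def keyword_agg(row, sentinel='skip me'):
    """Join the real keywords in a row, or return None if there are none."""
    # Compare each cell by value and drop the placeholder entries.
    kept = [value for value in row if value != sentinel]
    # ', ' follows the separator in the question's expected output.
    return ', '.join(kept) if kept else None
```

It can be passed to the same call as in the question, df[key_list].apply(keyword_agg, axis=1). The keywords come out in the order of the columns in key_list.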
