Question

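I'm doing a partial string search in a df against a list. I need to create a new column in the df that holds the matching values from the list. I've tried several things I found on SO and elsewhere, but I can't seem to get the matching values. Here is the basic setup: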

df
index items
0     grape
1     apple
2     cat, dog 
3     dog, other

Desired output:

df
index items      status count matching_values
0     grape      False   0     NaN 
1     apple      True    1     apple
2     cat, dog   True    2     cat, dog
3     dog, other True    1     dog

Here is the match list:

myList = ['apple', 'orange', 'cat', 'dog']
matchList = '|'.join(myList)

This works:

df['status'] = df['items'].str.lower().str.contains(matchList)  # works
df['count'] = df['items'].str.lower().str.count(matchList)  # works

I can't get this to work:

df['matching_values'] = ?? #this should place only the matching values from the list into this new column

I tried the following (and other variations) with no luck. It just puts the entire list in every cell:

if df['status'].any() == True:
    df['List Match'] = matchList
else:
    df['List Match'] = "No Match"

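I also thought that if I could get the position of the match in the list, I could look up the value that way. No luck here either: this writes '0', the index position, which makes sense: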

df.loc[df['items'].str.lower().str.contains(matchList), 'List Match'] = matchList.index(matchList) # doesn't work

I also tried retrieving only the matching values from the original items column, but that copies the entire cell contents as well.

Any ideas would be appreciated.

Answer 1

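I'm assuming all strings in the items column are comma-separated (if not, we'll need to rework the solution). Update: I reworked the solution to replace spaces with commas; see the .replace(" ", ",") part of the code. In short, the code below splits each string in items into a list and returns the strings that appear in myList: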

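myList = ['apple', 'orange', 'cat', 'dog']

# Lowercase each entry, turn spaces into commas, and split into tokens
tokens = df['items'].str.lower().apply(lambda x: x.replace(" ", ",").split(","))
# Tokens not in myList become empty strings before the tokens are joined back together
df.loc[:,'matching_values'] = [", ".join([s if s in myList else "" for s in l]) for l in tokens]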

This gives you something like:

    items           matching_values
0   grape   
1   apple juice     apple,
2   Cat,dog         cat, dog
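
The comprehension above keeps an empty string for every token that is not in myList, which is where the trailing comma after "apple" comes from. A minimal sketch of a variant that drops those tokens instead might look like the following. It assumes items are separated by commas and/or whitespace; find_matches is a hypothetical helper name used only for illustration, and the last sample row is borrowed from the question.

```python
import re
import pandas as pd

myList = ['apple', 'orange', 'cat', 'dog']
df = pd.DataFrame({'items': ['grape', 'apple juice', 'Cat,dog', 'dog, other']})

def find_matches(text, allowed=myList):
    # Hypothetical helper: split on any run of commas and/or whitespace, ignoring case.
    tokens = re.split(r'[,\s]+', text.lower().strip())
    # Keep only the tokens that appear in the allowed list (exact token match).
    return ", ".join(t for t in tokens if t in allowed)

df['matching_values'] = df['items'].apply(find_matches)
```

Because non-matching tokens are filtered out rather than blanked, rows without any match still end up as an empty string, so the NaN replacement described next applies unchanged.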

If you don't like the empty string for rows like "grape", you can fill in a value like this:

import numpy as np

df = df.replace('', np.nan)

This gives you:

    items           matching_values
0   grape           NaN
1   apple juice     apple,
2   Cat,dog         cat, dog
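
If partial (substring) matches are wanted, as with the str.contains and str.count calls in the question, one possible alternative is Series.str.findall with the same joined pattern, which can feed all three columns. This is only a sketch: the sample frame is copied from the question, and passing each word through re.escape is an added precaution that is not in the original.

```python
import re
import numpy as np
import pandas as pd

myList = ['apple', 'orange', 'cat', 'dog']
df = pd.DataFrame({'items': ['grape', 'apple', 'cat, dog', 'dog, other']})

# Build the alternation pattern; re.escape guards against regex metacharacters (assumption).
pattern = '|'.join(re.escape(word) for word in myList)

# Each cell becomes the list of substrings that matched the pattern.
found = df['items'].str.lower().str.findall(pattern)

df['status'] = found.str.len() > 0
df['count'] = found.str.len()
# Join the matches back into one string; rows with no match become NaN.
df['matching_values'] = found.str.join(', ').replace('', np.nan)
```

Note that this matches substrings, so a cell such as "category" would also report "cat"; the token-based approach in this answer is the stricter choice when only whole items should count.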
