Question

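I have two columns that both contain lists of strings. One is df['products'], whose items are all uppercase. The other is the product description, df['desc'].

I want to find which items in df['products'] also appear in df['desc'], and create a new column from them.

I tried the following code: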

df['uniq'] = df.apply(lambda x : [i for i in x['products'] if i.lower() in x['desc']])

I looked at other similar questions and built the code above from them, but it did not work.

The data looks like this:

Answer 1

Don't use apply() unless you absolutely need it. It is slow.

Instead, do it in a vectorized way:

desc_upper = df.desc.str.upper()
matches = df.products.isin(desc_upper)
result = df.products[matches]
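
The snippet above appears to assume plain string columns; the .str accessor and isin() do not compare list-valued cells row by row. For list columns like those in the question, one way to still avoid apply() is a plain list comprehension over the two columns. This is only a minimal sketch: the sample data is borrowed from the other answer, and the helper name matched is made up for illustration.

```python
import pandas as pd

# Sample data (assumed): each cell holds a list of strings.
df = pd.DataFrame({
    "products": [["A", "B"], ["D", "C"]],
    "desc": [["a", "c"], ["c", "e"]],
})

def matched(products, desc):
    """Return the products whose lowercase form appears in desc."""
    desc_set = set(desc)  # set lookup avoids rescanning the list per product
    return [p for p in products if p.lower() in desc_set]

# zip() pairs the two columns row by row, so no apply() is needed.
df["uniq"] = [matched(p, d) for p, d in zip(df["products"], df["desc"])]
print(df)
```

This is not truly vectorized, since pandas has no native operations on lists stored in cells, but it skips the per-row Series construction that apply(axis=1) performs.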

Answer 2

If you need to check row by row, it looks like you need to add axis=1:

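import pandas as pd

df = pd.DataFrame({'products':[['A','B'],['D','C']],
                   'desc':[['a', 'c'],['c', 'e']]})

# axis=1 passes each row to the lambda, so x['products'] and x['desc'] are that row's lists
df['uniq'] = df.apply(lambda x: [i for i in x['products'] if i.lower() in x['desc']], axis=1)
print(df)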
     desc products uniq
0  [a, c]   [A, B]  [A]
1  [c, e]   [D, C]  [C]
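
To see why axis=1 matters, here is a small sketch (same sample data, with variable names chosen only for illustration) of what the function receives in each mode. With the default axis=0 each call gets a whole column, whose index holds the row labels, so looking up x['products'] fails.

```python
import pandas as pd

df = pd.DataFrame({'products': [['A', 'B'], ['D', 'C']],
                   'desc': [['a', 'c'], ['c', 'e']]})

# Default axis=0: the function is called once per column,
# and x.name is the column label.
col_labels = df.apply(lambda x: x.name).tolist()

# axis=1: the function is called once per row,
# and x.name is the row label.
row_labels = df.apply(lambda x: x.name, axis=1).tolist()

# Without axis=1, x is a column indexed by row labels (0, 1),
# so the key 'products' does not exist in it.
try:
    df.apply(lambda x: x['products'])
    failed = False
except KeyError:
    failed = True

print(col_labels, row_labels, failed)
```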

How do I compare two columns of string lists and create a new column with the unique items?

2 answers: