Question

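I have two DataFrames. df1 contains a 'text' column (say, news excerpts), and df2 contains a 'name' column. I want to apply a function that creates a new boolean column 'C' in df1 indicating whether the df1 'text' column contains an element of the df2 'name' column. Both columns have dtype object.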

df1

text
In contrast to other large markets in Asia, Vietnam saw a surge in installations as ...
... for water for industrial production and clean water consumption. .
Barry Kiely, CEO and co-founder of PrecisionBiotics Group, said,
UCAN Zipper USA (US), Keen Ching Industrial Co., Ltd. (Taiwan), Kao .
De plus, la croissance de l'industrie du vêtement crée des perspectives 
Workers depart the Samsung Electronics Vietnam Co. ... But trade experts said Vietnam

df2

Name
Keen Ching Industrial Co., Ltd.
Adidas Ltd.
Samsung Electronics Vietnam Co.
Nike co.
PrecisionBiotics Group

This is what I tried, but it raises an error:

df1['C'] = df1.apply(lambda x: df2['name'] in x.text, axis=1)

Error:

TypeError: 'in <string>' requires string as left operand, not Series

Desired column C in df1:

C
False
False
True
True
False 
True

Answer 1

df2['name'] is the entire Series, not a single element.

Since you want to check whether x.text contains a specific element of df2['name'], use that element, for example df2['name'].iloc[0].
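The error from the question can be reproduced in isolation. This is a minimal sketch, assuming a small made-up Series called `names` as a stand-in for df2['name']; it is not taken from the question's data.

```python
import pandas as pd

# Hypothetical stand-in for df2['name'].
names = pd.Series(["a", "b"])

try:
    # The `in` test on a str only accepts a str on the left-hand side,
    # so passing the whole Series raises TypeError.
    names in "abc"
except TypeError as exc:
    message = str(exc)

# A single element of the Series is a plain str, so the test works.
single_ok = names.iloc[0] in "abc"
```

Here `message` holds the same kind of TypeError text as in the question, while `single_ok` is an ordinary boolean.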

Example:

import pandas as pd
df = pd.DataFrame({'text': ['abc', 'def', 'ghi', 'jkl'], 'name': ['a', 'b', 'c', 'e']})
df['C'] = df.apply(lambda x: df['name'].iloc[0] in x.text, axis=1)

This gives:

  name text      C
0    a  abc   True
1    b  def  False
2    c  ghi  False
3    e  jkl  False

Or, if you want to check whether any element of df['name'] appears in the text:

import numpy as np
df['C'] = df.apply(lambda x: np.array([z in x.text for z in df['name']]).any(), axis=1)

This gives:

  name text      C
0    a  abc   True
1    b  def   True
2    c  ghi  False
3    e  jkl  False
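The same "any element" check can probably be written without NumPy. The sketch below uses Python's built-in any(), which stops at the first matching name; the variable `names` is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"text": ["abc", "def", "ghi", "jkl"],
                   "name": ["a", "b", "c", "e"]})

# Convert the names to a plain list once, outside the per-row lambda.
names = df["name"].tolist()

# any() short-circuits: it stops scanning names at the first match.
df["C"] = df["text"].apply(lambda text: any(name in text for name in names))
```

On this toy frame, column C comes out the same as in the table above.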

Answer 2

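After reading the question a few times, I will assume the following:

df1['text'] contains the text to search (one excerpt per row)
df2['name'] contains the keywords you want to look for in every row of df1['text'] (one per row)
df1['C'] should be True when the text in that row contains at least one of those keywords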

So for every row of df1, you have to test all rows of df2 (or at least keep testing until one matches). You can use:

df1['C'] = df1['text'].apply(lambda x: df2['name'].apply(lambda y: y in x).any())

With the sample data provided, it gives the expected result:

                                                text      C
0  In contrast to other large markets in Asia, Vi...  False
1  ... for water for industrial production and cl...  False
2  Barry Kiely, CEO and co-founder of PrecisionBi...   True
3  UCAN Zipper USA (US), Keen Ching Industrial Co...   True
4  De plus, la croissance de l'industrie du vêtem...  False
5  Workers depart the Samsung Electronics Vietnam...   True
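As an alternative to the nested apply, one possible approach is to join the names into a single regular expression and let pandas do the matching in one pass. This is a sketch of a different technique, not the method from the answer above. It assumes the df2 column is called 'name' (the sample in the question shows it as 'Name'), and it uses re.escape because names such as "Co., Ltd." contain regex metacharacters.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"text": [
    "In contrast to other large markets in Asia, Vietnam saw a surge in installations as ...",
    "... for water for industrial production and clean water consumption. .",
    "Barry Kiely, CEO and co-founder of PrecisionBiotics Group, said,",
    "UCAN Zipper USA (US), Keen Ching Industrial Co., Ltd. (Taiwan), Kao .",
    "De plus, la croissance de l'industrie du vêtement crée des perspectives ",
    "Workers depart the Samsung Electronics Vietnam Co. ... But trade experts said Vietnam",
]})
df2 = pd.DataFrame({"name": [
    "Keen Ching Industrial Co., Ltd.",
    "Adidas Ltd.",
    "Samsung Electronics Vietnam Co.",
    "Nike co.",
    "PrecisionBiotics Group",
]})

# Escape each name so "." and similar characters match literally,
# then combine the names into one alternation pattern.
pattern = "|".join(re.escape(name) for name in df2["name"])

# True where the text contains at least one of the names.
df1["C"] = df1["text"].str.contains(pattern, regex=True)
```

Matching is case-sensitive here, as with the `in` operator; whether this is faster than the nested apply depends on the data, so it is worth timing on the real frames.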
