Question

I have two dataframes:

Df1:

The original df has more than 1000 names.

   Id    Name
    1     Paper
    2     Paper Bag
    3     Scissors
    4     Mat
    5     Cat
    6     Good Cat

Second Df:

The original df has more than 1000 Item_Name values.

Item_ID   Item_Name
1         Paper Bag
2         wallpaper
3         paper
4         cat cage
5         good cat

Expected output:

Id  Name       Item_ID
1   Paper      1,2,3
2   Paper Bag  1,2,3
3   Scissors   NA
4   Mat        NA
5   Cat        4,5
6   Good Cat   4,5

My code:

def matcher(x):
    res = df2.loc[df2['Item_Name'].str.contains(x, regex=False, case=False), 'Item_ID']
    return ','.join(res.astype(str))

df1['Item_ID'] = df1['Name'].apply(matcher)

Current challenge

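str.contains works when Name is Paper and Item_Name is Paper Bag, because the whole Name appears inside the Item_Name, but it does not work in the reverse case, for example when Name is Paper Bag and Item_Name is paper. So in my example it works for rows 1, 3, 4 and 5 of df1, but not for rows 2 and 6. For instance, it does not map row 2 of df1 to row 3 of df2.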
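To make the problem reproducible, here is a minimal, self-contained sketch that rebuilds the two sample dataframes from the question and runs the original matcher unchanged. Because str.contains checks whether the entire Name is a substring of each Item_Name, "Paper Bag" only matches Item_ID 1 and "Good Cat" only matches Item_ID 5, and rows with no match get an empty string rather than NA.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Name": ["Paper", "Paper Bag", "Scissors", "Mat", "Cat", "Good Cat"],
})
df2 = pd.DataFrame({
    "Item_ID": [1, 2, 3, 4, 5],
    "Item_Name": ["Paper Bag", "wallpaper", "paper", "cat cage", "good cat"],
})

def matcher(x):
    # The WHOLE Name must occur as a substring of Item_Name (case-insensitive),
    # so the containment check only works in one direction
    res = df2.loc[df2["Item_Name"].str.contains(x, regex=False, case=False), "Item_ID"]
    return ",".join(res.astype(str))

df1["Item_ID"] = df1["Name"].apply(matcher)
print(df1)
```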

Request

Could you help me modify the code so that it also matches these cases?

Answer 1

You can modify your custom matcher function and keep using apply():

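def matcher(query):
    # Lowercase the Name and split it into words, e.g. "Paper Bag" -> ["paper", "bag"]
    words = query.lower().split()
    # Keep every Item_ID whose Item_Name contains at least one of those words
    matches = [i['Item_ID'] for i in df2[['Item_ID', 'Item_Name']].to_dict('records')
               if any(q in i['Item_Name'].lower() for q in words)]
    if matches:
        return ','.join(map(str, matches))
    else:
        return 'NA'

df1['Item_ID'] = df1['Name'].apply(matcher)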

Returns:

   Id       Name Item_ID
0   1      Paper   1,2,3
1   2  Paper Bag   1,2,3
2   3   Scissors      NA
3   4        Mat      NA
4   5        Cat     4,5
5   6   Good Cat     4,5

Explanation:

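We use apply() to run the custom matcher() function on each value of df1['Name']. Inside matcher(), we convert df2 into a list of dictionaries with to_dict('records'), one per row, each holding that row's Item_ID and Item_Name. We lowercase the current value (query) with lower() and split it into words, then use any() to check whether at least one of those words occurs in the lowercased Item_Name. If it does, we add that row's Item_ID to the list that is returned, joined with commas, or 'NA' if nothing matched.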
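If you would rather stay close to the original str.contains code, one possible variant (a sketch, not part of the answer above) builds a case-insensitive regex alternation from the words of each Name, so a row matches when any single word occurs in Item_Name. It assumes words are separated by whitespace and escapes each word with re.escape so characters such as "." or "(" are matched literally.

```python
import re
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "Name": ["Paper", "Paper Bag", "Scissors", "Mat", "Cat", "Good Cat"],
})
df2 = pd.DataFrame({
    "Item_ID": [1, 2, 3, 4, 5],
    "Item_Name": ["Paper Bag", "wallpaper", "paper", "cat cage", "good cat"],
})

def matcher(query):
    # Escape each whitespace-separated word so it is matched literally
    words = [re.escape(w) for w in query.split()]
    if not words:
        return "NA"
    # "Paper|Bag" matches if ANY word occurs somewhere in Item_Name
    pattern = "|".join(words)
    mask = df2["Item_Name"].str.contains(pattern, case=False, regex=True)
    ids = df2.loc[mask, "Item_ID"]
    return ",".join(ids.astype(str)) if not ids.empty else "NA"

df1["Item_ID"] = df1["Name"].apply(matcher)
print(df1)
```

This keeps the vectorized str.contains call over df2 instead of rebuilding a list of dictionaries for every Name, which may matter with 1000+ rows on each side.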

Matching strings across 2 dataframe columns in Python
