Match data between two pandas DataFrames and extract the matching value from another column in Python

Posted: 2018-04-20 23:31:52

Tags: python string pandas dataframe

I have two pandas DataFrames.

df1:
     ACNo       Product
1   12340       100% Hot Care
2   23867       Auction5
3   98372       Edition
4   09837       Diet Parameter
5   54332       Load

df2:
    ProdDetail                          AttrName
1   12345.567                           Age Confirmation
2   Model1 Count\100% Hot Care          Recipe
3   123445\Handle                       Improve
4   Diet Edition\Parameter              Amount

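I want to look up each value of df1's Product column in df2's ProdDetail column and add an AttrName column to df1 holding the corresponding value. The product string can appear anywhere inside ProdDetail, so this is essentially a wildcard lookup like the one in Excel. If the string occurs in a ProdDetail value of df2, I want to pull the corresponding AttrName. The resulting df1 should look like this: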

        ACNo        Product             AttrName
1       12340       100% Hot Care       Recipe  
2       23867       Auction5            N/A
3       98372       Edition             Amount
4       09837       Diet Parameter      N/A
5       54332       Load                N/A
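The matching rule described above amounts to Python's substring test (`in`), the equivalent of an Excel pattern like `*text*`. A minimal sketch using the sample values from the question shows why only two products match: the product text has to occur as one contiguous run of characters, so `Diet Parameter` does not match `Diet Edition\Parameter`. The helper name `matches` is hypothetical.

```python
# Sample values copied from df1['Product'] and df2['ProdDetail'] in the question.
products = ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']
details = [12345.567, r'Model1 Count\100% Hot Care', r'123445\Handle', r'Diet Edition\Parameter']


def matches(product, detail):
    """Hypothetical helper: True if product occurs anywhere in detail (like Excel's *product*)."""
    # str() guards against non-string cells such as the float 12345.567
    return product in str(detail)


for p in products:
    hits = [d for d in details if matches(p, d)]
    print(p, '->', hits)
```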

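Can anyone help me with this? I have tried several approaches but could not find a solution. I found a similar post, but it was for R, and I could not find an equivalent for Python. Here is one of the things I tried: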

ip=df1['Product']
def lookup_prod(ip):
      return df2[(df2['ProdDetail'].str.contains(ip, na=False))]['AttrName']
df1['AttrName'] = data.apply(lambda row: lookup_prod(row['ProdDetails']), axis=1)

Code to reproduce the sample data:

import pandas as pd

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})

df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle',  r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe' , 'Improve',  'Amount']})
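One likely source of trouble with `str.contains` here: `ProdDetail` mixes a float (`12345.567`) with strings, and the `.str` methods return `NaN` for non-string cells. A mask containing `NaN` cannot be used for boolean indexing, so passing `na=False` (already used in the attempt above) turns those cells into non-matches. Adding `regex=False` makes the pattern a literal substring, so characters such as `.`, `(` or `+` in a product name are not read as regex syntax. A small sketch on the question's `df2`:

```python
import pandas as pd

df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle', r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe', 'Improve', 'Amount']})

# Default na: the float cell yields NaN instead of True/False
raw_mask = df2['ProdDetail'].str.contains('Edition', regex=False)
print(raw_mask.tolist())

# na=False maps the non-string cell to False, so the mask can be used for indexing
mask = df2['ProdDetail'].str.contains('Edition', na=False, regex=False)
print(df2.loc[mask, 'AttrName'].tolist())
```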

2 Answers

Answer 0 (score: 1)

I think str.contains still works here:

df1.Product.apply(lambda x : df2.AttrName[df2.ProdDetail.str.contains(x)].sum(),1)
Out[805]: 
1    Recipe
2     False
3    Amount
4     False
5     False
Name: Product, dtype: object
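A variant of this answer, sketched under the assumption that the first matching row should win and that non-matches should read `N/A` as in the desired output. It adds `na=False` so the float in `ProdDetail` does not put `NaN` into the mask, and `regex=False` so each product is matched literally. The helper name `first_attr` is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle', r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe', 'Improve', 'Amount']})


def first_attr(product):
    """Hypothetical helper: AttrName of the first ProdDetail containing product, else 'N/A'."""
    # Literal substring match; non-string cells count as non-matches
    mask = df2['ProdDetail'].str.contains(product, na=False, regex=False)
    hits = df2.loc[mask, 'AttrName']
    return hits.iloc[0] if not hits.empty else 'N/A'


df1['AttrName'] = df1['Product'].apply(first_attr)
print(df1)
```

This returns a single name per product instead of combining every matching name with `.sum()`.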

Answer 1 (score: 0)

One approach is to use pd.Series.apply with a custom function and a for loop:

def lookup_prod(ip):
    for row in df2.itertuples():
        if ip in row[1]:
            return row[2]
    else:
        return 'N/A'

df1['AttrName'] = df1['Product'].apply(lookup_prod)

print(df1)

#     ACNo        Product AttrName
# 1  12340        HotCare   Recipe
# 2  23867        Auction      N/A
# 3  98372        Edition   Amount
# 4   9837  DietParameter      N/A
# 5  54332           Load      N/A
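In `lookup_prod`, the `else` belongs to the `for` loop and runs when the loop finishes without `break`; since the loop only ever exits early through `return`, a plain `return 'N/A'` after the loop behaves the same. With the question's `df2`, the first row holds the float `12345.567`, so `ip in row[1]` raises `TypeError` unless the value is converted with `str()`. A more compact equivalent, sketched with `next()` and a generator expression that reads the columns by name rather than by tuple position; the name `lookup_prod_next` is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle', r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe', 'Improve', 'Amount']})


def lookup_prod_next(ip):
    """Hypothetical variant: AttrName of the first row whose ProdDetail contains ip, else 'N/A'."""
    # zip walks both columns in step; str() guards against the float 12345.567
    return next((attr for detail, attr in zip(df2['ProdDetail'], df2['AttrName'])
                 if ip in str(detail)), 'N/A')


df1['AttrName'] = df1['Product'].apply(lookup_prod_next)
print(df1)
```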

Example #2

This approach still works; the str() call guards against non-string values in ProdDetail:

import pandas as pd

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})

df2 = pd.DataFrame({'ProdDetail': [r'Sesonal Items\Limted  Number', r'Model1 Count\100% Hot Care',
                                   r'123445\Handle',  r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe' , 'Improve',  'Amount']})

def lookup_prod(ip):
    for row in df2.itertuples():
        if ip in str(row.ProdDetail):
            return row.AttrName
    else:
        return 'N/A'

df1['AttrName'] = df1['Product'].apply(lookup_prod)

print(df1)

#     ACNo         Product AttrName
# 0  12340   100% Hot Care   Recipe
# 1  23867        Auction5      N/A
# 2  98372         Edition   Amount
# 3  09837  Diet Parameter      N/A
# 4  54332            Load      N/A
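A different technique from the answers above, sketched here assuming pandas 1.2 or later (where `DataFrame.merge` accepts `how='cross'`): pair every product with every detail in one step, keep the pairs where the product occurs in the detail, and map the first match back onto `df1`. The cross join holds `len(df1) * len(df2)` rows in memory, so it suits small to medium frames.

```python
import pandas as pd

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle', r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe', 'Improve', 'Amount']})

# Every (Product, ProdDetail) combination, left rows outer, df2 rows inner (pandas >= 1.2)
pairs = df1[['Product']].merge(df2, how='cross')

# Keep pairs where the product text occurs in the detail; str() handles non-string cells
hit = [p in str(d) for p, d in zip(pairs['Product'], pairs['ProdDetail'])]
first_match = (pairs[hit]
               .drop_duplicates(subset='Product')  # first match per product, in df2 row order
               .set_index('Product')['AttrName'])

# Products without any match map to NaN, which is then filled with 'N/A'
df1['AttrName'] = df1['Product'].map(first_match).fillna('N/A')
print(df1)
```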