I have two pandas DataFrames.
df1 :
   ACNo   Product
1  12340  100% Hot Care
2  23867  Auction5
3  98372  Edition
4  09837  Diet Parameter
5  54332  Load
df2 :
   ProdDetail                  AttrName
1  12345.567                   Age Confirmation
2  Model1 Count\100% Hot Care  Recipe
3  123445\Handle               Improve
4  Diet Edition\Parameter      Amount
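I want to look up the Product column of df1 in the ProdDetail column of df2 and add an AttrName column with the corresponding value to df1. The string can appear anywhere in ProdDetail, much like a wildcard lookup in Excel. If the string is found in df2's ProdDetail, I want to pull the corresponding AttrName. The resulting df1 should look like this: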
   ACNo   Product         AttrName
1  12340  100% Hot Care   Recipe
2  23867  Auction5        N/A
3  98372  Edition         Amount
4  09837  Diet Parameter  N/A
5  54332  Load            N/A
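Can someone help me with this? I have tried several approaches but could not find a solution. I saw a similar post, but it was for R, and I could not find one for Python. Here is one of the things I tried: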
ip=df1['Product']
def lookup_prod(ip):
    return df2[(df2['ProdDetail'].str.contains(ip, na=False))]['AttrName']
df1['AttrName'] = data.apply(lambda row: lookup_prod(row['ProdDetails']), axis=1)
Code to reproduce the sample data:
import pandas as pd
df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle', r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe', 'Improve', 'Amount']})
Answer 0 (score: 1)
I think str.contains still works here:
df1.Product.apply(lambda x : df2.AttrName[df2.ProdDetail.str.contains(x)].sum(),1)
Out[805]:
1 Recipe
2 False
3 Amount
4 False
5 False
Name: Product, dtype: object
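For comparison, here is a minimal sketch of the same str.contains idea, written under a few assumptions that are not in the answer above: non-string ProdDetail values count as non-matching (na=False), the product name is matched as literal text rather than as a regular expression (regex=False), and the first match wins with 'N/A' as the fallback. The helper name first_attr is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle', r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe', 'Improve', 'Amount']})

def first_attr(product):
    # Hypothetical helper: literal substring test against every ProdDetail value.
    # na=False turns the missing result for the float row into False.
    mask = df2['ProdDetail'].str.contains(product, na=False, regex=False)
    matches = df2.loc[mask, 'AttrName']
    # Assumption: take the first match, fall back to 'N/A' when nothing matches.
    return matches.iloc[0] if not matches.empty else 'N/A'

df1['AttrName'] = df1['Product'].apply(first_attr)
print(df1)
```

Returning an explicit fallback avoids depending on what .sum() yields for an empty selection, which is where the False values in the output above come from.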
Answer 1 (score: 0)
One approach is to use pd.Series.apply with a custom function and a for loop:
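def lookup_prod(ip):
    # Scan df2 row by row; row[1] is ProdDetail and row[2] is AttrName
    for row in df2.itertuples():
        # str() guards against non-string ProdDetail values such as floats
        if ip in str(row[1]):
            return row[2]
    else:
        # for-else: reached only when no row matched
        return 'N/A'

df1['AttrName'] = df1['Product'].apply(lookup_prod)
print(df1)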
# ACNo Product AttrName
# 1 12340 HotCare Recipe
# 2 23867 Auction N/A
# 3 98372 Edition Amount
# 4 9837 DietParameter N/A
# 5 54332 Load N/A
Example #2
This approach still works:
import pandas as pd

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [r'Sesonal Items\Limted Number', r'Model1 Count\100% Hot Care',
                                   r'123445\Handle', r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe', 'Improve', 'Amount']})

def lookup_prod(ip):
    for row in df2.itertuples():
        if ip in str(row.ProdDetail):
            return row.AttrName
    else:
        return 'N/A'

df1['AttrName'] = df1['Product'].apply(lookup_prod)
print(df1)
# ACNo Product AttrName
# 0 12340 100% Hot Care Recipe
# 1 23867 Auction5 N/A
# 2 98372 Edition Amount
# 3 09837 Diet Parameter N/A
# 4 54332 Load N/A
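The loop above reads df2 from the enclosing scope and writes into df1 in place. As a rough sketch of one possible variation, the function below takes both frames as arguments and returns a new frame. The name add_attr, its parameters, and the 'N/A' default are all assumptions made for illustration, and matching is still first-match-wins on a literal substring.

```python
import pandas as pd

def add_attr(left, right, default='N/A'):
    """Return a copy of `left` with an AttrName column looked up from `right`."""
    # Convert ProdDetail to text once so non-string values (e.g. floats) are searchable.
    pairs = list(zip(right['ProdDetail'].astype(str), right['AttrName']))

    def lookup(product):
        # The first ProdDetail that contains the product as a substring wins.
        return next((attr for detail, attr in pairs if product in detail), default)

    # assign() returns a new DataFrame and leaves `left` untouched.
    return left.assign(AttrName=left['Product'].map(lookup))

df1 = pd.DataFrame({'ACNo': ['12340', '23867', '98372', '09837', '54332'],
                    'Product': ['100% Hot Care', 'Auction5', 'Edition', 'Diet Parameter', 'Load']})
df2 = pd.DataFrame({'ProdDetail': [12345.567, r'Model1 Count\100% Hot Care',
                                   r'123445\Handle', r'Diet Edition\Parameter'],
                    'AttrName': ['Age Confirmation', 'Recipe', 'Improve', 'Amount']})

result = add_attr(df1, df2)
print(result)
```

Note that 'Diet Parameter' stays unmatched here as well, because the two words are not contiguous in 'Diet Edition\Parameter'.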