I have two dataframes,
df1:
Names
one two three
Sri is a good player
Ravi is a mentor
Kumar is a cricketer player
df2:
values
sri
NaN
sri, is
kumar,cricketer player
For each value in df2, I want to get the row of df1 that contains all of the items in that value. My expected output is:
values Names
sri Sri is a good player
NaN
sri, is Sri is a good player
kumar,cricketer player Kumar is a cricketer player
I tried: df1["Names"].str.contains("|".join(df2["values"].values.tolist()))
I have also tried other things, but I could not get my expected output (the "," in the values is the problem). Please help.
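A quick check of why the str.contains attempt cannot produce this output. This is a sketch that rebuilds the two frames from the tables above. The NaN in df2['values'] is a float, so "|".join(...) raises a TypeError. Even after dropping it, the pattern is an OR of literal substrings tested against each row of df1. That gives one boolean per df1 row rather than a match per df2 value. The test is also case-sensitive, and "sri, is" contains a comma that "Sri is a good player" does not.

```python
import pandas as pd

# Frames rebuilt from the question's tables
df1 = pd.DataFrame({'Names': ['one two three', 'Sri is a good player',
                              'Ravi is a mentor', 'Kumar is a cricketer player']})
df2 = pd.DataFrame({'values': ['sri', float('nan'), 'sri, is', 'kumar,cricketer player']})

try:
    "|".join(df2["values"].values.tolist())
    joined_ok = True
except TypeError:
    joined_ok = False  # NaN is a float, and str.join only accepts strings

# Dropping NaN lets the join succeed, but the mask is per df1 row and case-sensitive
pattern = "|".join(df2["values"].dropna().tolist())
mask = df1["Names"].str.contains(pattern)

print(joined_ok)       # False
print(mask.tolist())   # [False, False, False, False]
```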
Answer 0 (score: 2)
Use NumPy broadcasting with set logic.
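import numpy as np
import pandas as pd
# Lower-case each string and split it on non-letters into a set of words
d1 = df1['Names'].fillna('').str.lower().str.split('[^a-z]+').apply(set).values
d2 = df2['values'].fillna('').str.lower().str.split('[^a-z]+').apply(set).values
# d1 >= d2[:, None] is True where a df1 word set is a superset of a df2 word set
i, j = np.where(d1 >= d2[:, None])
df2.assign(Names=pd.Series(df1['Names'].values[j], df2['values'].index[i]))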
values Names
0 sri Sri is a good player
1 NaN NaN
2 sri, is Sri is a good player
3 kumar,cricketer player Kumar is a cricketer player
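To see what d1 >= d2[:, None] computes, here is a toy version with hand-built sets. The names rows and queries are illustrative only. For Python sets, a >= b means "a is a superset of b", and NumPy applies that operator element by element on object arrays. Broadcasting a (3, 1) column against a (2,) row produces a (3, 2) table indexed as [query, row].

```python
import numpy as np

# Object arrays of Python sets; for sets, a >= b means "a is a superset of b"
rows = np.empty(2, dtype=object)
rows[0] = {'sri', 'is', 'a', 'good', 'player'}
rows[1] = {'kumar', 'is', 'a', 'cricketer', 'player'}

queries = np.empty(3, dtype=object)
queries[0] = {'sri'}
queries[1] = {''}  # what fillna('') followed by the regex split yields for NaN
queries[2] = {'kumar', 'cricketer', 'player'}

# queries[:, None] has shape (3, 1); against rows (2,) it broadcasts to (3, 2),
# so table[q, r] is True when row r contains every word of query q
table = rows >= queries[:, None]
i, j = np.where(table)
pairs = list(zip(i.tolist(), j.tolist()))
print(pairs)  # [(0, 0), (2, 1)]
```

The NaN query (index 1) matches nothing, which is why it comes out as NaN in the output above. If one df2 value matched several df1 rows, i would contain a repeated index. The Series built in the assign step would then have duplicate labels and would likely fail to align, so the answer as written assumes at most one match per df2 row.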
Answer 1 (score: 1)
Try this:
import pandas as pd

df1 = pd.read_csv('sample.csv')
df2 = pd.read_csv('sample_2.csv')

# Normalise both columns: lower-case, replace punctuation with spaces, collapse whitespace
df2['values'] = df2['values'].str.lower()
df1['Names'] = df1['Names'].str.lower()
df2['values'] = df2['values'].str.replace(r'[^\w\s]', ' ', regex=True)
df2['values'] = df2['values'].replace(r'\s+', ' ', regex=True)
df1['Names'] = df1['Names'].str.replace(r'[^\w\s]', ' ', regex=True)
df1['Names'] = df1['Names'].replace(r'\s+', ' ', regex=True)

# Tokenise each string into a list of words
df2['list_values'] = df2['values'].apply(lambda x: str(x).split())
df1['list_names'] = df1['Names'].apply(lambda x: str(x).split())
list_names = df1['list_names'].tolist()

def check_names(x, list_names):
    """Return the first name whose words include every word in x, else ''."""
    output = ''
    for list_name in list_names:
        if set(list_name) >= set(x):
            output = ' '.join(list_name)
            break
    return output

df2['Names'] = df2['list_values'].apply(lambda x: check_names(x, list_names))
print(df2[['values', 'Names']])
Output
values Names
0 sri sri is a good player
1 NaN
2 sri is sri is a good player
3 kumar cricketer player kumar is a cricketer player
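Explanation
This is a fuzzy-matching problem. Here are the steps I applied:
1. Clean the text in each df: lower-case it, replace punctuation with spaces and collapse repeated whitespace.
2. Split each cleaned string into a list of words.
3. Use the check_names() function to do the matching and get the desired output. It returns the first name whose words include every word of the value.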
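The three steps can also be folded into one helper. This is a sketch that assumes the frames use the column names from the question (Names, values). tokens and first_superset are hypothetical helper names. The answer's replace-then-split cleaning is swapped for re.findall(r'\w+', ...), which yields the same word lists for this data. Unlike check_names(), the helper returns NaN when nothing matches and keeps the original, un-lowercased text in Names.

```python
import re
import pandas as pd

def tokens(text):
    """Lower-case words of text; NaN or other non-strings give an empty set."""
    if not isinstance(text, str):
        return set()
    return set(re.findall(r'\w+', text.lower()))

def first_superset(query, names, name_tokens):
    """Return the first name whose words include every word of query, else NaN."""
    q = tokens(query)
    if not q:  # NaN or blank query: nothing to match
        return float('nan')
    for name, words in zip(names, name_tokens):
        if words >= q:
            return name
    return float('nan')

# Frames rebuilt from the question's tables
df1 = pd.DataFrame({'Names': ['one two three', 'Sri is a good player',
                              'Ravi is a mentor', 'Kumar is a cricketer player']})
df2 = pd.DataFrame({'values': ['sri', float('nan'), 'sri, is', 'kumar,cricketer player']})

# Tokenise df1 once, then look up each df2 value against it
name_tokens = [tokens(n) for n in df1['Names']]
df2['Names'] = df2['values'].apply(lambda v: first_superset(v, df1['Names'], name_tokens))
print(df2)
```

Tokenising df1 once keeps the per-value work to a set comparison per row. Like the loop in check_names(), this stops at the first matching row, so a value that fits several names only reports the first.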