I have two dfs. df1:
Summary
0 This is a basket of red apples.
1 We found a bushel of fruit. They are red.
2 There is a peck of pears that taste sweet.
3 We have a box of plums.
4 This is bag of green apples.
df2:
Fruits
0 plum
1 pear
2 apple
3 orange
I would like the output to be:
df2:
Fruits Summary
0 plum We have a box of plums.
1 pear There is a peck of pears that taste sweet.
2 apple This is a basket of red apples, This is bag of green apples
3 orange
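In short: if a fruit is found in a Summary, that Summary value should be returned; otherwise nothing, or NaN.
Edit: if a fruit is found in more than one Summary, all of them should be returned, separated by commas.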
Answer
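1. For each 'Summary', add every 'Fruits' value found in it to a list, since one sentence may mention more than one fruit.
2. Explode the lists so that each fruit/sentence pair gets its own row.
3. Merge df1 and df2 on 'Fruits'.
4. Group by 'Fruits' and combine each group's sentences into a comma-separated string.

import pandas as pd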
# sample dataframes
df1 = pd.DataFrame({'Summary': ['This is a basket of red apples. They are sour.', 'We found a bushel of fruit. They are red.', 'There is a peck of pears that taste sweet.', 'We have a box of plums.', 'This is bag of green apples.', 'We have apples and pears']})
df2 = pd.DataFrame({'Fruits': ['plum', 'pear', 'apple', 'orange']})
# display(df1)
Summary
0 This is a basket of red apples. They are sour.
1 We found a bushel of fruit. They are red.
2 There is a peck of pears that taste sweet.
3 We have a box of plums.
4 This is bag of green apples.
5 We have apples and pears
# set all values to lowercase in Fruits
df2.Fruits = df2.Fruits.str.lower()
# create an array of unique Fruits from df2
unique_fruits = df2.Fruits.unique()
# for each sentence check if a fruit is in the sentence and create a list
df1['Fruits'] = df1.Summary.str.lower().apply(lambda x: [v for v in unique_fruits if v in x])
# explode the lists into separate rows; if sentences contain more than one fruit, there will be more than one row
df1 = df1.explode('Fruits').reset_index(drop=True)
# left-merge df1 onto df2 so every fruit is kept, even one with no matching sentence
df2_ = df2.merge(df1, on='Fruits', how='left')
# group by fruit and join each group's sentences into one comma-separated string
df2_ = df2_.groupby('Fruits').Summary.agg(list).str.join(', ').reset_index()
# display(df2_)
Fruits Summary
0 apple This is a basket of red apples. They are sour., This is bag of green apples., We have apples and pears
1 orange NaN
2 pear There is a peck of pears that taste sweet., We have apples and pears
3 plum We have a box of plums.
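The groupby step above returns the fruits in alphabetical order (apple, orange, pear, plum), while the question's expected output keeps df2's order. A minimal sketch, using the question's own sample frames: passing sort=False to groupby keeps the groups in first-appearance order, and because a left merge preserves the order of df2's keys, that is plum, pear, apple, orange. The names merged and result are only illustrative.

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({'Summary': [
    'This is a basket of red apples.',
    'We found a bushel of fruit. They are red.',
    'There is a peck of pears that taste sweet.',
    'We have a box of plums.',
    'This is bag of green apples.',
]})
df2 = pd.DataFrame({'Fruits': ['plum', 'pear', 'apple', 'orange']})

unique_fruits = df2['Fruits'].str.lower().unique()

# Same steps as above: list the fruits in each sentence, explode, left-merge onto df2
df1['Fruits'] = df1['Summary'].str.lower().apply(lambda s: [f for f in unique_fruits if f in s])
merged = df2.merge(df1.explode('Fruits'), on='Fruits', how='left')

# sort=False keeps groups in first-appearance order, i.e. df2's row order;
# a fruit with no match has the list [NaN], which str.join turns into NaN
result = (merged.groupby('Fruits', sort=False)['Summary']
          .agg(list)
          .str.join(', ')
          .reset_index())
print(result)
```

Without sort=False the same code returns the alphabetical order shown in the output above.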
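Alternatively, the explode/merge/groupby steps can be replaced by a list comprehension per fruit. Run it against the original df1, not the exploded one: after the explode step a sentence that mentions two fruits appears twice in df1 and would be joined twice. Unlike the merge version, this keeps df2's row order and returns an empty string rather than NaN for a fruit with no match:
df2['Summary'] = df2.Fruits.str.lower().apply(lambda x: ', '.join([v for v in df1.Summary if x in v.lower()]))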
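Both approaches use plain substring tests, so 'apple' would also match a sentence about 'pineapple'. If that matters, one option (a swapped-in technique, not part of the answer above) is a whole-word regular expression with Series.str.findall. This sketch assumes a fruit appears either as-is or with a trailing 's' for the plural; the 'pineapple' sentence is hypothetical test data, and fruits with no match come out as NaN.

```python
import re

import pandas as pd

# Sentences from the question, plus one hypothetical sentence that only
# contains "apple" as part of a longer word
df1 = pd.DataFrame({'Summary': [
    'This is a basket of red apples.',
    'We found a bushel of fruit. They are red.',
    'There is a peck of pears that taste sweet.',
    'We have a box of plums.',
    'This is bag of green apples.',
    'We sell pineapple juice.',
]})
df2 = pd.DataFrame({'Fruits': ['plum', 'pear', 'apple', 'orange']})

# Assumption: a fruit counts only as a whole word, optionally plural ("s")
fruits = df2['Fruits'].str.lower().unique()
pattern = r'\b(' + '|'.join(map(re.escape, fruits)) + r')s?\b'

# findall returns the captured singular names; dict.fromkeys drops repeats
# so a sentence is listed only once per fruit
df1['Fruits'] = (df1['Summary'].str.lower()
                 .str.findall(pattern)
                 .map(lambda found: list(dict.fromkeys(found))))

# One row per (sentence, fruit); sentences with no fruit become NaN and are dropped
matches = (df1.explode('Fruits')
           .dropna(subset=['Fruits'])
           .groupby('Fruits')['Summary']
           .agg(', '.join))

# Look up each fruit; fruits with no match (orange) get NaN
df2['Summary'] = df2['Fruits'].str.lower().map(matches)
print(df2)
```

The plural rule is deliberately naive: irregular plurals such as 'cherries' for 'cherry' would still be missed.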