Question

I have the following two data frames:

import pandas as pd

c_pd = pd.DataFrame({'Name': ['Geeks', 'Peter', 'James', 'Jack', 'Lisa'], 
               'england': ['fish and chips','cheese hamburger, roast beef, whatever','pizza peperoni, pizza marinera, wine','steak with french fries','potetos with tomato'], 
               'france': ['voiture, maison, petit dejeuner','voiture blanc, grand maison, whatever','ratatouille, vin','fromage','petit fromage']}) 

a_df = pd.DataFrame({'groups1':[['fromage blanc grand', 'petit dejeuner'],['la vache qui rie']],'groups2':[['Coq au vin','coq a la bier'],['cannard, vin']]}, index=['paragraph1','paragraph2'])

Picture of the DataFrames, to visualize the final result

I want to add a column to the second DataFrame, a_df:

land='france'
a_df['groups_1_dishes']=a_df['groups1'].apply(lambda x: f(x,c_pd,land))

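I want to apply a function that does the following. For each element of the groups1 column (each element is a list of phrases), I want to build a list of the Names from c_pd that satisfy this condition: ALL the words of ANY of the phrases in the groups1 element are contained in that Name's entry for the given land (comma = separator).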

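For example: the first element of groups1 has two phrases in its list, "white cheese" and "red steak". Every person in table 1 visited England and France and listed their favourite foods/dishes. I have to find out whether both words "white" and "cheese" are present in the 'england' column, and likewise for "red steak". Are they? Yes: PETER has red steak, and JACK has white cheese. So the first result is the list [Peter, Jack].

Let's do a second example. In a_df.loc[['paragraph2'], ['groups1']] there is [steak, salad, apple banana]. STEAK is in Peter's list (remember: for each phrase in a_df, all of its words must appear somewhere in the england string). SALAD is not there, and neither 'APPLE' nor 'BANANA' is. So the result is [Peter].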
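The matching rule for a single phrase can be sketched in plain Python as below. This is only an illustration under a few assumptions: words are separated by commas or whitespace, matching is on whole words rather than substrings, and case is ignored. The helper names `split_words` and `phrase_matches` are hypothetical and not part of the question.

```python
import re

def split_words(text):
    # Split on commas and whitespace; lowercase so matching ignores case (an assumption).
    return set(re.findall(r"[^\s,]+", text.lower()))

def phrase_matches(phrase, dishes):
    # True when every word of `phrase` appears as a whole word in `dishes`.
    return split_words(phrase) <= split_words(dishes)

# Strings taken from the 'france' column of c_pd
print(phrase_matches('petit dejeuner', 'voiture, maison, petit dejeuner'))  # True
print(phrase_matches('fromage blanc grand', 'petit fromage'))               # False
```

Using set inclusion keeps the "all words" check to one comparison; if substring matching is wanted instead, the comparison could be replaced by `all(w in dishes for w in phrase.split())`.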

The function takes land as a parameter because I want to do the same for France, for groups2 with both lands, and so on.

Desired output: see the newly added column a_df['group_1_names'].

So far I have tried:

def f(list_of_disches,c_pd, land):
        
    # for instance the list_of_dishes would be here: ['fromage blanc grand', 'petit dejeuner']
    
    names_found=[]
    
    for dish in list_of_disches:
        # if all the words of the phrase are in the land string, add the name
        dish_words=dish.split() # for instance it would be ['fromage' 'blanc' 'grand']
        
        mask = c_pd[land].apply(any(lambda x: word in x for word in dish_words))
        
        names = c_pd[mask]['name'].tolist()

        names_found.append(names)

I think I am not far off, but I can't get it to work.
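One possible way to finish the attempt is sketched below. In the code above, `any(...)` is evaluated before `apply` ever sees it, the requirement calls for all words rather than any, the column is named 'Name' rather than 'name', and `names_found` is never returned. The sketch assumes whole-word, case-insensitive matching with commas and whitespace as separators, and it collects each matching Name once; `split_words` is a hypothetical helper, and the flat, de-duplicated result list is a guess at the desired output.

```python
import re
import pandas as pd

c_pd = pd.DataFrame({
    'Name': ['Geeks', 'Peter', 'James', 'Jack', 'Lisa'],
    'england': ['fish and chips', 'cheese hamburger, roast beef, whatever',
                'pizza peperoni, pizza marinera, wine', 'steak with french fries',
                'potetos with tomato'],
    'france': ['voiture, maison, petit dejeuner', 'voiture blanc, grand maison, whatever',
               'ratatouille, vin', 'fromage', 'petit fromage']})

a_df = pd.DataFrame({'groups1': [['fromage blanc grand', 'petit dejeuner'], ['la vache qui rie']],
                     'groups2': [['Coq au vin', 'coq a la bier'], ['cannard, vin']]},
                    index=['paragraph1', 'paragraph2'])

def split_words(text):
    # Hypothetical helper: split on commas/whitespace, lowercase (assumed rule).
    return set(re.findall(r"[^\s,]+", text.lower()))

def f(list_of_dishes, c_pd, land):
    # Word set of every row in the chosen land column, computed once.
    land_words = [split_words(s) for s in c_pd[land]]
    names_found = []
    for dish in list_of_dishes:
        dish_words = split_words(dish)
        # A row matches when ALL words of this dish occur in its land string.
        mask = [dish_words <= row for row in land_words]
        for name in c_pd.loc[mask, 'Name']:
            if name not in names_found:  # ANY matching dish is enough; keep each name once
                names_found.append(name)
    return names_found

land = 'france'
a_df['groups_1_dishes'] = a_df['groups1'].apply(lambda x: f(x, c_pd, land))
print(a_df['groups_1_dishes'].tolist())  # [['Geeks'], []]
```

With the sample data only 'petit dejeuner' is fully contained in a france entry (Geeks), and nothing matches 'la vache qui rie'. The same function can be reused for the other combinations, for example `a_df['groups2'].apply(lambda x: f(x, c_pd, 'england'))`.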

Edit: added the expected output

Filter a pandas DataFrame using the values of lists in another DataFrame

0 answers: