Match each comma-separated string in an R data frame against values in another df

Time: 2019-06-06 11:45:31

Tags: r dataframe

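Question: I need to match the comma-separated strings in one data frame against another data frame that holds a value for each string. My final data frame should put the highest of those values next to each row of the first data frame. I have illustrated this with an example. I'm new to R and can't work out the logic for building this code. I'd appreciate any help getting started.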

The two data frames:

DF1: 
Fruits
Guava
Mango, apple, banana
Apricot, plum
Avocado, Cherry, blueberry, raspberry

DF2:

Fruits     Price
Guava         10
Mango         30
Apple         25
Banana        15
Apricot       40
Plum          35
Avocado      120
Cherry        23
Blueberry    200
Raspberry    125

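Output DF3: a new column, "Highest Price", should be created, holding the highest price found across the whole group of fruits in each row of DF1.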

DF3:
Fruits                                 Highest Price
Guava                                  10
Mango, apple, banana                   30
Apricot, plum                          40
Avocado, Cherry, blueberry, raspberry  200

2 answers:

Answer 0: (score: 2)

Try

DF1$`Highest price` = sapply(tolower(DF1$Fruits), 
       function(x){ max(DF2$Price[which(tolower(DF2$Fruits)%in%strsplit(x,", ")[[1]])])})

> DF1
                                 Fruits Highest price
1                                 Guava            10
2                  Mango, apple, banana            30
3                         Apricot, plum            40
4 Avocado, Cherry, blueberry, raspberry           200

A shorter alternative suggested by Ronak Shah:

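# Split on a comma plus any following spaces; splitting on "," alone leaves
# " apple" with a leading blank, which then fails to match "apple"
sapply(strsplit(DF1$Fruits, ",\\s*"), function(x) max(DF2$Price[tolower(DF2$Fruits) %in% tolower(x)]))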
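
As a self-contained sketch of the same split-and-match idea, the prices can also be stored in a named lookup vector. The data below is rebuilt from the question, assuming both Fruits columns are character rather than factor. The helper name highest_price and the NA fallback for rows with no match are illustrative assumptions, not part of the answer above.

```r
# Example data rebuilt from the question (assumed to be character columns)
DF1 <- data.frame(
  Fruits = c("Guava", "Mango, apple, banana", "Apricot, plum",
             "Avocado, Cherry, blueberry, raspberry"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Fruits = c("Guava", "Mango", "Apple", "Banana", "Apricot", "Plum",
             "Avocado", "Cherry", "Blueberry", "Raspberry"),
  Price = c(10, 30, 25, 15, 40, 35, 120, 23, 200, 125),
  stringsAsFactors = FALSE
)

# Lookup vector: lower-cased fruit name -> price
price_by_fruit <- setNames(DF2$Price, tolower(DF2$Fruits))

# highest_price() is a hypothetical helper used only for this sketch
highest_price <- function(fruit_string) {
  # Split on commas, trim surrounding blanks, and ignore case
  fruits <- tolower(trimws(strsplit(fruit_string, ",", fixed = TRUE)[[1]]))
  prices <- price_by_fruit[fruits]          # NA for fruits missing from DF2
  if (all(is.na(prices))) return(NA_real_)  # avoid max() over an empty set
  max(prices, na.rm = TRUE)
}

DF1$`Highest price` <- vapply(DF1$Fruits, highest_price, numeric(1),
                              USE.NAMES = FALSE)
DF1
```

Trimming with trimws() makes the match independent of how many spaces follow each comma, and vapply() checks that exactly one number comes back per row.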

Answer 1: (score: 2)

One idea using library(tidyverse) is to separate the strings into long format, merge, and summarise to get the maximum, i.e.

library(tidyverse)

df1 %>% 
 mutate(Fruits = tolower(Fruits), ID = row_number()) %>% 
 separate_rows(Fruits, sep = ', ') %>% 
 left_join(df2 %>% mutate(Fruits = tolower(Fruits)), by = 'Fruits') %>% 
 group_by(ID) %>% 
 summarise(Fruits = toString(Fruits), Price = max(Price))

which gives,

# A tibble: 4 x 3
     ID Fruits                                Price
  <int> <chr>                                 <dbl>
1     1 guava                                    10
2     2 mango, apple, banana                     30
3     3 apricot, plum                            40
4     4 avocado, cherry, blueberry, raspberry   200
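
For readers without the tidyverse installed, the same long-format, merge, and aggregate steps can be sketched in base R. This is only one possible translation: the data is rebuilt from the question, and the helper column key and the result name df3 are assumptions made for the sketch.

```r
# Example data rebuilt from the question (assumed to be character columns)
df1 <- data.frame(
  Fruits = c("Guava", "Mango, apple, banana", "Apricot, plum",
             "Avocado, Cherry, blueberry, raspberry"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Fruits = c("Guava", "Mango", "Apple", "Banana", "Apricot", "Plum",
             "Avocado", "Cherry", "Blueberry", "Raspberry"),
  Price = c(10, 30, 25, 15, 40, 35, 120, 23, 200, 125),
  stringsAsFactors = FALSE
)

# Long format: one row per (original row ID, lower-cased fruit)
parts <- strsplit(df1$Fruits, ",\\s*")
long <- data.frame(
  ID  = rep(seq_along(parts), lengths(parts)),
  key = tolower(unlist(parts)),
  stringsAsFactors = FALSE
)

# Merge with the price table on the lower-cased name (left join)
df2$key <- tolower(df2$Fruits)
joined <- merge(long, df2[, c("key", "Price")], by = "key", all.x = TRUE)

# Maximum price per original row; the formula interface drops NA prices
highest <- aggregate(Price ~ ID, data = joined, FUN = max)

# Attach the result to the original strings, keeping their original case
df3 <- data.frame(
  Fruits = df1$Fruits,
  `Highest Price` = highest$Price[match(seq_len(nrow(df1)), highest$ID)],
  check.names = FALSE
)
df3
```

Unlike the pipeline above, this version keeps the original capitalisation in Fruits, and a row whose fruits are all missing from df2 ends up with NA.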