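Question: I need to match comma-separated strings in one data frame against a second data frame that holds a value for each string. The final data frame should put the highest of those values next to each row of the first data frame. I have given an example below. I am new to R and cannot work out the logic for building this code. Any help getting started would be appreciated.
The two data frames: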
DF1:
Fruits
Guava
Mango, apple, banana
Apricot, plum
Avocado, Cherry, blueberry, raspberry
DF2:
Fruits Price
Guava 10
Mango 30
Apple 25
Banana 15
Apricot 40
Plum 35
Avocado 120
Cherry 23
Blueberry 200
Raspberry 125
Output DF3: a new column "Highest Price" should be created, holding the highest price found across the whole fruit group in each row of DF1.
DF3:
Fruits Highest Price
Guava 10
Mango, apple, banana 30
Apricot, plum 40
Avocado, Cherry, blueberry, raspberry 200
Answer 0 (score: 2)
Try
DF1$`Highest price` = sapply(tolower(DF1$Fruits),
function(x){ max(DF2$Price[which(tolower(DF2$Fruits)%in%strsplit(x,", ")[[1]])])})
> DF1
Fruits Highest price
1 Guava 10
2 Mango, apple, banana 30
3 Apricot, plum 40
4 Avocado, Cherry, blueberry, raspberry 200
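A shorter alternative, suggested by Ronak Shah:
# split on a comma plus optional whitespace, so that " apple" still matches "apple"
sapply(strsplit(DF1$Fruits, ",\\s*"), function(x) max(DF2$Price[tolower(DF2$Fruits) %in% tolower(x)]))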
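For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the two data frames from the question (assuming they were loaded as character columns, hence `stringsAsFactors = FALSE`) and wraps the split-and-match step in a helper called `highest_price`, which is a hypothetical name used only for illustration. Returning `NA` for a group with no match is also an assumption; the question does not say what should happen in that case.

```r
# Rebuild the example data from the question.
DF1 <- data.frame(
  Fruits = c("Guava",
             "Mango, apple, banana",
             "Apricot, plum",
             "Avocado, Cherry, blueberry, raspberry"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Fruits = c("Guava", "Mango", "Apple", "Banana", "Apricot",
             "Plum", "Avocado", "Cherry", "Blueberry", "Raspberry"),
  Price = c(10, 30, 25, 15, 40, 35, 120, 23, 200, 125),
  stringsAsFactors = FALSE
)

# highest_price() is a hypothetical helper: it splits one comma-separated
# string, trims surrounding whitespace, and returns the largest price
# among the case-insensitive matches in `lookup`.
highest_price <- function(group, lookup) {
  items <- trimws(strsplit(group, ",", fixed = TRUE)[[1]])
  hits <- lookup$Price[tolower(lookup$Fruits) %in% tolower(items)]
  # Assumption: return NA rather than -Inf (plus a warning) when nothing matches.
  if (length(hits) == 0) NA_real_ else max(hits)
}

# Apply the helper to every row of DF1 and store the result in a new column.
DF3 <- DF1
DF3$`Highest price` <- vapply(DF1$Fruits, highest_price, numeric(1),
                              lookup = DF2, USE.NAMES = FALSE)
print(DF3)
```

Using `vapply` with `numeric(1)` instead of `sapply` is a design choice here: it guarantees a plain numeric vector even if `DF1` happens to be empty.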
Answer 1 (score: 2)
One idea using library(tidyverse) is to separate the strings into long format, join, and summarise to get the maximum, i.e.
library(tidyverse)
DF1 %>%
  mutate(Fruits = tolower(Fruits), ID = row_number()) %>%
  # split on a comma plus optional whitespace so no leading spaces are kept
  separate_rows(Fruits, sep = ',\\s*') %>%
  left_join(DF2 %>% mutate(Fruits = tolower(Fruits)), by = 'Fruits') %>%
  group_by(ID) %>%
  summarise(Fruits = toString(Fruits), Price = max(Price))
which gives,
# A tibble: 4 x 3
     ID Fruits                                Price
  <int> <chr>                                 <dbl>
1     1 guava                                    10
2     2 mango, apple, banana                     30
3     3 apricot, plum                            40
4     4 avocado, cherry, blueberry, raspberry   200
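If installing the tidyverse is not an option, the same "go long, join, aggregate" idea can be sketched in base R. This is only an illustrative analogue under stated assumptions, not the answer's own code: the intermediate names `parts`, `long`, `prices`, `joined` and `best` are made up for this example, and the data frames are rebuilt from the question with character columns.

```r
# Rebuild the example data from the question.
DF1 <- data.frame(
  Fruits = c("Guava",
             "Mango, apple, banana",
             "Apricot, plum",
             "Avocado, Cherry, blueberry, raspberry"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Fruits = c("Guava", "Mango", "Apple", "Banana", "Apricot",
             "Plum", "Avocado", "Cherry", "Blueberry", "Raspberry"),
  Price = c(10, 30, 25, 15, 40, 35, 120, 23, 200, 125),
  stringsAsFactors = FALSE
)

# Long format: one row per (group ID, lower-cased fruit name).
parts <- strsplit(DF1$Fruits, ",\\s*")
long <- data.frame(
  ID = rep(seq_along(parts), lengths(parts)),
  Fruit = tolower(unlist(parts)),
  stringsAsFactors = FALSE
)

# Left join the prices onto the long table by lower-cased fruit name.
prices <- data.frame(Fruit = tolower(DF2$Fruits), Price = DF2$Price,
                     stringsAsFactors = FALSE)
joined <- merge(long, prices, by = "Fruit", all.x = TRUE)

# One row per group with the maximum price. The formula interface drops
# rows whose Price is NA, so unmatched fruits are ignored here.
best <- aggregate(Price ~ ID, data = joined, FUN = max)

# Attach the result to the original strings; a group with no match gets NA.
DF3 <- data.frame(
  Fruits = DF1$Fruits,
  Price = best$Price[match(seq_len(nrow(DF1)), best$ID)],
  stringsAsFactors = FALSE
)
print(DF3)
```

Unlike the pipeline above, this version keeps the original capitalisation of `Fruits` in the result, because the lower-casing is applied only to the join key.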