I'm looking for an elegant way to do the following.
I have two data frames:
The first data.frame:
zone = c("A", "B", "C")
country_name = c("Canada and UK", "UK and USA", "USA and Canada and UK")
df1 = data.frame(zone, country_name)
The second data.frame:
zone_area = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
country_name = c("Canada", "UK", "USA", "Canada", "UK", "USA", "Canada", "UK", "USA")
cost = c(4, 8, 6, 5, 6, 9, 8, 7, 5)
df2 = data.frame(zone_area, country_name, cost)
The resulting data.frame should look like df3:
zone = c("A", "B", "C")
country_name = c("Canada and UK", "UK and USA", "USA and Canada and UK")
cost = c(12, 15, 20)
df3 = data.frame(zone, country_name, cost)
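I think I need a for loop, because the code should still work when the zone values are different.
Thanks to everyone who looks at this question and suggests an approach :)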
Answer 0 (score: 1)
After splitting 'country_name' on 'and', we can left_join with 'df2', group by 'zone', get the sum of 'cost', and right_join with the original dataset to get the expected output.
library(tidyverse)
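df1 %>%
  # one row per (zone, country) pair
  separate_rows(country_name, sep = "\\s+and\\s+") %>%
  # df2 calls its zone column zone_area, so the keys must be given explicitly
  left_join(df2, by = c("zone" = "zone_area", "country_name")) %>%
  group_by(zone) %>%
  summarise(cost = sum(cost)) %>%
  # bring back the original, unsplit country_name string
  right_join(df1, by = "zone") %>%
  select(zone, country_name, cost)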
# A tibble: 3 x 3
#   zone  country_name           cost
#   <fct> <fct>                 <dbl>
# 1 A     Canada and UK            12
# 2 B     UK and USA               15
# 3 C     USA and Canada and UK    20
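Regarding the for loop in the question: the pipeline above does not need one, since separate_rows() and group_by() work on whatever zones are present in the data. For comparison, an explicit loop could look like the minimal base-R sketch below. It assumes the question's df1 and df2, rebuilt here with stringsAsFactors = FALSE so that strsplit() receives character vectors on R versions before 4.0; the names countries and hit are only illustrative.

```r
# Data from the question; stringsAsFactors = FALSE keeps the strings as
# character so strsplit() also works on R versions before 4.0
df1 <- data.frame(zone = c("A", "B", "C"),
                  country_name = c("Canada and UK", "UK and USA",
                                   "USA and Canada and UK"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(zone_area = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  country_name = c("Canada", "UK", "USA", "Canada", "UK", "USA",
                                   "Canada", "UK", "USA"),
                  cost = c(4, 8, 6, 5, 6, 9, 8, 7, 5),
                  stringsAsFactors = FALSE)

df3 <- df1
df3$cost <- NA_real_
for (i in seq_len(nrow(df1))) {
  # "USA and Canada and UK" -> c("USA", "Canada", "UK")
  countries <- strsplit(df1$country_name[i], "\\s+and\\s+")[[1]]
  # df2 rows for this zone whose country is one of the listed ones
  hit <- df2$zone_area == df1$zone[i] & df2$country_name %in% countries
  df3$cost[i] <- sum(df2$cost[hit])
}
df3
```

With the question's data this gives costs of 12, 15 and 20, matching the expected df3. Note that a zone with no matching rows in df2 would get 0 here, whereas the right_join() pipeline would give NA.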
Alternatively, instead of using separate_rows, we can do a left_join, then filter based on the pattern in 'country_name', get the sum of 'cost', and right_join with 'df1'.
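df2 %>%
  rename(zone = zone_area) %>%    # match df1's key name
  left_join(df1, by = "zone") %>% # country_name.x from df2, country_name.y from df1
  group_by(zone) %>%
  # "Canada and UK" -> "Canada|UK"; keep only the listed countries
  filter(grepl(gsub("\\s*and\\s*", "|", country_name.y[1]), country_name.x)) %>%
  summarise(cost = sum(cost)) %>%
  right_join(df1, by = "zone")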
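One caveat about the gsub() trick in this second approach: the pattern \\s*and\\s* does not require spaces, so it also matches the letters "and" inside a country name, and the resulting regex is unanchored, so it can match substrings. The sketch below uses a hypothetical string "Finland and UK" (not in the question's data) to show the problem, and one possible safer pattern that requires whitespace around "and" and wraps the alternatives in ^( and )$.

```r
x <- "Finland and UK"  # hypothetical input, not from the question

# Pattern used in the answer: the "and" inside "Finland" is replaced too
naive <- gsub("\\s*and\\s*", "|", x)
naive
# [1] "Finl||UK"

# Require whitespace around "and" and anchor the whole alternation
safe <- paste0("^(", gsub("\\s+and\\s+", "|", x), ")$")
safe
# [1] "^(Finland|UK)$"
grepl(safe, c("Finland", "UK", "USA"))
# [1]  TRUE  TRUE FALSE
```

In the pipeline, the same idea would mean building the filter pattern as paste0("^(", gsub("\\s+and\\s+", "|", country_name.y[1]), ")$"); with the question's data both patterns give the same totals.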