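I'm working with a data frame and using dplyr's group_by and summarise to get some results. However, one of the variables I want to create in summarise needs to look up a value in a second data frame, based on the value of the grouping variable, and I can't figure out how to do it. Here is an example.
These are my 2 dfs: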
ExampleData <- structure(list(country = structure(c(5L, 5L, 5L, 1L, 1L, 1L,
4L, 4L, 4L, 2L, 2L, 2L), .Label = c("Bolivia", "Colombia", "Ecuador",
"Peru", "Venezuela"), class = "factor"), area = c(21962759.1957539,
6116515271.82745, 4420526.44962988, 950155731.837125, 3284949253.71748,
13008533744.7177, 181171.153229255, 724458.059924146, 545485754.118267,
646585511.365563, 5586512056.6131, 4025165194.1968)), .Names = c("country",
"area"), row.names = c(0L, 1L, 2L, 87L, 88L, 89L, 117L, 118L,
country.areas <- structure(list(country = c("Bolivia", "Colombia", "Ecuador",
"Peru", "Venezuela"), area = c(1090353, 1141962, 256932, 1296912,
916560.5)), .Names = c("country", "area"), row.names = c(NA,
5L), class = "data.frame")
> head(ExampleData)
country area
0 Venezuela 21962759
1 Venezuela 6116515272
2 Venezuela 4420526
87 Bolivia 950155732
88 Bolivia 3284949254
89 Bolivia 13008533745
> head(country.areas)
country area
1 Bolivia 1090353.0
2 Colombia 1141962.0
3 Ecuador 256932.0
4 Peru 1296912.0
5 Venezuela 916560.5
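Now, starting from ExampleData, I'd like to group_by the country field and summarise to create a variable PercOfCountry: the summed area for each country divided by that country's total area, taken from country.areas. I'm trying: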
by.country <- ExampleData %>%
  group_by(country) %>%
  summarise(km2.country = sum(area)/1000000,
            PercOfCountry = km2.country/country.areas$area[country.areas$country == country])
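where the last country (the very last word) is meant to refer to the country currently being handled by group_by, so that its area is taken from the df country.areas (e.g. 1090353.0 for Bolivia). The km2.country part works as expected... I just want to divide that value by the country's area, so that I get a percentage.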
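The lookup attempted above can be done inside summarise() by taking the current group's key and finding its row in country.areas with match(). This is a minimal sketch, assuming dplyr >= 1.0.0 (which provides cur_group()); the data frames are rebuilt with data.frame() from the country/area values in the question's dput, without the row names, since those do not affect the grouping.

```r
library(dplyr)

# Rebuilt from the question's country/area values (row names omitted)
ExampleData <- data.frame(
  country = factor(rep(c("Venezuela", "Bolivia", "Peru", "Colombia"), each = 3),
                   levels = c("Bolivia", "Colombia", "Ecuador", "Peru", "Venezuela")),
  area = c(21962759.1957539, 6116515271.82745, 4420526.44962988,
           950155731.837125, 3284949253.71748, 13008533744.7177,
           181171.153229255, 724458.059924146, 545485754.118267,
           646585511.365563, 5586512056.6131, 4025165194.1968)
)

country.areas <- data.frame(
  country = c("Bolivia", "Colombia", "Ecuador", "Peru", "Venezuela"),
  area = c(1090353, 1141962, 256932, 1296912, 916560.5),
  stringsAsFactors = FALSE
)

by.country <- ExampleData %>%
  group_by(country) %>%
  summarise(
    km2.country = sum(area) / 1e6,
    # cur_group() returns a one-row tibble holding the current group's key;
    # match() finds the row of that country in country.areas
    PercOfCountry = km2.country /
      country.areas$area[match(as.character(cur_group()$country),
                               country.areas$country)]
  )

print(by.country)
```

Because group_by() drops empty factor levels by default, Ecuador (which has no rows in ExampleData) does not appear in the result.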
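Of course, I could easily do this in a separate step afterwards... but I'm trying to learn dplyr, and I'm still struggling to understand everything the group_by function can do, which seems powerful.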
Thanks!
Answer 0 (score: 3)
This should do it...
by.country <- ExampleData %>%
  group_by(country) %>%
  summarise(km2.country = sum(area)/1000000) %>%
  left_join(country.areas, by = "country") %>% # note: this brings in a new variable, also called area
  mutate(PercOfCountry = km2.country/area)
by.country
# A tibble: 2 × 4
country km2.country area PercOfCountry
<chr> <dbl> <dbl> <dbl>
1 Bolivia 17243.639 1090353.0 0.01581473
2 Venezuela 6142.899 916560.5 0.00670212
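If you would rather keep the division inside a single summarise() call, another option is to join before grouping, so every row already carries its country's total area. This is a sketch assuming dplyr >= 1.0.0; the column name country.km2 is hypothetical, chosen by renaming country.areas$area before the join so it does not clash with ExampleData's own area column (the clash the comment in the answer above points out).

```r
library(dplyr)

# Rebuilt from the question's country/area values (row names omitted)
ExampleData <- data.frame(
  country = factor(rep(c("Venezuela", "Bolivia", "Peru", "Colombia"), each = 3),
                   levels = c("Bolivia", "Colombia", "Ecuador", "Peru", "Venezuela")),
  area = c(21962759.1957539, 6116515271.82745, 4420526.44962988,
           950155731.837125, 3284949253.71748, 13008533744.7177,
           181171.153229255, 724458.059924146, 545485754.118267,
           646585511.365563, 5586512056.6131, 4025165194.1968)
)

country.areas <- data.frame(
  country = c("Bolivia", "Colombia", "Ecuador", "Peru", "Venezuela"),
  area = c(1090353, 1141962, 256932, 1296912, 916560.5),
  stringsAsFactors = FALSE
)

by.country <- ExampleData %>%
  # use a character key so it matches country.areas$country exactly
  mutate(country = as.character(country)) %>%
  # attach each country's total area (hypothetical name country.km2) to every row
  left_join(rename(country.areas, country.km2 = area), by = "country") %>%
  group_by(country) %>%
  summarise(
    km2.country = sum(area) / 1e6,
    # country.km2 is constant within a group, so first() picks that single value
    PercOfCountry = km2.country / first(country.km2)
  )

print(by.country)
```

The trade-off is that the join happens on all rows rather than one row per country, which is harmless here but does more work than joining after summarise() on large data.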