I have a data frame like this:
Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam
I want to divide columns 2, 3, 4, 5 and 6 by the total for that particular country.
Doing this in base R is a bit clunky:
df[df$Country=="Vietnam",][c(3, 4, 5, 6)] = df[df$Country=="Vietnam",][c(3, 4, 5, 6)] / sum(df[df$Country=="Vietnam",][c(3, 4, 5, 6)])
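(I think this works.)
I am trying to convert as much of my code as possible to tidyverse functions. Is there a way to do the same thing more efficiently with dplyr?
Thanks.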
Answer 0: (score: 0)
I believe this is what you are after.
Divide each column by that column's sum, grouped by country:
library(tidyverse)
df1 %>%
  group_by(Country) %>%
  mutate_at(vars(c_school:c_leisure), funs(. / sum(.)))
#output
Setting q02_id c_school c_home c_work c_transport c_leisure Country
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Rural 11900006 NaN 0.238 0.600 1.00 0.250 Vietnam
2 Rural 11900031 1.00 1.00 NaN NaN NaN China
3 Rural 11900033 NaN 0.143 0 0 0.750 Vietnam
4 Rural 11900053 NaN 0.333 0.400 0 0 Vietnam
5 Rural 11900114 1.00 1.00 NaN NaN NaN Malaysia
6 Rural 11900446 NaN 0.286 0 0 0 Vietnam
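Note that `funs()` has been deprecated since dplyr 0.8.0 and `mutate_at()` is superseded by `across()` as of dplyr 1.0.0. The following is a minimal sketch of the same per-column division written with `across()`, assuming dplyr 1.0.0 or later. The name `res_col` is only illustrative. The `NaN` values appear wherever a country's column sums to zero (0 / 0), for example `c_school` for Vietnam.

```r
library(dplyr)

# Same sample data as in this answer
df1 <- read.table(text = "Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam", header = TRUE)

# Divide each column by that column's sum within each country
res_col <- df1 %>%
  group_by(Country) %>%
  mutate(across(c_school:c_leisure, ~ .x / sum(.x))) %>%
  ungroup()

res_col
```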
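Or divide each column by the overall total for each country, as in your example. The only difference is that I used columns 3:7, since I believe that was your intent.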
df1 %>%
  mutate(sum = rowSums(.[, 3:7])) %>%
  group_by(Country) %>%
  mutate_at(vars(c_school:c_leisure), funs(. / sum(sum))) %>%
  select(-sum)
#output
Setting q02_id c_school c_home c_work c_transport c_leisure Country
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Rural 11900006 0 0.161 0.0968 0.0323 0.0323 Vietnam
2 Rural 11900031 0.667 0.333 0 0 0 China
3 Rural 11900033 0 0.0968 0 0 0.0968 Vietnam
4 Rural 11900053 0 0.226 0.0645 0 0 Vietnam
5 Rural 11900114 0.333 0.667 0 0 0 Malaysia
6 Rural 11900446 0 0.194 0 0 0 Vietnam
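The per-country version can be sketched with `across()` as well, again assuming dplyr 1.0.0 or later. The helper column `total` and the name `res_total` are illustrative and not part of the original code. As a sanity check, the five rescaled columns should add up to 1 within each country (for Vietnam the grand total is 31, so the first `c_home` value is 5 / 31).

```r
library(dplyr)

# Same sample data as in this answer
df1 <- read.table(text = "Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam", header = TRUE)

res_total <- df1 %>%
  group_by(Country) %>%
  # Grand total of all five contact columns within the country
  mutate(total = sum(c_school, c_home, c_work, c_transport, c_leisure)) %>%
  # Divide every contact column by that country-level total
  mutate(across(c_school:c_leisure, ~ .x / total)) %>%
  ungroup() %>%
  select(-total)

res_total
```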
Data:
df1 = read.table(text ="Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam", header = T)
Answer 1: (score: 0)
I know you asked for tidyverse functions, but this is also a task where the data.table package shines:
library(data.table)
setDT(df)
df[, lapply(.SD, function(x) x / sum(x)), by = Country, .SDcols = 3:7]
Country c_school c_home c_work c_transport c_leisure
1: Vietnam NaN 0.2380952 0.6 1 0.25
2: Vietnam NaN 0.1428571 0.0 0 0.75
3: Vietnam NaN 0.3333333 0.4 0 0.00
4: Vietnam NaN 0.2857143 0.0 0 0.00
5: China 1 1.0000000 NaN NaN NaN
6: Malaysia 1 1.0000000 NaN NaN NaN
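The call above divides each column by its own per-country sum and returns only `Country` plus the rescaled columns, reordered by group. If the goal is instead the single per-country total from the question, with all original columns and the row order kept, one possible sketch uses `:=` to update by reference. The names `dt`, `cols` and the temporary column `total` are illustrative assumptions, and the sample data is read in the same way as in the other answer.

```r
library(data.table)

# Same sample data as in the other answer
df1 <- read.table(text = "Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam", header = TRUE)

dt <- as.data.table(df1)  # work on a copy so df1 stays untouched
cols <- c("c_school", "c_home", "c_work", "c_transport", "c_leisure")

# Step 1: grand total of all five columns within each country
dt[, total := sum(unlist(.SD)), by = Country, .SDcols = cols]

# Step 2: replace each column by its share of the country total
# (done without `by`, so the integer columns are replaced by doubles)
dt[, (cols) := lapply(.SD, function(x) x / total), .SDcols = cols]

# Step 3: drop the helper column
dt[, total := NULL]

dt
```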