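Basically, I have the new data below. For each province I want the percentage of yes and no answers, and likewise the percentage of each shelter type, plus an overall percentage row for yes and no.

new_data <- data.frame(province = c("a", "b"),
                       food = c("yes", "no", "no", "yes", "yes", "no"),
                       shelter_type = c("unfinished", "permanent", "transitional"))

I would like output like the following:

out_put <- data.frame(province = c("a", "b", "overall_perc"),
                      food_yes_per = c(66.6, 36.4, 50),
                      food_No_per = c(36.4, 66.6, 50),
                      shelter_type_unfinished = c(50, 50, 33.3),
                      shelter_type_permanent = c(50, 50, 33.3),
                      shelter_type_transitional = c(50, 50, 33.3))

Can anyone help?

Answer 0 (score: 1)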
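The tricky part of this question is the difference between row percentages and column percentages as represented in the data. Since every row except the total row holds column percentages, we need to process the data twice: first at the `province * variable` level of aggregation, and then at the `variable` level, aggregating over `province`.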
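To make the row-versus-column distinction concrete, here is a minimal base R sketch on the question's data. The names `tab`, `row_pct` and `col_pct` are mine, not part of the original answer, and `shelter_type` is used because its two kinds of percentage differ visibly.

```r
# Question's data (shorter vectors are recycled to 6 rows)
new_data <- data.frame(
  province = c("a", "b"),
  food = c("yes", "no", "no", "yes", "yes", "no"),
  shelter_type = c("unfinished", "permanent", "transitional")
)

# Counts of shelter type within each province
tab <- table(new_data$province, new_data$shelter_type)

# Row percentages: each province's row sums to 1 (every cell is 1/3 here)
row_pct <- prop.table(tab, margin = 1)

# Column percentages: each shelter type's column sums to 1 (every cell is 1/2 here)
col_pct <- prop.table(tab, margin = 2)

print(row_pct)
print(col_pct)
```

The 0.5 values correspond to the per-province rows of the requested output, while 0.333 is what the requested total row shows for each shelter type.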
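new_data <- data.frame(province = c("a", "b"),
                       food = c("yes", "no", "no", "yes", "yes", "no"),
                       shelter_type = c("unfinished", "permanent", "transitional"))
library(dplyr)
library(tidyr)

First, we generate what will end up as the column percentages in the wide format data frame. We use `pivot_longer()` to create a narrow format tidy data set, create counts, `summarise()` the counts, and then `group_by()` variable and value to generate the column percentages.

prov_var <- new_data %>%
  # one row per province / variable / value observation
  pivot_longer(c(food, shelter_type), names_to = "variable",
               values_to = "value") %>%
  group_by(province, variable, value) %>%
  mutate(count = 1) %>%
  summarise(count = sum(count)) %>%
  ungroup() %>%
  # share of each value that falls in each province
  group_by(variable, value) %>%
  mutate(pct = count / sum(count))

Next, we re-aggregate the data to create what will become the `Total` row for province. We take the original data, convert it to narrow format tidy data, and this time `group_by()` variable and value to calculate the percentages pooled across `province`.

tot_var <- new_data %>%
  pivot_longer(c(food, shelter_type), names_to = "variable",
               values_to = "value") %>%
  group_by(variable, value) %>%
  mutate(count = 1) %>%
  # summarise() drops the last grouping level, so the result is grouped by variable
  summarise(count = sum(count)) %>%
  mutate(province = "Total",
         pct = count / sum(count))

Finally, we `rbind()` the data and use `tidyr::pivot_wider()` to create the wide format data frame, as shown in the original question.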
# now add rows & pivot_wider()
rbind(prov_var, tot_var) %>%
  ungroup() %>%  # drop leftover grouping so select() can remove variable and value
  mutate(concat_var = paste(variable, value, sep = "_")) %>%
  select(-variable, -value, -count) %>%
  pivot_wider(id_cols = province, names_from = concat_var,
              values_from = pct)

...and the output:

# A tibble: 3 x 6
  province food_no food_yes shelter_type_perm… shelter_type_tra… shelter_type_unf…
  <chr>      <dbl>    <dbl>              <dbl>             <dbl>             <dbl>
1 a          0.333    0.667              0.5               0.5               0.5
2 b          0.667    0.333              0.5               0.5               0.5
3 Total      0.5      0.5                0.333             0.333             0.333
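The question asks for percentages on a 0 to 100 scale, while the result above holds proportions. A minimal base R sketch of that last conversion follows, assuming the wide result has been stored as a plain data frame; the object `wide` and the helper `is_num` are hypothetical names, and the values are typed in from the output above rather than recomputed.

```r
# Hypothetical copy of the wide result, as proportions
wide <- data.frame(
  province = c("a", "b", "Total"),
  food_no = c(1/3, 2/3, 1/2),
  food_yes = c(2/3, 1/3, 1/2),
  shelter_type_permanent = c(1/2, 1/2, 1/3),
  shelter_type_transitional = c(1/2, 1/2, 1/3),
  shelter_type_unfinished = c(1/2, 1/2, 1/3)
)

# Scale only the numeric columns to percentages, rounded to one decimal
is_num <- vapply(wide, is.numeric, logical(1))
wide[is_num] <- round(wide[is_num] * 100, 1)

print(wide)
```

Rounding at the very end keeps the intermediate proportions exact, so the rounded columns are derived from unrounded values.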
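A partial solution with tables::tabular()

Another way to approach the question is the `tables` package. We can generate column percentages by `province` as follows. Unfortunately, the total row this produces is not what was requested.
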
library(tables)
# replicate column percentages, where "All" is 100
tabular((Factor(province,"Province") + 1) ~
(Factor(food) + Factor(shelter_type)) *
(Percent("col")),data = new_data )

          food            shelter_type
          no      yes     permanent transitional unfinished
 Province Percent Percent Percent   Percent      Percent
 a         33.33   66.67   50        50           50
 b         66.67   33.33   50        50           50
 All      100.00  100.00  100       100          100

We can fix the `All` row by configuring the table for row percentages, but then the by-province data no longer match what was requested.

# replicate row percentages in All row
tabular((Factor(province,"Province") + 1) ~
(Factor(food) + Factor(shelter_type)) *
(Percent("row")),data = new_data )

          food            shelter_type
          no      yes     permanent transitional unfinished
 Province Percent Percent Percent   Percent      Percent
 a        33.33   66.67   33.33     33.33        33.33
 b        66.67   33.33   33.33     33.33        33.33
 All      50.00   50.00   33.33     33.33        33.33

However, if we control the percentages by specifying them on the row dimension of the table rather than the column dimension in `tabular()`, we can obtain the desired output.

tabular((Factor(province, "Province") * (colPct = Percent("col")) +
           1 * (rowPct = Percent("row"))) ~
          (Factor(food) + Factor(shelter_type)), data = new_data)

...and the output:

                 food        shelter_type
 Province        no    yes   permanent transitional unfinished
 a        colPct 33.33 66.67 50.00     50.00        50.00
 b        colPct 66.67 33.33 50.00     50.00        50.00
 All      rowPct 50.00 50.00 33.33     33.33        33.33

We'll use the `dplyr` package to summarise the data by province and food, calculate percentages, and then `ungroup()` to calculate the total response percentages.

new_data <- data.frame(province = c("a", "b"),
                       food = c("yes", "no", "no", "yes", "yes", "no"),
                       shelter_type = c("unfinished", "permanent", "transitional"))
library(dplyr)
new_data %>%
  group_by(province, food) %>%
  summarise(count_food = n()) %>%
  # percentage of each food answer within its province
  group_by(province) %>%
  mutate(pct_food = count_food / sum(count_food)) %>%
  # percentage of each province/food cell out of all responses
  ungroup() %>%
  mutate(pct_total = count_food / sum(count_food))
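
As far as I can tell from the code above, `pct_total` is each province/food cell's share of all responses, which is not the same as the overall yes/no split requested in the question. A small base R sketch of the two quantities follows; the names `cell_share` and `overall_food` are mine.

```r
new_data <- data.frame(
  province = c("a", "b"),
  food = c("yes", "no", "no", "yes", "yes", "no"),
  shelter_type = c("unfinished", "permanent", "transitional")
)

# Each province/food cell as a share of all 6 rows (what pct_total holds)
cell_share <- prop.table(table(new_data$province, new_data$food))

# Overall split of food answers, ignoring province
overall_food <- prop.table(table(new_data$food))

print(cell_share)
print(overall_food)
```

Here `cell_share` sums to 1 over all four cells, whereas `overall_food` gives 0.5 for both "no" and "yes", matching the 50/50 overall row in the question.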