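How to get percentages for a categorical variable and overall percentages for individual choices

Time: 2020-06-02 11:42:27

Tags: r

Basically, I have data like new_data below. For each province I want the percentage of "yes" answers and, likewise, the percentage of "no" answers, and I also want the overall percentages of "yes" and "no".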

new_data <-data.frame(province=c("a","b"),food=c("yes","no","no","yes","yes","no"),shelter_type=c("unfinished","permanent","transitional"))

I want output like the following:

out_put <- data.frame (province=c("a","b","overall_perc"),food_yes_per=c(66.6,36.4,50),food_No_per=c(36.4,66.6,50),shelter_type_unfinished=c(50,50,33.3),shelter_type_permanent=c(50,50,33.3),shelter_type_transitional=c(50,50,33.3))

Can anyone help?

1 answer:

Answer 0 (score: 1)

Updated answer

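The tricky part of this question is the difference between the row percentages and the column percentages represented in the data. Because every row except the total row holds column percentages, we need to process the data twice: first at the province * variable level of aggregation, and then at the variable level, aggregated across province.

new_data <- data.frame(province=c("a","b"),
                       food=c("yes","no","no","yes","yes","no"),
                       shelter_type=c("unfinished","permanent","transitional"))
library(dplyr)
library(tidyr)

First, we generate what will ultimately become the column percentages in the wide-format data frame. We use pivot_longer() to create a narrow-format tidy data set, create counts, summarise() the counts, and then use group_by() on variable and value to generate the column percentages.

new_data %>% group_by(province) %>%
     pivot_longer(.,c(food,shelter_type),names_to = "variable", values_to = "value") %>%
     ungroup() %>% group_by(province,variable,value) %>%
     mutate(count = 1) %>%
     summarise(.,count = sum(count)) %>%
     ungroup() %>% group_by(variable,value) %>%
     mutate(pct = count / sum(count)) %>%
     ungroup() -> prov_var   # drop grouping before the rbind() step

Next, we re-aggregate the data to create what will become the Total row in the province column. We take the original data, convert it to narrow-format tidy data, and this time use group_by() on variable and value to calculate the percentages across province.

new_data %>% group_by(province) %>%
     pivot_longer(.,c(food,shelter_type),names_to = "variable", values_to = "value") %>%
     ungroup() %>% group_by(variable,value) %>%
     mutate(count = 1) %>%
     summarise(., count = sum(count)) %>%
     mutate(province = "Total",
            pct = count / sum(count)) %>%
     ungroup() -> tot_var   # drop grouping before the rbind() step

Finally, we rbind() the data and use tidyr::pivot_wider() to create a wide-format data frame like the one shown in the original question.

# now add rows & pivot_wider()
rbind(prov_var,tot_var) %>% 
     mutate(concat_var = paste(variable,value,sep="_")) %>% 
     select(-variable,-value,-count) %>% 
     pivot_wider(id_cols = province,names_from=concat_var,
                 values_from = pct)

...and the output:

# A tibble: 3 x 6
  province food_no food_yes shelter_type_perm… shelter_type_tra… shelter_type_unf…
  <chr>      <dbl>    <dbl>              <dbl>             <dbl>             <dbl>
1 a          0.333    0.667              0.5               0.5               0.5
2 b          0.667    0.333              0.5               0.5               0.5
3 Total      0.5      0.5                0.333             0.333             0.333

Partial solution with tables::tabular()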

Another way to approach the question is with the tables package. We can generate the column percentages by province as follows.

library(tables)

# replicate column percentages, where "All" is 100

tabular((Factor(province,"Province") + 1) ~ 
                (Factor(food) + Factor(shelter_type)) * 
                (Percent("col")),data = new_data )

Unfortunately, the total row is not what was requested.

          food            shelter_type
          no      yes     permanent    transitional unfinished
 Province Percent Percent Percent      Percent      Percent
 a         33.33   66.67   50           50           50
 b         66.67   33.33   50           50           50
 All      100.00  100.00  100          100          100

We can fix the All row by configuring the table for row percentages, but then the data by province no longer matches what was requested.

# replicate row percentages in All row
tabular((Factor(province,"Province") + 1) ~ 
                (Factor(food) + Factor(shelter_type)) * 
                (Percent("row")),data = new_data )

          food            shelter_type
          no      yes     permanent    transitional unfinished
 Province Percent Percent Percent      Percent      Percent
 a        33.33   66.67   33.33        33.33        33.33
 b        66.67   33.33   33.33        33.33        33.33
 All      50.00   50.00   33.33        33.33        33.33
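The difference between Percent("col") and Percent("row") can be cross-checked without the tables package. The following is a minimal base R sketch, assuming the same new_data as in the question; the object names shelter_counts, col_pct and row_pct are illustrative and do not come from the answer.

```r
# Same data as in the question (the shorter vectors are recycled to 6 rows)
new_data <- data.frame(province = c("a", "b"),
                       food = c("yes", "no", "no", "yes", "yes", "no"),
                       shelter_type = c("unfinished", "permanent", "transitional"))

# Counts of each shelter type within each province (illustrative name)
shelter_counts <- table(new_data$province, new_data$shelter_type)

# Column percentages: each shelter type sums to 100 across provinces
col_pct <- prop.table(shelter_counts, margin = 2) * 100

# Row percentages: each province sums to 100 across shelter types
row_pct <- prop.table(shelter_counts, margin = 1) * 100

round(col_pct, 2)
round(row_pct, 2)
```

With this data each province holds exactly one of each shelter type, so every column percentage is 50 and every row percentage is 33.33, which is the same contrast the two tabular() calls show for shelter_type.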

Correct solution with tabular()

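However, if we control the percentages by specifying them on the row dimension of the table rather than on the column dimension, we can achieve the desired output.

tabular((Factor(province,"Province")*( colPct = Percent("col")) + 1*(rowPct = Percent("row")))  ~ 
                (Factor(food) + Factor(shelter_type)),data = new_data )

...and the output:

                 food        shelter_type
 Province        no    yes   permanent    transitional unfinished
 a        colPct 33.33 66.67 50.00        50.00        50.00
 b        colPct 66.67 33.33 50.00        50.00        50.00
 All      rowPct 50.00 50.00 33.33        33.33        33.33

Original answer

We'll use the dplyr package to aggregate the data by province and food, calculate percentages, and then use ungroup() to calculate the percentage of total responses.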

new_data <-data.frame(province=c("a","b"),
                      food=c("yes","no","no","yes","yes","no"),
                      shelter_type=c("unfinished","permanent","transitional"))

library(dplyr)

new_data %>% group_by(province,food) %>%
     summarise(count_food = n()) %>% group_by(province) %>%
     mutate(pct_food = count_food / sum(count_food)) %>%
     ungroup(.) %>%
     mutate(pct_total = count_food / sum(count_food))
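
As a dependency-free comparison with the dplyr and tables approaches, here is a minimal base R sketch that builds the table requested in the question: column percentages for each province plus an overall row. The helper pct_block() and the object result are hypothetical names introduced for illustration only; they are not part of the original answer.

```r
# Same data as in the question (the shorter vectors are recycled to 6 rows)
new_data <- data.frame(province = c("a", "b"),
                       food = c("yes", "no", "no", "yes", "yes", "no"),
                       shelter_type = c("unfinished", "permanent", "transitional"))

# Hypothetical helper: for one variable, return column percentages by
# province with an extra "overall_perc" row of marginal percentages.
pct_block <- function(data, var) {
  counts  <- table(data$province, data[[var]])
  by_prov <- unclass(prop.table(counts, margin = 2)) * 100  # column percentages
  overall <- prop.table(colSums(counts)) * 100              # share of all rows
  out <- rbind(by_prov, overall_perc = overall)
  colnames(out) <- paste(var, colnames(out), sep = "_")
  out
}

# Combine the blocks for both variables and round to one decimal place
result <- round(cbind(pct_block(new_data, "food"),
                      pct_block(new_data, "shelter_type")), 1)
result
```

The rows of result are a, b and overall_perc, and the values are the proportions from the updated answer expressed as percentages. Adding another categorical column only needs one more pct_block() call inside cbind().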