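Problem: I have a survey with 100 questions. Each question can have one of 5 types of response, which I have grouped and tallied into separate tables (held in a list). Each table has a different number of columns, with different variable names.
Sample data: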
tbl1 <- tribble(~"stakeholder", ~"Question", ~"1-Little", ~"2", ~"3", ~"4-Much", ~"Do not know/ Not applicable", ~"no_response",
"SH_1", "QUESTION 2", 2, 1, 4, 8, 1, 1,
"SH_2", "QUESTION 2", 2, 1, 4, 8, 1, 1,
"SH_3", "QUESTION 2", 2, 1, 4, 8, 1, 1,
"SH_4", "QUESTION 2", 2, 1, 4, 8, 1, 1,
)
tbl2 <- tribble(~"stakeholder", ~"Question", ~"1-Little", ~"2", ~"3", ~"4-Much", ~"5-MuchMuch", ~"Do not know/ Not applicable", ~"no_response",
"SH_1", "QUESTION 2", 2, 1, 4, 8, 1, 1,2,
"SH_2", "QUESTION 2", 2, 1, 4, 8, 1, 1,2,
"SH_3", "QUESTION 2", 2, 1, 4, 8, 1, 1,2,
"SH_4", "QUESTION 2", 2, 1, 4, 8, 1, 1,2
)
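Question: how do I create proportional counts based on the totals? I need to build a table of proportions based on the total number of responses to each question.
I created the counts in the sample tables above by tallying character responses within a grouping variable. Note that I will be grouping and reproducing the graphs and tables in 6 different ways (nearly 600 in total!):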
tally_function <- function(tbl) {
  tbl %>%
    gather(key = Question, value = Response,
           12:length(.)) %>%
    group_by(stakeholder, Question, Response) %>%
    tally %>%
    spread(Response, n, fill = 0) %>%
    select(stakeholder, Question, everything(), no_response = `<NA>`) %>%
    arrange(Question)
}
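The function I used previously refers to each column by name to produce the totals, but that does not work here because the column names differ from table to table: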
Prop_Function_Group1 <- function(tbl){
  tbl %>%
    summarise(`Number of Responses (Count)` = sum(`1-Little` + `2` + `Do not know/ Not applicable` +
                                                    `3` + `4-Much` + no_response, na.rm = TRUE),
              `1-Little` = sum(`1-Little` / `Number of Responses (Count)`, na.rm = TRUE) * 100,
              `2` = sum(`2` / `Number of Responses (Count)`, na.rm = TRUE) * 100,
              `Do not know/ Not applicable` = sum(`Do not know/ Not applicable` / `Number of Responses (Count)`, na.rm = TRUE) * 100,
              `3` = sum(`3` / `Number of Responses (Count)`, na.rm = TRUE) * 100,
              `4-Much` = sum(`4-Much` / `Number of Responses (Count)`, na.rm = TRUE) * 100,
              `no_response` = sum(no_response / `Number of Responses (Count)`, na.rm = TRUE) * 100
    ) %>%
    mutate_if(is.numeric, round, digits = 2) %>%
    arrange(desc(`Number of Responses (Count)`))
}
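Currently I have the attempt below, but I believe I will need some kind of ifelse / case_when() loop based on names(tbl). I am really new to programming and not sure where to start. The column names in the summarise() output must be the same as those of the input table being summarised.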
prop_function <- function(tbl){
  tbl %>%
    summarise(`Number of Responses` = sum(3:length(.), na.rm = TRUE))
}
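I do not need a complete solution; any small ideas and contributions will help. If this is a duplicate of an existing question, please point me in the right direction.
After this I will also be feeding the results into purrr::map() + ggplot(), so tidyverse-friendly solutions are appreciated.
Cheers.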
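Answer 0: (score: 0)
Here is a solution that stays with dplyr / tidyverse and mirrors the output format and structure of Prop_Function_Group1(tbl1). The function should also work on the other table shapes you describe, because it picks columns by type rather than by name.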
library(tidyverse)
prop_function <- function(tbl){
  tbl_counts <- tbl %>%
    summarise_if(is.double, ~sum(.x))
  tbl_counts %>%
    mutate_all(~100 * .x / sum(tbl_counts)) %>%
    mutate(`Number of Responses (Count)` = sum(tbl_counts)) %>%
    mutate_all(round, digits = 2) %>%
    select(length(.), everything()) # move last col to first
}
list(tbl1, tbl2) %>%
map(prop_function)
#> [[1]]
#> # A tibble: 1 x 7
#> `Number of Resp~ `1-Little` `2` `3` `4-Much` `Do not know/ N~
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 68 11.8 5.88 23.5 47.1 5.88
#> # ... with 1 more variable: no_response <dbl>
#>
#> [[2]]
#> # A tibble: 1 x 8
#> `Number of Resp~ `1-Little` `2` `3` `4-Much` `5-MuchMuch`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 76 10.5 5.26 21.0 42.1 5.26
#> # ... with 2 more variables: `Do not know/ Not applicable` <dbl>,
#> # no_response <dbl>
Created on 2019-01-10 by the reprex package (v0.2.1)
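For readers on a newer dplyr, the same idea can be sketched with across(). This is only a sketch and it assumes dplyr 1.0.0 or later (for across(), where() and the .before argument of mutate()). The name prop_across is hypothetical and not part of the answer above; tbl1 is copied from the question.

```r
library(dplyr)
library(tibble)

# Sample table copied from the question
tbl1 <- tribble(
  ~"stakeholder", ~"Question", ~"1-Little", ~"2", ~"3", ~"4-Much",
  ~"Do not know/ Not applicable", ~"no_response",
  "SH_1", "QUESTION 2", 2, 1, 4, 8, 1, 1,
  "SH_2", "QUESTION 2", 2, 1, 4, 8, 1, 1,
  "SH_3", "QUESTION 2", 2, 1, 4, 8, 1, 1,
  "SH_4", "QUESTION 2", 2, 1, 4, 8, 1, 1
)

# Hypothetical helper: percentages of the grand total, chosen by column type
prop_across <- function(tbl) {
  # Column totals for every numeric column, whatever its name
  counts <- tbl %>%
    summarise(across(where(is.numeric), sum))
  # Grand total of all responses in the table
  total <- sum(counts)
  counts %>%
    # Convert each column total to a percentage of the grand total
    mutate(across(everything(), ~ round(100 * .x / total, 2))) %>%
    # Put the total count in the first column, as in the original output
    mutate(`Number of Responses (Count)` = total, .before = 1)
}

res <- prop_across(tbl1)
print(res)
```

Because the columns are selected with where(is.numeric), the same call should work on tbl2 or any other table in the list, for example via map(list(tbl1, tbl2), prop_across).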
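Answer 1: (score: 0)
The answer given above by @bryan-shalloway put me on the right track. The main change here is that this version keeps the grouping variable names by nesting map() operations inside mutate():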
# Note: funs() has been deprecated since dplyr 0.8.0, so newer versions
# will emit a deprecation warning for the mutate_all(funs(...)) step.
proportion_function <- function(tbl){
  # Tally responses per Region and Question, one column per response type
  tbl_counts <- tbl %>%
    gather(key = Question, value = Response,
           12:length(.)) %>%
    group_by(Region, Question, Response) %>%
    tally %>%
    spread(Response, n, fill = 0) %>%
    select(Region, Question, everything(), no_response = `<NA>`) %>%
    arrange(Question)
  # Nest by group, then turn each group's counts into rounded percentages
  tbl_counts %>%
    nest() %>%
    mutate(data = map(data, ~ .x %>% select_if(is.numeric)
                      %>% mutate(count = sum(rowSums(.))))) %>%
    mutate(data = map(data, ~ .x %>% select_if(is.numeric)
                      %>% mutate_all(funs((. / count) * 100 )))) %>%
    mutate(data = map(data, ~ .x %>% select_if(is.numeric)
                      %>% mutate_all(round, digits = 2))) %>%
    unnest()
}
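If the goal is only to keep the grouping columns while converting each row of counts to percentages, a flatter version without nest() / map() may be enough. This is a sketch rather than a drop-in replacement: it assumes dplyr 1.0.0 or later and a table that is already tallied (like tbl2 in the question), and the names row_proportions and count are hypothetical.

```r
library(dplyr)
library(tibble)

# Sample table copied from the question
tbl2 <- tribble(
  ~"stakeholder", ~"Question", ~"1-Little", ~"2", ~"3", ~"4-Much",
  ~"5-MuchMuch", ~"Do not know/ Not applicable", ~"no_response",
  "SH_1", "QUESTION 2", 2, 1, 4, 8, 1, 1, 2,
  "SH_2", "QUESTION 2", 2, 1, 4, 8, 1, 1, 2,
  "SH_3", "QUESTION 2", 2, 1, 4, 8, 1, 1, 2,
  "SH_4", "QUESTION 2", 2, 1, 4, 8, 1, 1, 2
)

# Hypothetical helper: per-row percentages, character columns left untouched
row_proportions <- function(tbl) {
  tbl %>%
    # Total responses in each row, summed over all numeric columns
    mutate(count = rowSums(across(where(is.numeric)))) %>%
    # Divide every numeric column except `count` by that row total
    mutate(across(where(is.numeric) & !count,
                  ~ round(100 * .x / count, 2)))
}

res <- row_proportions(tbl2)
print(res)
```

The stakeholder and Question columns pass through unchanged, so the result can still be grouped or faceted on them when it is handed to map() and ggplot().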