Survey data often contains multiple-choice columns whose entries are comma-separated, for example:
library("tidyverse")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)
We would like a function, multiple_choice_tally, that counts the unique responses to a question:
my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
  <chr>    <int>
1 Bus          3
2 Walk         2
3 Cycle        3
What is the most efficient and flexible way to build multiple_choice_tally without any hard-coding?
Answer 0 (score: 3)
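We can use separate_rows from the tidyr package to expand the contents of question.2 into one row per response. Since you are using tidyverse, tidyr is already loaded by library("tidyverse"), so there is no need to load it again. my_survey2 is the final output.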
my_survey2 <- my_survey %>%
  separate_rows(question.2) %>%
  count(question.2) %>%
  rename(response = question.2, count = n)
my_survey2
# A tibble: 3 × 2
  response count
  <chr>    <int>
1 Bus          3
2 Cycle        3
3 Walk         2
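One caveat worth noting: by default separate_rows() splits on any run of non-alphanumeric characters (its sep argument defaults to "[^[:alnum:].]+"), so a response made of several words would be broken apart. A minimal sketch of passing an explicit separator follows; the travel data and its mode column are hypothetical and not part of the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one response ("Car share") contains a space
travel <- tibble::tibble(
  id = 1:3,
  mode = c("Bus", "Car share, Walk", "Walk")
)

tally <- travel %>%
  # Split only on a comma followed by optional whitespace
  separate_rows(mode, sep = ",\\s*") %>%
  # name = "count" sets the name of the tally column directly
  count(mode, name = "count") %>%
  rename(response = mode)

tally
```

With the default separator, "Car share" would instead be counted as two separate responses, "Car" and "share".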
The separate_rows() pipeline can be wrapped in a function as follows.
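multiple_choice_tally <- function(survey.data, question){
  # Capture the unquoted column name so it can be unquoted with !! below
  question <- enquo(question)
  survey.data2 <- survey.data %>%
    separate_rows(!!question) %>%          # one row per individual response
    count(!!question) %>%                  # tally each distinct response
    setNames(., c("response", "count"))    # rename the two output columns
  return(survey.data2)
}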
my_survey %>%
  multiple_choice_tally(question = question.2)
# A tibble: 3 x 2
  response count
  <chr>    <int>
1 Bus          3
2 Cycle        3
3 Walk         2
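If a reasonably recent tidyverse is available (rlang 0.4.0 or later is assumed here), the enquo() and !! pair can be shortened with the embrace operator {{ }}. The following is only a sketch of that variant; the name multiple_choice_tally2 is hypothetical and chosen so it does not overwrite the function above.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant using the embrace operator instead of enquo() / !!
multiple_choice_tally2 <- function(survey.data, question) {
  survey.data %>%
    separate_rows({{ question }}) %>%        # one row per individual response
    count({{ question }}, name = "count") %>%
    rename(response = {{ question }})        # give the column a generic name
}

my_survey <- tibble::tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

result <- multiple_choice_tally2(my_survey, question.2)
result
```

For the sample data this should give the same tally as the enquo() version: Bus 3, Cycle 3, Walk 2.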
Answer 1 (score: 0)
My current approach to this problem is as follows:
multiple_choice_tally <- function(survey.data, question) {
  ## Require a sym for the question column used inside mutate()
  question_as_quo <- enquo(question)
  question_as_string <- quo_name(question_as_quo)
  target_question <- rlang::sym(question_as_string)
  ## Collate unique responses to the question
  unique_responses <- survey.data %>%
    select(!!target_question) %>%
    na.omit() %>%
    .[[1]] %>%
    strsplit(",") %>%
    unlist() %>%
    trimws() %>%
    unique()
  ## Extract responses to question
  question_tally <- survey.data %>%
    select(!!target_question) %>%
    na.omit()
  ## Iteratively create a logical column for each unique response
  ## (<<- updates question_tally in the enclosing function environment)
  invisible(lapply(unique_responses,
                   function(response) {
                     question_tally <<- question_tally %>%
                       mutate(!!response := str_detect(!!target_question, response))
                   }))
  ## Gather into tidy form; a bare `sum` replaces the deprecated funs(sum)
  question_tally %>%
    summarise_if(is.logical, sum) %>%
    gather(response, value = count)
}
It can then be used as follows:
library("tidyverse")
library("rlang")
library("stringr")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)
my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
  <chr>    <int>
1 Bus          3
2 Walk         2
3 Cycle        3
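Note that str_detect() treats each response as a regular expression and matches substrings, so a response such as "Bus" would also be counted for a hypothetical entry like "Minibus". As a rough cross-check, the collation step (strsplit, unlist, trimws) can be run on its own and tallied by exact token with base R. This is only a sketch, and the responses vector below is a stand-in for the question.2 column with an extra NA added for illustration.

```r
# Stand-in for my_survey$question.2, plus one missing value for illustration
responses <- c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk", NA)

# Drop missing values, split on commas, and trim surrounding whitespace
tokens <- trimws(unlist(strsplit(responses[!is.na(responses)], ",")))

# Distinct responses in order of first appearance
unique_responses <- unique(tokens)

# Exact-match tally; table() orders the names alphabetically
tally <- table(tokens)
tally
```

Because each token is compared as a whole string, overlapping response names do not inflate each other's counts.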