Programmatically tallying multiple-choice entries in a single column of an R data frame

Time: 2017-08-15 16:00:47

Tags: r dplyr tidyverse

Survey data often contains multiple-choice columns whose entries are comma-separated, for example:

library("tidyverse")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

We would like a function, multiple_choice_tally, that counts the unique responses to a question:

my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2     Walk     2
3    Cycle     3

What is the most efficient and flexible way to build multiple_choice_tally without hard-coding anything?

2 answers:

Answer 0: (score: 3)

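We can use separate_rows from the tidyr package to expand the contents of question.2 into one row per response. Since you are using the tidyverse, tidyr is already loaded by library("tidyverse"), so there is no need to load it again. my_survey2 is the final output.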

my_survey2 <- my_survey %>%
  separate_rows(question.2) %>%
  count(question.2) %>%
  rename(response = question.2, count = n)

my_survey2
# A tibble: 3 × 2
  response count
     <chr> <int>
1      Bus     3
2    Cycle     3
3     Walk     2
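One caveat with the call above: when no sep is given, separate_rows splits on any run of non-alphanumeric characters, so a response containing a space would be split into separate words. The sketch below passes an explicit separator instead. The travel data and its "Car share" response are hypothetical and only illustrate the point.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical survey data: one response ("Car share") contains a space.
travel <- tibble(
  id = 1:3,
  mode = c("Bus, Car share", "Car share", "Bus")
)

# Split only on a comma followed by optional whitespace,
# so "Car share" stays together as a single response.
tally <- travel %>%
  separate_rows(mode, sep = ",\\s*") %>%
  count(mode, name = "count") %>%
  rename(response = mode)

tally
```

With this separator, "Car share" is counted as one response rather than as "Car" and "share".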

Update: designing a function

We can turn the code above into a function as follows.

multiple_choice_tally <- function(survey.data, question){
  question <- enquo(question)
  survey.data2 <- survey.data %>%
    separate_rows(!!question) %>%
    count(!!question) %>%
    setNames(., c("response", "count"))
  return(survey.data2)
}

my_survey %>%
  multiple_choice_tally(question = question.2)
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2    Cycle     3
3     Walk     2
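As a possible variation, the same function can be sketched with the embrace operator `{{ }}`, which is available in rlang 0.4.0 and later and replaces the enquo / !! pair. The name multiple_choice_tally_embrace, the sep argument, and the NA filter are assumptions added for illustration; they are not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sketch only: the function name and the `sep` argument are hypothetical.
multiple_choice_tally_embrace <- function(survey.data, question, sep = ",\\s*") {
  survey.data %>%
    # Copy the chosen column into a column called `response`
    transmute(response = {{ question }}) %>%
    # Drop unanswered rows so that NA is not tallied as a response
    filter(!is.na(response)) %>%
    # One row per individual choice
    separate_rows(response, sep = sep) %>%
    count(response, name = "count")
}

# Same data as in the question
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

result <- my_survey %>%
  multiple_choice_tally_embrace(question = question.2)

result
```

Renaming the column up front keeps the tidy evaluation in a single place, so the remaining steps can refer to response by name.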

Answer 1: (score: 0)

My current approach to this problem is as follows:

multiple_choice_tally <- function(survey.data, question) {
  ## Require a sym for the RHS of !!response := if_else
  question_as_quo <- enquo(question)
  question_as_string <- quo_name(question_as_quo)
  target_question <- rlang::sym(question_as_string)

  ## Collate unique responses to the question
  unique_responses <- survey.data %>%
    select(!!target_question) %>%
    na.omit() %>%
    .[[1]] %>%
    strsplit(",") %>%
    unlist() %>%
    trimws() %>%
    unique()

  ## Extract responses to question
  question_tally <- survey.data %>%
    select(!!target_question) %>%
    na.omit()

  ## Iteratively create a column for each unique response
  invisible(lapply(unique_responses,
                   function(response) {
                     question_tally <<- question_tally %>%
                       mutate(!!response := if_else(str_detect(!!target_question, response), TRUE, FALSE))

                   }))

  ## Gather into tidy form
  question_tally %>%
    ## funs() is deprecated since dplyr 0.8.0, so pass the function directly
    summarise_if(is.logical, sum) %>%
    gather(response, value = count)

}

It can then be used as follows:

library("tidyverse")
library("rlang")
library("stringr")
my_survey <- tibble(
  id = 1:5,
  question.1 = 1:5,
  question.2 = c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")
)

my_survey %>%
  multiple_choice_tally(question = question.2)
### OUTPUT:
# A tibble: 3 x 2
  response count
     <chr> <int>
1      Bus     3
2     Walk     2
3    Cycle     3
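The split, trim, and count steps in this answer can also be sketched in base R without building one indicator column per response. This is a simplified illustration, not the answer's method: it works on a plain character vector (here the question.2 values from the question), and the variable names are hypothetical.

```r
# The question.2 values from the question, as a plain character vector
responses <- c("Bus", "Bus, Walk, Cycle", "Cycle", "Bus, Cycle", "Walk")

# Drop missing answers, split on commas, and trim surrounding whitespace
choices <- trimws(unlist(strsplit(responses[!is.na(responses)], ",")))

# Count each distinct choice and convert the table into a data frame
counts <- table(choices)
tally <- data.frame(
  response = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)

tally
```

Because this counts whole tokens rather than using str_detect, a response such as "Car" would not also be counted inside "Car share".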