Question

Problem description

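I am running a survey that includes a multiple-choice question, whose output is a single column with the selections separated by commas, and a grouping question (e.g., sex). I want to cross-tabulate these two variables.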

Sample data

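My data consists of 2 columns:

A multiple-choice question, which the survey software outputs as a single column with the selections separated by commas
A grouping variable, in this case male or female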

dat <- data.frame(Multiple = c("A,B,C","B","A,C"), Sex = c("M","F","F"))

Desired output

I want to cross-tabulate the individual multiple-choice options (without the commas) by sex:

Multiple Sex Count
A        M   1
B        M   1
C        M   1
A        F   1
B        F   1
C        F   1

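Attempted solution

Here is a partial solution that only counts the elements of the multiple-choice question. My problem is that I don't know how to include the grouping variable Sex in this function, because I use a regular expression to count the elements of the comma-separated vector: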

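MSCount <- function(X){

  # Function to count values in a comma separated vector

  # Find the possible options from the data alone, e.g. "A", "B" etc.
  Answers <- sort(
    unique(
      unlist(
        strsplit(
          as.character(X), ","))))

  # Drop blank and missing answers. Subsetting with -which() returns an empty
  # vector when there are no blanks, so use a logical index instead
  Answers <- Answers[!is.na(Answers) & Answers != ""]

  CountAnswers <- numeric(length(Answers)) # One count per answer, initialised to 0

  # Loop round and count the rows with a match for the answer text; the pattern
  # is anchored to commas so that e.g. "A" does not also match "AB"
  # (assumes the answer labels contain no regex metacharacters)
  for(i in seq_along(Answers)){
    CountAnswers[i] <- sum(grepl(paste0("(^|,)", Answers[i], "(,|$)"), X))
  }

  SummaryAnswers <- data.frame(Answers, CountAnswers,
                               PropAnswers = 100 * CountAnswers / length(X[!is.na(X)]))
  return(SummaryAnswers)

}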
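One way to bring Sex into the count, sketched in base R rather than with the regex loop above: split each response, repeat that respondent's Sex once per selected option with rep() and lengths() so the two vectors line up, then let table() do the cross-tabulation. This is a minimal sketch assuming the options are separated by a bare comma with no surrounding spaces; the names parts, long and tab are illustrative only.

```r
# Sample data from the question
dat <- data.frame(Multiple = c("A,B,C", "B", "A,C"), Sex = c("M", "F", "F"),
                  stringsAsFactors = FALSE)

# Split each response into its individual options (a list, one element per row)
parts <- strsplit(as.character(dat$Multiple), ",")

# Repeat each respondent's Sex once per selected option so the vectors line up
long <- data.frame(Multiple = unlist(parts),
                   Sex      = rep(dat$Sex, lengths(parts)),
                   stringsAsFactors = FALSE)

# Drop blank or missing options (e.g. from trailing commas or empty responses)
long <- long[!is.na(long$Multiple) & long$Multiple != "", ]

# Cross-tabulate, then convert to the long Multiple / Sex / Count layout
tab <- as.data.frame(table(Multiple = long$Multiple, Sex = long$Sex),
                     responseName = "Count")
tab
```

Because table() returns every Multiple/Sex combination, options that nobody of a given sex selected show up with Count 0; tab[tab$Count > 0, ] keeps only observed pairs, and table(long$Multiple, long$Sex) on its own gives the wide contingency table instead.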

Answer 1

We can use separate_rows from tidyr:

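library(tidyverse)
separate_rows(dat, Multiple, sep = ",") %>%  # one row per selected option; split only on commas
  mutate(Count = 1) %>%                      # each split row represents one selection
  arrange(Sex, Multiple) %>%
  select(Multiple, Sex, Count)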
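The Count = 1 column above marks each split row rather than tallying rows, so it matches the desired output only because every option/sex pair occurs once in the sample data. If several respondents of the same sex can pick the same option, a sketch that aggregates with dplyr's count() might look like this (assuming a dplyr version whose count() accepts a name argument; counts is an illustrative name):

```r
library(dplyr)
library(tidyr)

# Sample data from the question
dat <- data.frame(Multiple = c("A,B,C", "B", "A,C"), Sex = c("M", "F", "F"),
                  stringsAsFactors = FALSE)

counts <- dat %>%
  # One row per selected option; sep = "," keeps multi-word labels intact
  separate_rows(Multiple, sep = ",") %>%
  # Tally the rows for each Multiple/Sex pair into a column called Count
  count(Multiple, Sex, name = "Count") %>%
  # Put M before F to mirror the desired output
  arrange(desc(Sex), Multiple)
counts
```

Unlike table(), count() only returns pairs that actually occur, so unselected options do not appear with a zero count.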

How to cross-tabulate a multiple-choice question and a single-choice question in R
