How to do this in dplyr

Time: 2017-01-13 17:59:03

Tags: r dplyr tidyr tidyverse

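I'm working with survey data and trying to handle a column that holds multiple responses. The problem is that each cell can contain anywhere from 1 to 5 answers, separated by commas.

How do I turn this: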

df <- data.frame(
  splitThis = c("A,B,C","B,C","A,C","A","B","C")
)

> df
  splitThis
1     A,B,C
2       B,C
3       A,C
4         A
5         B
6         C

into this:

intoThis <- data.frame(
  A = c(1,0,1,1,0,0),
  B = c(1,1,0,0,1,0),
  C = c(1,1,1,0,0,1)
)

> intoThis
  A B C
1 1 1 1
2 0 1 1
3 1 0 1
4 1 0 0
5 0 1 0
6 0 0 1

Any data-wrangling help is appreciated!

1 answer:

Answer 0: (score: 4)

We can use mtabulate from qdapTools after splitting the column on commas:

library(qdapTools)
mtabulate(strsplit(as.character(df$splitThis), ","))
#  A B C
#1 1 1 1
#2 0 1 1
#3 1 0 1
#4 1 0 0
#5 0 1 0
#6 0 0 1
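
If adding qdapTools as a dependency is not desirable, roughly the same result can be sketched in base R. This is an illustration rather than part of the original answer: the names parts, levs, indicators and result are made up for the example, and it records presence (0/1) rather than counts, so a repeated answer within one cell would still show as 1.

```r
# Same example data as in the question
df <- data.frame(
  splitThis = c("A,B,C", "B,C", "A,C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Split each cell on commas: a list with one character vector per row
parts <- strsplit(as.character(df$splitThis), ",", fixed = TRUE)

# The distinct answers become the column names
levs <- sort(unique(unlist(parts)))

# For each row, mark which answers are present (1) or absent (0),
# then stack the per-row vectors into a matrix
indicators <- do.call(rbind, lapply(parts, function(p) as.integer(levs %in% p)))
colnames(indicators) <- levs

result <- as.data.frame(indicators)
result
```

The columns come out as integers; wrap them in as.numeric() if doubles are needed to match intoThis exactly.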

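The OP also mentioned dplyr/tidyr:

library(dplyr)
library(tidyr)
library(tibble)
# Keep the row number as an id, put each answer on its own row,
# then cross-tabulate id by answer
rownames_to_column(df, "rn") %>%
  separate_rows(splitThis) %>%
  table()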

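Or using only tidyverse packages:

# Count each (row id, answer) pair, spread the answers into columns,
# fill missing combinations with 0, then drop the id
rownames_to_column(df, "rn") %>%
  separate_rows(splitThis) %>%
  group_by(rn, splitThis) %>%
  tally() %>%
  spread(splitThis, n, fill = 0) %>%
  ungroup() %>%
  select(-rn)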
# A tibble: 6 × 3
#      A     B     C
#* <dbl> <dbl> <dbl>
#1     1     1     1
#2     0     1     1
#3     1     0     1
#4     1     0     0
#5     0     1     0
#6     0     0     1
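
In newer tidyr releases spread() is superseded by pivot_wider(), so the same reshaping might be written as below. This is a sketch that assumes tidyr 1.0.0 or later; the helper column names rn and value and the object result are chosen for illustration and are not from the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  splitThis = c("A,B,C", "B,C", "A,C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(rn = row_number()) %>%            # row id so each respondent stays distinct
  separate_rows(splitThis, sep = ",") %>%  # one row per (respondent, answer)
  distinct() %>%                           # guard against an answer repeated in one cell
  mutate(value = 1) %>%                    # indicator to spread into columns
  pivot_wider(
    names_from  = splitThis,
    values_from = value,
    values_fill = list(value = 0)          # absent answers become 0
  ) %>%
  select(-rn)

result
```

Columns appear in order of first occurrence in the data, which happens to be A, B, C here; add a select() with explicit names if a fixed order matters.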