Reshaping two factors from long to wide format in R

Time: 2015-12-22 16:23:31

Tags: r reshape

I have the following dataset:

sample.data <- data.frame(Step = c(1,2,3,4,1,2,1,2,3,1,1),
                          Case = c(1,1,1,1,2,2,3,3,3,4,5),
                          Decision = c("Referred","Referred","Referred","Approved","Referred","Declined","Referred","Referred","Declined","Approved","Declined"),
                          Reason = c("Docs","Slip","Docs","","Docs","","Slip","Docs","","",""))

sample.data

      Step Case Decision Reason
1     1    1    Referred Docs
2     2    1    Referred Slip
3     3    1    Referred Docs
4     4    1    Approved
5     1    2    Referred Docs
6     2    2    Declined
7     1    3    Referred Slip
8     2    3    Referred Docs
9     3    3    Declined
10    1    4    Approved
11    1    5    Declined

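Is it possible in R to convert this to a wide table format, with the Decision and Reason values as column headers and the number of occurrences as the value of each cell, for example: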

Case    Referred    Approved    Declined    Docs     Slip
 1          3           1           0        2        1
 2          1           0           1        1        0
 3          2           0           1        1        1
 4          0           1           0        0        0
 5          0           0           1        0        0

3 answers:

Answer 0: (score: 3)

draw()

Answer 1: (score: 2)

We can use gather/spread from tidyr:
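library(tidyr)
library(dplyr)

# Stack Decision and Reason (columns 3:4) into one column,
# count each value per Case, drop the empty strings, then spread to wide
gather(sample.data, Var, Val, 3:4) %>%
  group_by(Case, Val) %>%
  summarise(n = n()) %>%
  filter(Val != '') %>%
  spread(Val, n, fill = 0)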

#   Case Approved Declined  Docs Referred  Slip
#   (dbl)    (dbl)    (dbl) (dbl)    (dbl) (dbl)
#1     1        1        0     2        3     1
#2     2        0        1     1        1     0
#3     3        0        1     1        2     1
#4     4        1        0     0        0     0
#5     5        0        1     0        0     0
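
In current tidyr releases, gather/spread are superseded by pivot_longer/pivot_wider. A rough equivalent of the pipeline above might look like the sketch below. It assumes tidyr 1.1.0 or later (for a scalar `values_fill`) and rebuilds `sample.data` with character columns so it runs on its own; the name `wide` is only illustrative. Unlike spread, pivot_wider orders the new columns by first appearance, so the column order can differ from the output shown above.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with strings kept as character (assumption)
sample.data <- data.frame(
  Step = c(1,2,3,4,1,2,1,2,3,1,1),
  Case = c(1,1,1,1,2,2,3,3,3,4,5),
  Decision = c("Referred","Referred","Referred","Approved","Referred","Declined",
               "Referred","Referred","Declined","Approved","Declined"),
  Reason = c("Docs","Slip","Docs","","Docs","","Slip","Docs","","",""),
  stringsAsFactors = FALSE
)

wide <- sample.data %>%
  # stack Decision and Reason into a single column Val
  pivot_longer(c(Decision, Reason), names_to = "Var", values_to = "Val") %>%
  # drop the empty reasons
  filter(Val != "") %>%
  # one row per Case/Val pair with its count n
  count(Case, Val) %>%
  # one column per distinct Val, missing combinations filled with 0
  pivot_wider(names_from = Val, values_from = n, values_fill = 0)

wide
```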

Answer 2: (score: 2)

Using:

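library(reshape2)
# Melt Decision and Reason into one "value" column, keeping Step and Case as ids
tmp <- melt(sample.data, id.vars = c("Step", "Case"))
# Drop the empty reasons
tmp <- tmp[tmp$value != "", ]

# Count the occurrences of each value per Case
dcast(tmp, Case ~ value, value.var = "Case", fun.aggregate = length)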

you get:

   Case Approved Declined Docs Referred Slip
1:    1        1        0    2        3    1
2:    2        0        1    1        1    0
3:    3        0        1    1        2    1
4:    4        1        0    0        0    0
5:    5        0        1    0        0    0

With the data.table package, you can use the same melt and dcast functions as in reshape2, but you don't need a temporary data frame:

library(data.table)
dcast(melt(setDT(sample.data), id.var=c("Step", "Case"))[value!=""],
      Case ~ value, value.var="Case", length)

which gives the same result.
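
As a dependency-free cross-check of the counts, the same melt-then-count idea can be sketched in base R with table(). This is not taken from any of the answers; the names `long` and `counts` are illustrative, and `sample.data` is rebuilt with character columns so the snippet runs on its own.

```r
# Same data as in the question, with strings kept as character (assumption)
sample.data <- data.frame(
  Step = c(1,2,3,4,1,2,1,2,3,1,1),
  Case = c(1,1,1,1,2,2,3,3,3,4,5),
  Decision = c("Referred","Referred","Referred","Approved","Referred","Declined",
               "Referred","Referred","Declined","Approved","Declined"),
  Reason = c("Docs","Slip","Docs","","Docs","","Slip","Docs","","",""),
  stringsAsFactors = FALSE
)

# Stack Decision and Reason into one long column, repeating Case alongside
long <- data.frame(
  Case  = rep(sample.data$Case, times = 2),
  value = c(sample.data$Decision, sample.data$Reason),
  stringsAsFactors = FALSE
)

# Drop the empty reasons
long <- long[long$value != "", ]

# Cross-tabulate: one row per Case, one column per distinct value
counts <- table(Case = long$Case, value = long$value)
counts

# Optional: turn the contingency table into a plain data frame
wide <- cbind(Case = as.numeric(rownames(counts)), as.data.frame.matrix(counts))
```

table() fills combinations that never occur with 0, so no separate fill step is needed.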