I have the following dataset:
sample.data <- data.frame(Step = c(1,2,3,4,1,2,1,2,3,1,1),
Case = c(1,1,1,1,2,2,3,3,3,4,5),
Decision = c("Referred","Referred","Referred","Approved","Referred","Declined","Referred","Referred","Declined","Approved","Declined"),
Reason = c("Docs","Slip","Docs","","Docs","","Slip","Docs","","",""))
sample.data
Step Case Decision Reason
1 1 1 Referred Docs
2 2 1 Referred Slip
3 3 1 Referred Docs
4 4 1 Approved
5 1 2 Referred Docs
6 2 2 Declined
7 1 3 Referred Slip
8 2 3 Referred Docs
9 3 3 Declined
10 1 4 Approved
11 1 5 Declined
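Is it possible in R to convert this to a wide table, with the Decision and Reason values as column headers and the number of occurrences in each cell? For example: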
Case Referred Approved Declined Docs Slip
1 3 1 0 2 1
2 1 0 1 1 0
3 2 0 1 1 1
4 0 1 0 0 0
5 0 0 1 0 0
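For reference, here is a minimal base-R sketch that builds this table with no extra packages. It is not taken from any of the answers below. It stacks Decision and Reason into one column, drops the empty reasons, and cross-tabulates with table(). The intermediate names long, counts and wide are only illustrative.

```r
# Rebuild the sample data (stringsAsFactors = FALSE keeps the text columns
# as character vectors on R versions before 4.0)
sample.data <- data.frame(
  Step     = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case     = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred",
               "Declined", "Referred", "Referred", "Declined", "Approved",
               "Declined"),
  Reason   = c("Docs", "Slip", "Docs", "", "Docs", "", "Slip", "Docs", "", "", ""),
  stringsAsFactors = FALSE
)

# Stack Decision and Reason into one long column, repeating Case for each
long <- data.frame(
  Case = rep(sample.data$Case, times = 2),
  Val  = c(sample.data$Decision, sample.data$Reason),
  stringsAsFactors = FALSE
)
long <- long[long$Val != "", ]   # drop the empty reasons

# Cross-tabulate: one row per Case, one column per value, cells are counts
counts <- table(long$Case, long$Val)
wide <- data.frame(Case = as.numeric(rownames(counts)),
                   as.data.frame.matrix(counts))
print(wide)
```

table() sorts the columns alphabetically (Approved, Declined, Docs, Referred, Slip), so the column order differs from the example above, but the counts are the same.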
Answer 0 (score: 3)
Answer 1 (score: 2)
We can use gather/spread from tidyr:
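library(tidyr)
library(dplyr)
gather(sample.data, Var, Val, 3:4) %>%  # stack Decision and Reason (columns 3:4) into Var/Val
  group_by(Case, Val) %>%
  summarise(n = n()) %>%                # count each value per Case
  filter(Val != '') %>%                 # drop the empty reasons
  spread(Val, n, fill = 0)              # one column per value, 0 where absent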
# Case Approved Declined Docs Referred Slip
# (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
#1 1 1 0 2 3 1
#2 2 0 1 1 1 0
#3 3 0 1 1 2 1
#4 4 1 0 0 0 0
#5 5 0 1 0 0 0
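Since tidyr 1.0.0, gather() and spread() have been superseded by pivot_longer() and pivot_wider(). The following sketch runs the same pipeline with the newer verbs and uses dplyr's count() in place of group_by() plus summarise(). It assumes tidyr 1.1 or later, which is needed for a scalar values_fill = 0 and for names_sort.

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.1 (scalar values_fill, names_sort)

sample.data <- data.frame(
  Step     = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case     = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred",
               "Declined", "Referred", "Referred", "Declined", "Approved",
               "Declined"),
  Reason   = c("Docs", "Slip", "Docs", "", "Docs", "", "Slip", "Docs", "", "", ""),
  stringsAsFactors = FALSE
)

wide <- sample.data %>%
  # one row per (Step, Case, Var, Val); Var is "Decision" or "Reason"
  pivot_longer(c(Decision, Reason), names_to = "Var", values_to = "Val") %>%
  filter(Val != "") %>%          # drop the empty reasons
  count(Case, Val) %>%           # occurrences of each value per Case
  pivot_wider(names_from = Val, values_from = n,
              values_fill = 0,   # 0 where a Case never has that value
              names_sort = TRUE) # alphabetical columns, like spread()
print(wide)
```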
Answer 2 (score: 2)
Using melt and dcast from reshape2:
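library(reshape2)
# long format: one row per (Step, Case, variable, value)
tmp <- melt(sample.data, id.vars=c("Step", "Case"))
tmp <- tmp[tmp$value!="",]   # drop the empty reasons
# count the rows for each Case/value combination
dcast(tmp, Case ~ value, value.var="Case", length)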
You get:
Case Approved Declined Docs Referred Slip
1: 1 1 0 2 3 1
2: 2 0 1 1 1 0
3: 3 0 1 1 2 1
4: 4 1 0 0 0 0
5: 5 0 1 0 0 0
With the data.table package you can use the same melt and dcast functions as in reshape2, but you don't need a temporary data frame:
library(data.table)
# note: setDT() converts sample.data to a data.table in place (by reference)
dcast(melt(setDT(sample.data), id.vars=c("Step", "Case"))[value!=""],
      Case ~ value, value.var="Case", length)
This gives the same result.
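Because setDT() modifies sample.data by reference, the original object is itself a data.table afterwards. If you want to keep it as a plain data.frame, one option is the following sketch, which works on a copy made with as.data.table(). Here long and wide are illustrative names. length only counts rows, so any column outside the formula can serve as value.var, and Step is used for that.

```r
library(data.table)

sample.data <- data.frame(
  Step     = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case     = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred",
               "Declined", "Referred", "Referred", "Declined", "Approved",
               "Declined"),
  Reason   = c("Docs", "Slip", "Docs", "", "Docs", "", "Slip", "Docs", "", "", ""),
  stringsAsFactors = FALSE
)

# as.data.table() copies, so sample.data stays a data.frame
long <- melt(as.data.table(sample.data), id.vars = c("Step", "Case"),
             measure.vars = c("Decision", "Reason"))
long <- long[value != ""]   # drop the empty reasons

# length() counts rows per Case/value; missing cells become length of an
# empty vector, i.e. 0
wide <- dcast(long, Case ~ value, value.var = "Step", fun.aggregate = length)
print(wide)
```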