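Long-to-wide reshaping with a slight twist (please do not flag this as a duplicate)
I have the data below. I want to spread the subject column into one column per value of the term column, keeping every subject for each ID. The result should look like Df_result: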
DF <- data.frame(ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
                 term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
                 subject = c("math1", "phys1", "math2", "chem1", "cmp1", "math1", "phys1", "math2", "math1", "phys1"),
                 graduation = c("grad", "grad", "grad", "grad", "grad", "drop", "drop", "drop", "enrolled", "enrolled"))
DF
ID term subject graduation
10    1   math1       grad
10    1   phys1       grad
10    2   math2       grad
10    2   chem1       grad
10    3    cmp1       grad
11    1   math1       drop
11    1   phys1       drop
11    2   math2       drop
12    1   math1   enrolled
12    1   phys1   enrolled
Df_result:
ID term1 term2 term3 graduation
10 math1 math2  cmp1       grad
10 phys1 chem1    NA       grad
11 math1 math2    NA       drop
11 phys1    NA    NA       drop
12 math1    NA    NA   enrolled
12 phys1    NA    NA   enrolled
Using reshape gets close to what I want, but it keeps only the first match.
reshape(DF, idvar = c("ID","graduation"), timevar = "term", direction = "wide")
It produces:
  ID graduation subject.1 subject.2 subject.3
1 10       grad     math1     math2      cmp1
6 11       drop     math1     math2      <NA>
9 12   enrolled     math1      <NA>      <NA>
The problem is that, for each value of timevar, only the first matching row is kept.
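A quick way to see why only the first match survives is to count the rows per (ID, term) pair. The sketch below uses base R only; the names counts, n_subjects and dup_pairs are illustrative and not part of the original post. Any pair with more than one row has several candidate values competing for a single wide cell, so reshape has to pick one of them.

```r
# Rebuild the example data from the question.
DF <- data.frame(
  ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
  term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
  subject = c("math1", "phys1", "math2", "chem1", "cmp1",
              "math1", "phys1", "math2", "math1", "phys1"),
  graduation = c("grad", "grad", "grad", "grad", "grad",
                 "drop", "drop", "drop", "enrolled", "enrolled"),
  stringsAsFactors = FALSE
)

# Count the subjects recorded for each (ID, term) pair.
counts <- aggregate(subject ~ ID + term, data = DF, FUN = length)
names(counts)[names(counts) == "subject"] <- "n_subjects"

# Pairs with more than one subject cannot fit into a single wide cell.
dup_pairs <- counts[counts$n_subjects > 1, ]
print(dup_pairs)
```

If dup_pairs is non-empty, the combination of idvar and timevar does not uniquely identify a row, which is what the reshape call above runs into.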
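Using dcast and melt, the cells are filled only by the length function, that is, with counts rather than the subject values themselves.
How can I solve this in R?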
Answer 0: (score: 2)
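This is the same as an ordinary long-to-wide reshape, except that you need a new variable that uniquely identifies the rows in the new format. Below I call this variable classnum and use data.table syntax to create it: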
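# add helper variable "classnum": a running index within each (ID, term) group
library(data.table)
setDT(DF)
DF[, classnum := seq_len(.N), by = .(ID, term)]
# reshape long-to-wide: one column per term, filled with the subject values
# (spread() is superseded in newer tidyr; the equivalent call is
#  tidyr::pivot_wider(DF, names_from = term, values_from = subject))
tidyr::spread(DF, term, subject)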
Result:
   ID graduation classnum     1     2    3
1: 10       grad        1 math1 math2 cmp1
2: 10       grad        2 phys1 chem1 <NA>
3: 11       drop        1 math1 math2 <NA>
4: 11       drop        2 phys1 <NA> <NA>
5: 12   enrolled        1 math1  <NA> <NA>
6: 12   enrolled        2 phys1  <NA> <NA>
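The same helper-variable idea should also work without extra packages. The following is a minimal sketch in base R, assuming the example data from the question: ave builds the per-(ID, term) index and the reshape call from the question is reused with classnum added to idvar. The object name wide is illustrative only.

```r
# Rebuild the example data from the question.
DF <- data.frame(
  ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
  term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
  subject = c("math1", "phys1", "math2", "chem1", "cmp1",
              "math1", "phys1", "math2", "math1", "phys1"),
  graduation = c("grad", "grad", "grad", "grad", "grad",
                 "drop", "drop", "drop", "enrolled", "enrolled"),
  stringsAsFactors = FALSE
)

# Helper variable: running index of each subject within its (ID, term) group.
DF$classnum <- ave(seq_len(nrow(DF)), DF$ID, DF$term, FUN = seq_along)

# With classnum in idvar, every (idvar, timevar) combination is unique,
# so reshape() no longer has to drop rows.
wide <- reshape(DF,
                idvar = c("ID", "graduation", "classnum"),
                timevar = "term",
                direction = "wide")

# Sort for readability and reset the row names inherited from DF.
wide <- wide[order(wide$ID, wide$classnum), ]
rownames(wide) <- NULL
print(wide)
```

The wide columns come out named subject.1, subject.2 and subject.3 rather than 1, 2 and 3; they can be renamed afterwards if the term1/term2/term3 layout from the question is needed.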