I have a dataset like this:

```r
CASE_ID = c("C1","C1", "C2","C2", "C2", "C3", "C4")
PERSON_ID = c(1,0,7,8,1,20,7)
PERSON_DIVISION = c("Zone 1", "NA", "Zone 1", "Zone 3", "Zone 1", "Zone 5", "Zone 1")
df <- data.frame(CASE_ID, PERSON_ID, PERSON_DIVISION)
df
```
which gives:

```
  CASE_ID PERSON_ID PERSON_DIVISION
1      C1         1          Zone 1
2      C1         0              NA
3      C2         7          Zone 1
4      C2         8          Zone 3
5      C2         1          Zone 1
6      C3        20          Zone 5
7      C4         7          Zone 1
```
I want to reshape it into:

```
CASE_ID P1_ID P2_ID P3_ID P1_Division P2_Division P3_Division
1           1     0    NA      Zone 1          NA          NA
2           7     8     1      Zone 1      Zone 3      Zone 1
3          20    NA    NA      Zone 5          NA          NA
4           7    NA    NA      Zone 1          NA          NA
```
My approach so far has been to melt the data and then use `dcast()`:

```r
library(reshape2)
e <- melt(df)
dcast(e, CASE_ID ~ PERSON_DIVISION + variable)
```
But instead of the desired output, I get:

```
  CASE_ID NA_PERSON_ID Zone 1_PERSON_ID Zone 3_PERSON_ID Zone 5_PERSON_ID
1      C1            1                1                0                0
2      C2            0                2                1                0
3      C3            0                0                0                1
4      C4            0                1                0                0
```
Answer:
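There are two issues here:

1. Only `data.table`'s `dcast()` supports multiple value variables (`value.var`). The `reshape2` version, which your `melt()`/`dcast()` calls appear to come from, accepts just one.
2. `dcast()` aggregates whenever more than one row falls into the same cell, using `length()` by default. That explains the output you got: it contains row counts rather than IDs (for example, case C2 has two people in Zone 1, hence the 2).

Please try: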
```r
library(data.table) # version 1.10.4 used here
# coerce to data.table, add unique row numbers for each group
setDT(df)[, rn := rowid(CASE_ID)]
# dcast with multiple value vars
dcast(df, CASE_ID ~ rn, value.var = list("PERSON_ID", "PERSON_DIVISION"))
#    CASE_ID PERSON_ID_1 PERSON_ID_2 PERSON_ID_3 PERSON_DIVISION_1 PERSON_DIVISION_2 PERSON_DIVISION_3
# 1:      C1           1           0          NA            Zone 1                NA                NA
# 2:      C2           7           8           1            Zone 1            Zone 3            Zone 1
# 3:      C3          20          NA          NA            Zone 5                NA                NA
# 4:      C4           7          NA          NA            Zone 1                NA                NA
```
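To make the two issues above concrete, here is a small sketch (using the question's data, rebuilt as a `data.table` named `dt` purely for illustration). Grouping by `CASE_ID` and `PERSON_DIVISION` shows that some cells hold more than one row, which is what `length()` ends up counting. The `rn` helper column from `rowid()` numbers the rows within each `CASE_ID`, so every `CASE_ID` x `rn` cell holds exactly one row and nothing needs to be aggregated.

```r
library(data.table)

dt <- data.table(
  CASE_ID = c("C1", "C1", "C2", "C2", "C2", "C3", "C4"),
  PERSON_ID = c(1, 0, 7, 8, 1, 20, 7),
  PERSON_DIVISION = c("Zone 1", "NA", "Zone 1", "Zone 3", "Zone 1", "Zone 5", "Zone 1")
)

# Without a helper column, several rows can land in the same
# CASE_ID x PERSON_DIVISION cell (C2 / Zone 1 has two), and length() counts them
counts <- dt[, .N, by = .(CASE_ID, PERSON_DIVISION)]
print(counts)

# rowid() gives the running index of each row within its CASE_ID,
# so CASE_ID + rn uniquely identifies every row
dt[, rn := rowid(CASE_ID)]
print(dt$rn)
# [1] 1 2 1 2 3 1 1
```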
This can be written more concisely as a one-liner:

```r
dcast(setDT(df), CASE_ID ~ rowid(CASE_ID), value.var = list("PERSON_ID", "PERSON_DIVISION"))
```
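The cast columns are named `PERSON_ID_1`, ..., `PERSON_DIVISION_3`, while the question asked for `P1_ID`, ..., `P3_Division`. A minimal sketch of one way to get those names, assuming the naming scheme is simply `P<n>_ID` / `P<n>_Division`, is to rename the columns afterwards with `setnames()`. The variables `wide`, `old` and `new` are illustrative names, not part of the answer.

```r
library(data.table)

# Example data from the question; stringsAsFactors = FALSE keeps the
# division labels as plain character columns on older R versions too
df <- data.frame(
  CASE_ID = c("C1", "C1", "C2", "C2", "C2", "C3", "C4"),
  PERSON_ID = c(1, 0, 7, 8, 1, 20, 7),
  PERSON_DIVISION = c("Zone 1", "NA", "Zone 1", "Zone 3", "Zone 1", "Zone 5", "Zone 1"),
  stringsAsFactors = FALSE
)

# Same reshape as in the answer: one row per case, one column per person index
setDT(df)[, rn := rowid(CASE_ID)]
wide <- dcast(df, CASE_ID ~ rn, value.var = c("PERSON_ID", "PERSON_DIVISION"))

# Rename PERSON_ID_<n> -> P<n>_ID and PERSON_DIVISION_<n> -> P<n>_Division
old <- setdiff(names(wide), "CASE_ID")
new <- sub("^PERSON_ID_([0-9]+)$", "P\\1_ID", old)
new <- sub("^PERSON_DIVISION_([0-9]+)$", "P\\1_Division", new)
setnames(wide, old, new)
print(wide)
```

Note that the question encodes the missing division of the second person in C1 as the string `"NA"`, so it stays a string after reshaping; writing `NA` without quotes in the input would give a real missing value instead.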