Generate dummy data:
library(data.table)
MainID=c('A1','A1','B2','C1','C1','C1','D2','D2')
HouseholdID=c('Ab1','Ab1','cb2','Ca2','cb2','cb3','Da1','db2')
relation=c('Spouse','Spouse','Child','Spouse','Child','Mother','Brother','Spouse')
df=data.table(MainID,HouseholdID,relation)
head(df)
MainID HouseholdID relation
1: A1 Ab1 Spouse
2: A1 Ab1 Spouse
3: B2 cb2 Child
4: C1 Ca2 Spouse
5: C1 cb2 Child
6: C1 cb3 Mother
I need to reshape this data so that it looks like this:
Desired result
MainID Household1 Relation1 Household2 Relation2 Household3 Relation3
A1 Ab1 Spouse NA NA NA NA
B2 cb2 Child NA NA NA NA
C1 Ca2 Spouse cb2 Child cb3 Mother
D2 Da1 Brother db2 Spouse NA NA
What is the best way to do this with dplyr, reshape, the tidyverse,
or any other method/package?
Answer (score: 0)
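Since you are already using "data.table", you can take the unique rows, add a row-index variable that counts 1, 2, 3, ... within each MainID, and then dcast
to wide format: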
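library(data.table)
# unique(df) drops the duplicated A1 / Ab1 / Spouse row;
# rowid(MainID) numbers the remaining rows 1, 2, 3, ... within each MainID
dcast(unique(df)[, ind := rowid(MainID)],
      MainID ~ ind, value.var = c("HouseholdID", "relation"))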
# MainID HouseholdID_1 HouseholdID_2 HouseholdID_3 relation_1 relation_2 relation_3
# 1: A1 Ab1 NA NA Spouse NA NA
# 2: B2 cb2 NA NA Child NA NA
# 3: C1 Ca2 cb2 cb3 Spouse Child Mother
# 4: D2 Da1 db2 NA Brother Spouse NA
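Since the question also asks about the tidyverse, here is a minimal sketch of the same idea (deduplicate, number rows per MainID, widen) using dplyr and tidyr::pivot_wider(). It assumes tidyr >= 1.2.0, which added the `names_vary` argument; `names_vary = "slowest"` interleaves the columns (Household1, Relation1, Household2, ...) to match the desired layout, and the `rename()` step is only there to reproduce the column names from the desired result.

```r
library(dplyr)
library(tidyr)

# Same dummy data as in the question, as a plain data.frame
MainID <- c('A1','A1','B2','C1','C1','C1','D2','D2')
HouseholdID <- c('Ab1','Ab1','cb2','Ca2','cb2','cb3','Da1','db2')
relation <- c('Spouse','Spouse','Child','Spouse','Child','Mother','Brother','Spouse')
df <- data.frame(MainID, HouseholdID, relation)

wide <- df %>%
  distinct() %>%                      # drop the duplicated A1 / Ab1 / Spouse row
  group_by(MainID) %>%
  mutate(ind = row_number()) %>%      # 1, 2, 3, ... within each MainID
  ungroup() %>%
  rename(Household = HouseholdID, Relation = relation) %>%
  pivot_wider(names_from = ind,
              values_from = c(Household, Relation),
              names_sep = "",          # "Household" + "1" -> "Household1"
              names_vary = "slowest")  # Household1, Relation1, Household2, ...
wide
```

Missing combinations (for example A1 has only one household) are filled with NA, as in the dcast result.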