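I am trying to map data back to its encrypted IDs (I no longer have access to the mapping key that links encrypted IDs to participant IDs).
In my case, df1 has 95K rows and df2 has 94K rows, and both have the same columns (n = 360). I want to merge df1 and df2 on all columns (the two have different numbers of observations).
Reproducible example: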
# df1: participant IDs (PID) with their attributes
df1 = data.frame(PID = c(1:10),
                 Sex = c(rep("male", 4), rep("female", 6)),
                 Age = c(rep("35", 2), "27", rep("28", 2), rep("50", 2), rep("55", 1), "66", "54"))
# df2: encrypted IDs (EID) with the same attributes
df2 = data.frame(EID = c("PI_1234", "PI_1235", "PI_1236", "PI_1237", "PI_1238"),
                 Sex = c("female", "female", "male", "male", "female"),
                 Age = c("28", "50", "28", "27", "66"))
# df3: the desired result, with PID mapped to EID
df3 = data.frame(PID = c(5, 7, 4, 3, 9),
                 EID = c("PI_1234", "PI_1235", "PI_1236", "PI_1237", "PI_1238"),
                 Sex = c("female", "female", "male", "male", "female"),
                 Age = c("28", "50", "28", "27", "66"))
I want to create df3, keeping all matching observations (mapping PID to the encrypted ID, EID). Is this possible?
Answer 0: (score: 2)
The merge function looks useful for this:
df3 <- merge(df1, df2)
Alternatively, the by argument can be used to specify which columns to merge on:
df3 <- merge(df1, df2, by = c("Sex", "Age"))
If you want to reorder the columns:
df3 <- df3[c(3,4,1,2)]
Then sort by PID (thanks to this question):
df3[with(df3, order(PID)),]
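For the full-size data, where both frames share all 360 columns, the key columns can be computed instead of typed out. A minimal sketch, assuming the toy df1 and df2 from the question (rebuilt here so the snippet runs on its own); key_cols and merged are illustrative names, not part of the original code:

```r
# Toy data from the question, rebuilt so the snippet is self-contained
df1 <- data.frame(PID = 1:10,
                  Sex = c(rep("male", 4), rep("female", 6)),
                  Age = c("35", "35", "27", "28", "28", "50", "50", "55", "66", "54"))
df2 <- data.frame(EID = c("PI_1234", "PI_1235", "PI_1236", "PI_1237", "PI_1238"),
                  Sex = c("female", "female", "male", "male", "female"),
                  Age = c("28", "50", "28", "27", "66"))

# Columns present in both frames; merge() uses the same set by default
key_cols <- intersect(names(df1), names(df2))

# Inner join on the shared columns
merged <- merge(df1, df2, by = key_cols)

# Put the two ID columns first (by name rather than position), then sort by PID
merged <- merged[c("PID", "EID", key_cols)]
merged <- merged[order(merged$PID), ]
```

Selecting columns by name avoids having to work out positions such as c(3,4,1,2) again when the number of key columns changes.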
Answer 1: (score: 2)
Here is a data.table solution, which may be faster than merge(...) on large datasets.
library(data.table)
DT1 <- data.table(df1,key=colnames(df1)[-1])
DT2 <- data.table(df2,key=colnames(df2)[-1])
DT1[DT2,nomatch=0]
# PID Sex Age EID
# 1: 5 female 28 PI_1234
# 2: 6 female 50 PI_1235
# 3: 7 female 50 PI_1235
# 4: 9 female 66 PI_1238
# 5: 3 male 27 PI_1237
# 6: 4 male 28 PI_1236
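Note that this result differs from df3, because df1 has two rows with female - 50 (PID 6 and 7). Both appear in the result (as they should), but only one of them appears in df3.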
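To see in advance which rows cannot be tied to a single PID, one option is to flag every row of df1 whose key combination occurs more than once. A rough sketch in base R, assuming the toy df1 from the question; key_cols, is_dup and ambiguous are illustrative names:

```r
# Toy df1 from the question, rebuilt so the snippet is self-contained
df1 <- data.frame(PID = 1:10,
                  Sex = c(rep("male", 4), rep("female", 6)),
                  Age = c("35", "35", "27", "28", "28", "50", "50", "55", "66", "54"))

# Columns used as the join key (assumed: every column except the ID)
key_cols <- c("Sex", "Age")

# TRUE for every row whose key combination appears more than once in df1;
# the second call with fromLast = TRUE also marks the first occurrence
is_dup <- duplicated(df1[key_cols]) | duplicated(df1[key_cols], fromLast = TRUE)

# Rows that would match the same EID as at least one other PID
ambiguous <- df1[is_dup, ]
```

Any EID whose Sex and Age values fall in ambiguous cannot be assigned to one participant from these columns alone. With all 360 columns as the key, such ties are presumably much rarer, but it is worth checking before trusting the mapping.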