Decrypting by merging two data frames

Time: 2014-11-24 17:08:14

Tags: r encryption merge dataframe

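I am trying to map data back to its encrypted IDs (I no longer have access to the mapping key that links the encrypted IDs to the participant IDs).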

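In my case, df1 has 95K rows and df2 has 94K rows, and both have the same columns (n = 360). I want to merge df1 and df2, which have different numbers of observations, on all of those columns.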

Reproducible example:

df1 = data.frame(PID=c(1:10),
         Sex = c(rep("male", 4), rep("female", 6)),
         Age=c(rep("35",2), "27" ,rep("28", 2), rep("50",2), rep("55", 1), "66", "54")) 


df2 = data.frame(EID=c("PI_1234", "PI_1235", "PI_1236", "PI_1237", "PI_1238"),
    Sex=c("female", "female", "male", "male", "female"),
    Age=c("28", "50", "28", "27", "66") )


df3 =data.frame(PID=c(5, 7, 4, 3, 9), 
    EID=c("PI_1234", "PI_1235", "PI_1236", "PI_1237", "PI_1238"),
    Sex=c("female", "female", "male", "male", "female"),
    Age=c("28", "50", "28", "27", "66") )

I want to create df3, keeping all matching observations (mapping each PID to its encrypted ID, EID). Is this possible?

2 answers:

Answer 0: (score: 2)

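The merge function should work for this. By default it joins on every column name the two data frames share and keeps only the rows that match in both: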

df3 <- merge(df1, df2)

The by argument can be used to specify the columns to merge on explicitly:

df3 <- merge(df1, df2, by = c("Sex", "Age"))
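With 360 shared columns, typing the by vector by hand is impractical. Here is a minimal sketch that computes it instead, assuming the only column names df1 and df2 have in common are the ones you want to match on. It uses the toy data from the question; stringsAsFactors = FALSE and the name key_cols are my additions, not part of the original example.

```r
# Toy data from the question (stringsAsFactors = FALSE is an assumption,
# added so the key columns are plain character vectors)
df1 <- data.frame(PID = 1:10,
                  Sex = c(rep("male", 4), rep("female", 6)),
                  Age = c(rep("35", 2), "27", rep("28", 2), rep("50", 2),
                          "55", "66", "54"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(EID = c("PI_1234", "PI_1235", "PI_1236", "PI_1237", "PI_1238"),
                  Sex = c("female", "female", "male", "male", "female"),
                  Age = c("28", "50", "28", "27", "66"),
                  stringsAsFactors = FALSE)

# Every column name present in both data frames becomes a join key
key_cols <- intersect(names(df1), names(df2))

# Inner join on those keys: only rows matching in both frames are kept
merged <- merge(df1, df2, by = key_cols)
```

This is the same set of columns merge picks when by is omitted; spelling it out just makes the join keys visible and easy to inspect before running the merge on the full data.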

If you want to reorder the columns:

df3 <- df3[c(3,4,1,2)]

Then sort by PID (thanks to this question):

df3[with(df3, order(PID)),]

Answer 1: (score: 2)

Here is a data.table solution, which may be faster than merge(...) on large datasets.

library(data.table)
DT1 <- data.table(df1,key=colnames(df1)[-1])
DT2 <- data.table(df2,key=colnames(df2)[-1])
DT1[DT2,nomatch=0]
#    PID    Sex Age     EID
# 1:   5 female  28 PI_1234
# 2:   6 female  50 PI_1235
# 3:   7 female  50 PI_1235
# 4:   9 female  66 PI_1238
# 5:   3   male  27 PI_1237
# 6:   4   male  28 PI_1236

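Note that this result differs from your df3 because df1 has two rows with female - 50 (PID 6 and PID 7). Both appear in the result (as they should), but only PID 7 appears in your df3.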
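Since duplicated key combinations mean an EID cannot be tied to a single PID, it may be worth separating the ambiguous matches from the unique ones before trusting the mapping. A rough sketch in base R, again on the toy data from the question; the names n_matches, ambiguous_eids, ambiguous and unique_map are hypothetical, and stringsAsFactors = FALSE is my addition.

```r
# Toy data from the question (stringsAsFactors = FALSE is an assumption)
df1 <- data.frame(PID = 1:10,
                  Sex = c(rep("male", 4), rep("female", 6)),
                  Age = c(rep("35", 2), "27", rep("28", 2), rep("50", 2),
                          "55", "66", "54"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(EID = c("PI_1234", "PI_1235", "PI_1236", "PI_1237", "PI_1238"),
                  Sex = c("female", "female", "male", "male", "female"),
                  Age = c("28", "50", "28", "27", "66"),
                  stringsAsFactors = FALSE)

merged <- merge(df1, df2, by = c("Sex", "Age"))

# Count how many df1 rows each EID matched
n_matches <- table(merged$EID)

# EIDs that matched more than one PID cannot be resolved from these keys alone
ambiguous_eids <- names(n_matches)[n_matches > 1]
ambiguous <- merged[merged$EID %in% ambiguous_eids, ]

# EIDs with exactly one matching PID give a usable PID-to-EID map
unique_map <- merged[!merged$EID %in% ambiguous_eids, c("PID", "EID")]
```

On the toy data, ambiguous holds the two female - 50 rows for PI_1235 and unique_map holds the other four pairs. With all 360 columns as keys such collisions should be much rarer, but the same check applies. It only looks at one direction; if df2 could also contain duplicated rows, a similar count on merged$PID would catch PIDs that matched several EIDs.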