R - Matching based on the minimum difference between columns of data.frames

Time: 2017-11-14 21:14:11

Tags: r dataframe match

I have the following two data frames:

df1 <- data.frame(Subject=c("S1","S1","S2","S2","S2","S3","S3"), 
       SampleTime=c(1,2,1,2.1,3,0.9,2), Value=c(3,4,3,2,2,4,5))

df2 <- data.frame(Subject=c("S1","S1","S1","S2","S2","S2","S2","S3","S3"),
       SampleTime=c(0.99, 2.01,2.99, 0,1.01,2,3,1.2,2.02), Conc=c(4.7,5.2,8,5,2,1,3,4,6))

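My goal is to add the column df2$Conc to df1, choosing, within each subject, the df2 row whose SampleTime differs least from the SampleTime in df1. I would also like to add a column that shows the difference between the SampleTimes.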
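The core step is: for one df1 time, find the closest df2 time within the same subject, then record that time, its Conc and the difference. A minimal sketch on the question's S2 rows only; `nearest_index` is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: position of the candidate time closest to x.
# which.min() returns the first minimum, so on a tie the earliest candidate wins.
nearest_index <- function(x, candidates) {
  which.min(abs(candidates - x))
}

# S2's sample times from df1 and df2 in the question, plus df2's Conc for S2
times_df1 <- c(1, 2.1, 3)
times_df2 <- c(0, 1.01, 2, 3)
conc_df2  <- c(5, 2, 1, 3)

idx <- vapply(times_df1, nearest_index, integer(1), candidates = times_df2)
matched_time <- times_df2[idx]               # SampleTime taken from df2
time_diff    <- abs(matched_time - times_df1) # the "difference" column
matched_conc <- conc_df2[idx]                 # Conc taken from df2
```

Here `idx` is `c(2, 3, 4)`, so the matched times are 1.01, 2 and 3, which agrees with the S2 rows of the desired output.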

Desired output:

output <- data.frame(Subject=c("S1","S1","S2","S2","S2","S3","S3"), 
                     SampleTime=c(1,2,1,2.1,3,0.9,2), Value=c(3,4,3,2,2,4,5), 
                     SampleTime_df2=c(0.99,2.01,1.01,2,3,1.20,2.02), Conc=c(4.7,5.2,2,1,3,4,6))

So far I have managed to do this one subject at a time, e.g. for subject S2:

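# For each S2 time in df1: index of the closest S2 time in df2 (which.min keeps the first on ties)
Indices <- sapply(df1$SampleTime[df1$Subject=="S2"], FUN=function(x,y) which.min(abs(y - x)), y=df2$SampleTime[df2$Subject=="S2"])
# Copy the matched df2 time and concentration into the S2 rows of df1
df1$SampleTime_df2[df1$Subject=="S2"] <- df2$SampleTime[df2$Subject=="S2"][Indices]
df1$Conc[df1$Subject=="S2"] <- df2$Conc[df2$Subject=="S2"][Indices]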

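The code doesn't look very nice, and I'd like to do this for all subjects at once. In my real data there are no ties (i.e. no case where two sample times in df2 are equally close to one sample time in df1), but if there were, I would want to keep the first one.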
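A base-R sketch that handles all subjects in one pass, assuming every subject in df1 also appears in df2 (otherwise `vapply` stops with an error). `match_row` and `TimeDiff` are illustrative names, not from the question. Because `which.min()` returns the first minimum, a tie keeps the earliest matching df2 row, as requested.

```r
df1 <- data.frame(Subject = c("S1","S1","S2","S2","S2","S3","S3"),
                  SampleTime = c(1, 2, 1, 2.1, 3, 0.9, 2),
                  Value = c(3, 4, 3, 2, 2, 4, 5), stringsAsFactors = FALSE)
df2 <- data.frame(Subject = c("S1","S1","S1","S2","S2","S2","S2","S3","S3"),
                  SampleTime = c(0.99, 2.01, 2.99, 0, 1.01, 2, 3, 1.2, 2.02),
                  Conc = c(4.7, 5.2, 8, 5, 2, 1, 3, 4, 6), stringsAsFactors = FALSE)

# For each df1 row, restrict df2 to the same subject and pick the row with the
# smallest absolute time difference (first one on ties).
match_row <- vapply(seq_len(nrow(df1)), function(i) {
  cand <- which(df2$Subject == df1$Subject[i])
  cand[which.min(abs(df2$SampleTime[cand] - df1$SampleTime[i]))]
}, integer(1))

output <- df1
output$SampleTime_df2 <- df2$SampleTime[match_row]
output$TimeDiff <- abs(output$SampleTime_df2 - output$SampleTime)
output$Conc <- df2$Conc[match_row]
```

The `Conc` and `SampleTime_df2` columns come out as in the desired output; `TimeDiff` is the extra difference column.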

I hope my question is clear. Thanks for your help!

2 answers:

Answer 0 (score: 2)

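In general, would it work to merge the data frames (an inner join) on Subject plus a new column filled with round(SampleTime)? This approach works on the toy data you provided, i.e.: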

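# Round both sets of sample times to the nearest integer...
df1$SampleTimeInt <- round(df1$SampleTime)
df2$SampleTimeInt <- round(df2$SampleTime)
# ...then inner-join on Subject and the rounded time (SampleTime becomes SampleTime.x / SampleTime.y)
combined <- merge(df1, df2, by=c("Subject", "SampleTimeInt"))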
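A caveat on the rounding approach: it only finds the closest time when that time rounds to the same integer as the df1 time. Otherwise it can pick the wrong row, drop a df1 row when no df2 time shares its rounded value, or return several rows when more than one does. The sketch below uses hypothetical data frames `a` and `b` (made-up values, not from the question) to show a wrong pick.

```r
# Hypothetical toy data where "same rounded time" and "closest time" disagree.
a <- data.frame(Subject = "S1", SampleTime = 1.4, stringsAsFactors = FALSE)
b <- data.frame(Subject = c("S1", "S1"), SampleTime = c(0.6, 1.6),
                Conc = c(10, 20), stringsAsFactors = FALSE)

a$SampleTimeInt <- round(a$SampleTime)   # 1.4 rounds to 1
b$SampleTimeInt <- round(b$SampleTime)   # 0.6 -> 1, 1.6 -> 2

# Rounding join: only 0.6 shares the rounded value 1, so Conc = 10 is picked.
by_round <- merge(a, b, by = c("Subject", "SampleTimeInt"))

# Nearest-time match: |1.4 - 1.6| = 0.2 < |1.4 - 0.6| = 0.8, so Conc = 20.
nearest <- b$Conc[which.min(abs(b$SampleTime - a$SampleTime))]
```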

Answer 1 (score: 2)

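I think this is what you're looking for? Do an inner join, then take the absolute difference, sort by it and keep the first row of each group (slice). This all uses dplyr: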

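library(dplyr)

df3 <- df1 %>%
  rename(ST1 = SampleTime) %>%              # keep df1's time under a separate name
  inner_join(df2, by = "Subject") %>%       # pair each df1 row with every df2 row of that subject
  group_by(Subject, ST1) %>%                # one group per df1 sample
  mutate(diff = abs(ST1 - SampleTime)) %>%  # distance to each candidate df2 time
  arrange(diff) %>%                         # smallest difference first
  slice(1) %>%                              # keep the closest df2 row in each group
  ungroup()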
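For comparison, a base-R sketch of the same join, sort and keep-first idea without dplyr. It is an illustrative alternative, not the answerer's code; `df2_row` and `result` are hypothetical names, and `df2_row` is added only so that ties go to the earliest df2 row, as the question asks.

```r
df1 <- data.frame(Subject = c("S1","S1","S2","S2","S2","S3","S3"),
                  SampleTime = c(1, 2, 1, 2.1, 3, 0.9, 2),
                  Value = c(3, 4, 3, 2, 2, 4, 5), stringsAsFactors = FALSE)
df2 <- data.frame(Subject = c("S1","S1","S1","S2","S2","S2","S2","S3","S3"),
                  SampleTime = c(0.99, 2.01, 2.99, 0, 1.01, 2, 3, 1.2, 2.02),
                  Conc = c(4.7, 5.2, 8, 5, 2, 1, 3, 4, 6), stringsAsFactors = FALSE)
df2$df2_row <- seq_len(nrow(df2))   # original df2 position, used as tie-breaker

# Inner join on Subject: every df1 row paired with every df2 row of that subject.
joined <- merge(df1, df2, by = "Subject", suffixes = c("", "_df2"))
joined$diff <- abs(joined$SampleTime - joined$SampleTime_df2)

# Sort so the closest df2 row (earliest df2 row on ties) comes first in each pair.
joined <- joined[order(joined$Subject, joined$SampleTime, joined$diff, joined$df2_row), ]

# Keep the first row for each (Subject, SampleTime) pair.
result <- joined[!duplicated(joined[c("Subject", "SampleTime")]), ]
result$df2_row <- NULL
rownames(result) <- NULL
```

Like the dplyr version, this collapses df1 rows that share both Subject and SampleTime into a single row.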