I have the following two data frames:
df1 <- data.frame(Subject=c("S1","S1","S2","S2","S2","S3","S3"),
SampleTime=c(1,2,1,2.1,3,0.9,2), Value=c(3,4,3,2,2,4,5))
df2 <- data.frame(Subject=c("S1","S1","S1","S2","S2","S2","S2","S3","S3"),
SampleTime=c(0.99, 2.01,2.99, 0,1.01,2,3,1.2,2.02), Conc=c(4.7,5.2,8,5,2,1,3,4,6))
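My goal is to add the column df2$Conc to df1, matching each row of df1 to the row of df2 for the same subject whose SampleTime is closest. I would also like to add a column showing the difference between the two SampleTimes.
Desired output: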
output <- data.frame(Subject=c("S1","S1","S2","S2","S2","S3","S3"),
SampleTime=c(1,2,1,2.1,3,0.9,2), Value=c(3,4,3,2,2,4,5),
SampleTime_df2=c(0.99,2.01,1.01,2,3,1.20,2.02), Conc=c(4.7,5.2,2,1,3,4,6))
So far I have been able to do this one subject at a time, for example for subject S2:
Indices <- sapply(df1$SampleTime[df1$Subject=="S2"], FUN=function(x,y) which.min(abs(y - x)), y=df2$SampleTime[df2$Subject=="S2"])
df1$SampleTime_df2[df1$Subject=="S2"] <- df2$SampleTime[df2$Subject=="S2"][Indices]
df1$Conc[df1$Subject=="S2"] <- df2$Conc[df2$Subject=="S2"][Indices]
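The code doesn't look very nice, and I would like to do this for all subjects at once. In my real data there are no ties (i.e. two sample times in df2 that are equally close to one sample time in df1), but let's say that in that case I would want to keep the first one.
I hope my question is clear. Thanks for your help!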
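Answer 0 (score: 2)
In general, would it work to merge the data frames (inner join) on Subject and a new column filled with round(SampleTime)? This approach works for the toy data you provided, i.e.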
df1$SampleTimeInt <- round(df1$SampleTime)
df2$SampleTimeInt <- round(df2$SampleTime)
combined <- merge(df1, df2, by=c("Subject", "SampleTimeInt"))
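The rounding approach depends on each pair of matching times rounding to the same integer. As a hedged illustration (the one-row data frames `a` and `b` below are made up for this sketch and are not part of the question), two times that are only 0.2 apart can land on different integers, in which case the inner join silently drops the row:

```r
# Hypothetical one-row examples, not from the question's data
a <- data.frame(Subject = "S1", SampleTime = 1.4, Value = 3)
b <- data.frame(Subject = "S1", SampleTime = 1.6, Conc = 4.7)

a$SampleTimeInt <- round(a$SampleTime)  # 1
b$SampleTimeInt <- round(b$SampleTime)  # 2

# The keys differ, so the inner join returns no rows
dropped <- merge(a, b, by = c("Subject", "SampleTimeInt"))
n_dropped <- nrow(dropped)

# Conversely, two df2-like times that round to the same integer
# both match, so one input row becomes two output rows
b2 <- data.frame(Subject = "S1", SampleTime = c(0.9, 1.1), Conc = c(4, 5))
b2$SampleTimeInt <- round(b2$SampleTime)  # 1, 1
a2 <- data.frame(Subject = "S1", SampleTime = 1, Value = 3)
a2$SampleTimeInt <- round(a2$SampleTime)  # 1
n_duplicated <- nrow(merge(a2, b2, by = c("Subject", "SampleTimeInt")))
```

So rounding is probably only safe when the sampling schedule is close to integer time points in both data frames.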
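Answer 1 (score: 2)
I think this is what you are looking for? Do an inner join, then take the absolute difference, sort by it, and slice. This is all done with dplyr: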
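library(dplyr)
df3 <- df1 %>%
  # keep df1's time under its own name so it does not clash with df2's
  rename(ST1 = SampleTime) %>%
  # pair every df1 row with every df2 row of the same subject
  inner_join(df2, by = "Subject") %>%
  group_by(Subject, ST1) %>%
  # absolute time difference for each candidate pair
  mutate(diff = abs(ST1 - SampleTime)) %>%
  # smallest difference first, then keep one row per (Subject, ST1)
  arrange(diff) %>%
  slice(1) %>%
  ungroup()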
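For comparison, the per-subject `which.min` code from the question could be generalized to all subjects in base R. This is only a sketch: `nearest_match` and the `TimeDiff` column are names invented here, and it assumes every subject in df1 has at least one row in df2. Because `which.min` returns the index of the first minimum, a tie keeps the first matching row of df2, which is the behavior the question asks for.

```r
# Toy data from the question
df1 <- data.frame(Subject = c("S1","S1","S2","S2","S2","S3","S3"),
                  SampleTime = c(1, 2, 1, 2.1, 3, 0.9, 2),
                  Value = c(3, 4, 3, 2, 2, 4, 5))
df2 <- data.frame(Subject = c("S1","S1","S1","S2","S2","S2","S2","S3","S3"),
                  SampleTime = c(0.99, 2.01, 2.99, 0, 1.01, 2, 3, 1.2, 2.02),
                  Conc = c(4.7, 5.2, 8, 5, 2, 1, 3, 4, 6))

# nearest_match() is a hypothetical helper, not from the original thread.
# Assumes each subject in d1 also occurs in d2.
nearest_match <- function(d1, d2) {
  pieces <- lapply(split(d1, d1$Subject, drop = TRUE), function(sub1) {
    # rows of d2 belonging to the current subject
    subject <- as.character(sub1$Subject[1])
    sub2 <- d2[as.character(d2$Subject) == subject, ]
    # for each time in sub1, index of the closest time in sub2
    # (which.min returns the first minimum, so ties keep the first row)
    idx <- vapply(sub1$SampleTime,
                  function(x) which.min(abs(sub2$SampleTime - x)),
                  integer(1))
    sub1$SampleTime_df2 <- sub2$SampleTime[idx]
    sub1$Conc <- sub2$Conc[idx]
    sub1$TimeDiff <- abs(sub1$SampleTime - sub1$SampleTime_df2)
    sub1
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

result <- nearest_match(df1, df2)
```

Note that the rows come back grouped by subject, which matches the original order for this toy data but might not for data that is not already sorted by Subject.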