Joining data frames by measured values with an error tolerance

Time: 2016-11-22 22:35:41

Tags: r dplyr

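I am looking for a way to join (or possibly merge) two or more data frames in R that contain measured values with a specified error tolerance. This means the values in the "by" column will be nnn.nnnn +/- 0.000n. The tolerance is limited to 3e-6 times the value.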

Here is my best attempt so far.

newDF <- left_join(P0511_480k, P0511_SF00V, by = c(P0511_480k$mz == (P0511_SF00V$mz - 0.000003(P0511_480k$mz)):(P0511_SF00V$mz + 0.000003(P0511_480k$mz))))

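In this expression I have two data frames (P0511_480k and P0511_SF00V) that I want to merge on the column named "m.z". The acceptable range of values is "m.z" plus or minus "m.z" times 0.000003. For example, P0511_480k_subset$m.z = 187.06162 should match P0511_SF00V_subset$m.z = 187.06155.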
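
To make the tolerance concrete, here is a minimal sketch that checks the example pair above. It assumes the tolerance is taken relative to the first value; `within_tol` is a hypothetical helper name used only for illustration, not a function from any package.

```r
# Hypothetical helper: TRUE when y lies within x * rel_tol of x.
within_tol <- function(x, y, rel_tol = 3e-6) {
  abs(x - y) <= rel_tol * x
}

x <- 187.06162  # value from P0511_480k_subset
y <- 187.06155  # value from P0511_SF00V_subset

x * 3e-6                  # allowed deviation, about 0.00056
abs(x - y)                # actual difference, about 0.00007
within_tol(x, y)          # TRUE: the two values should match
within_tol(x, 203.05641)  # FALSE: far outside the tolerance
```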

> dput(head(P0511_480k_subset, 10))
structure(list(m.z = c(187.06162, 203.05652, 215.05668, 217.07224, 
279.05499), Intensity = c(319420.8, 288068.9, 229953, 210107.8, 
180054), Relative = c(100, 90.18, 71.99, 65.78, 56.37), Resolution = c(394956.59, 
415308.31, 387924.91, 437318.31, 410670.91), Baseline = c(2.1, 
1.43, 1.69, 1.73, 3.04), Noise = c(28.03, 27.17, 27.52, 27.58, 
29.37)), .Names = c("m.z", "Intensity", "Relative", "Resolution", 
"Baseline", "Noise"), class = c("tbl_df", "data.frame"), row.names = c(NA, 
-5L))

> dput(head(P0511_SF00V_subset, 10))
structure(list(m.z = c(187.06155, 203.05641, 215.05654, 217.0721
), Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2), Relative = c(100, 
78.46, 64.91, 51.23), Resolution = c(314271.88, 298427.41, 289803.97, 
288163.63), Baseline = c(6.89, 10.47, 9.13, 8.89), Noise = c(40.94, 
45.98, 44.3, 44.01)), .Names = c("m.z", "Intensity", "Relative", 
"Resolution", "Baseline", "Noise"), class = c("tbl_df", "data.frame"
), row.names = c(NA, -4L))

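Thanks for any suggestions! I have searched the help documentation as widely as I could, but I cannot find an example close to what I need.

Many thanks!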

1 Answer:

Answer 0: (score: 0)

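If you do not need the unmatched rows, this should work. Assume the two data sets are df1 and df2. Look at each value in the m.z column of df1; if it is within the 0.000003 relative tolerance of any value in the m.z column of df2, replace that value in df1 with the corresponding matching value from df2. Then merge the two data frames on m.z.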

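df1$m.z <- sapply(df1$m.z, function(x) {
  # Absolute distance from x to every m.z value in df2
  d <- abs(df2$m.z - x)
  # If the nearest df2 value lies within the relative tolerance, use it in
  # place of x; otherwise use 0 so that the row matches nothing in df2
  if (min(d, na.rm = TRUE) < 0.000003 * x) df2$m.z[which.min(d)] else 0
})
# Merge on m.z only (the other column names are shared but hold different
# values); df1 rows without a match (m.z set to 0) are dropped
df3 <- merge(df1, df2, by = "m.z")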
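
The question asked for a left join, so it may also be useful to keep the df1 rows that have no match. The following is a minimal base R sketch of that idea, not a definitive implementation. It uses only the m.z and Intensity columns from the question's sample data; `nearest_within_tol` and the `.y` suffix are hypothetical names chosen for illustration, and the sketch assumes each df1 value should be paired with its single nearest df2 value.

```r
# Toy data: m.z and Intensity columns from the question's dput() output.
df1 <- data.frame(
  m.z = c(187.06162, 203.05652, 215.05668, 217.07224, 279.05499),
  Intensity = c(319420.8, 288068.9, 229953, 210107.8, 180054)
)
df2 <- data.frame(
  m.z = c(187.06155, 203.05641, 215.05654, 217.0721),
  Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2)
)

# Hypothetical helper: for each value in x, return the index of the nearest
# value in candidates, or NA if that nearest value is outside the tolerance.
nearest_within_tol <- function(x, candidates, rel_tol = 3e-6) {
  vapply(x, function(v) {
    d <- abs(candidates - v)
    i <- which.min(d)
    if (length(i) == 1 && d[i] <= rel_tol * v) i else NA_integer_
  }, integer(1))
}

idx <- nearest_within_tol(df1$m.z, df2$m.z)

# Left join: every df1 row is kept; rows without a match get NA in the
# columns that come from df2.
matched <- df2[idx, , drop = FALSE]
names(matched) <- paste0(names(matched), ".y")
joined <- cbind(df1, matched)
rownames(joined) <- NULL
joined
```

With this sample data the first four rows of df1 pair with the four rows of df2, and the fifth row (m.z = 279.05499) is kept with NA in m.z.y and Intensity.y. Unlike the approach above, the original df1 m.z values are left unchanged, so both measurements remain visible side by side.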