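I have some borehole geological data, ordered from surface depth to total depth. I want to combine several datasets into one, and each has a different resolution. The highest-resolution dataset has the desired output resolution (it also has evenly spaced depths, whereas the others do not). I have many of these to manage, so editing them by hand in a spreadsheet would take too long.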
For example, here is some high-resolution data for a selected depth range (about 151-152):
data <-
structure(list(DEPTH = c(150.876, 151.0284, 151.1808, 151.3332,
151.4856, 151.638, 151.7904, 151.9428, 152.0952, 152.2476), DT = c(435.6977,
437.6732, 441.4934, 444.6542, 445.771, 444.4603, 443.5679, 444.5042,
447.3567, 450.4373), GR = c(13.8393, 14.549, 15.7866, 16.9114,
18.4841, 18.8695, 17.7494, 16.7178, 12.8839, 11.7309)), .Names = c("DEPTH",
"DT", "GR"), row.names = c(NA, -10L), class = "data.frame")
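(The full log data file is much larger, so I don't know how to make it available here for you to use. Instead, I have taken a section that matches an interval in the next dataset, analyses.)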
Some low-resolution discrete numeric data, whose depth ranges do not match those of the high-resolution data above. Each row represents a sampling interval of a given length at a particular depth range, and the values do not vary within that range:
analyses <-
structure(list(from = c(151L, 198L, 284L, 480L), to = c(151.1,
198.1, 284.1, 480.1), TC = c(1.276476312, 1.383553608, 1.46771308,
1.125049954), DEN = c(1.842555733, 1.911724824, 1.997592565,
NA), PORO = c(50.21947697, 44.26392579, 39.31309757, NA)), .Names = c("from",
"to", "TC", "DEN", "PORO"), class = "data.frame", row.names = c(NA,
-4L))
Some low-resolution categorical data, with unequal depth ranges:
units <-
structure(list(from = c(0, 100, 450, 535, 617.89), to = c(100,
450, 535, 617.89, 619.25), strat = structure(c(5L, 1L, 2L, 3L,
4L), .Label = c("Formation A", "Formation B",
"Group C", "Group D", "Unassigned"), class = "factor")), .Names = c("from",
"to", "strat"), class = "data.frame", row.names = c(NA, -5L))
The expected result is the data at the resolution of the first dataset, with the values from the 2nd and 3rd datasets merged in. In this case, it would produce this data frame:
DEPTH DT GR TC DEN PORO Unit
150.8760 435.69 13.83 NA NA NA Formation A
151.0284 437.67 14.54 1.27 1.84 50.21 Formation A
151.1808 441.49 15.78 NA NA NA Formation A
151.3332 444.65 16.91 NA NA NA Formation A
151.4856 445.77 18.48 NA NA NA Formation A
151.6380 444.46 18.86 NA NA NA Formation A
151.7904 443.56 17.74 NA NA NA Formation A
151.9428 444.50 16.71 NA NA NA Formation A
152.0952 447.35 12.88 NA NA NA Formation A
152.2476 450.43 11.73 NA NA NA Formation A
I have tried merging the data frames and then filling the gaps with na.approx, but the problem is that many of the variables in analyses contain NaN or NA values, which I don't want to interpolate: they need to stay NA.
Answer (score: 1)
You can use merge or sqldf to join your data.frames.
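A minimal sketch of the merge() route, reproducing the first (inner-join) sqldf query below in base R. It assumes the sample data frames from the question, rebuilt here with data.frame() so the block runs on its own; join_intervals is a hypothetical helper name, not part of any package. merge(by = NULL) builds the full Cartesian product, so on the full logs memory grows with nrow(data) times the number of intervals.

```r
# Sample data from the question, rebuilt with data.frame() so this runs alone
data <- data.frame(
  DEPTH = c(150.876, 151.0284, 151.1808, 151.3332, 151.4856,
            151.638, 151.7904, 151.9428, 152.0952, 152.2476),
  DT = c(435.6977, 437.6732, 441.4934, 444.6542, 445.771,
         444.4603, 443.5679, 444.5042, 447.3567, 450.4373),
  GR = c(13.8393, 14.549, 15.7866, 16.9114, 18.4841,
         18.8695, 17.7494, 16.7178, 12.8839, 11.7309)
)
analyses <- data.frame(
  from = c(151, 198, 284, 480), to = c(151.1, 198.1, 284.1, 480.1),
  TC   = c(1.276476312, 1.383553608, 1.46771308, 1.125049954),
  DEN  = c(1.842555733, 1.911724824, 1.997592565, NA),
  PORO = c(50.21947697, 44.26392579, 39.31309757, NA)
)
units <- data.frame(
  from  = c(0, 100, 450, 535, 617.89),
  to    = c(100, 450, 535, 617.89, 619.25),
  strat = c("Unassigned", "Formation A", "Formation B", "Group C", "Group D")
)

# Hypothetical helper: inner join of x (with a DEPTH column) to an interval
# table y (with from/to columns). merge(by = NULL) returns the Cartesian
# product; keep the pairs where from <= DEPTH < to, as in the SQL WHERE
# clause, then drop from/to so a second join does not clash on those names.
join_intervals <- function(x, y) {
  xy <- merge(x, y, by = NULL)
  keep <- xy$from <= xy$DEPTH & xy$DEPTH < xy$to
  xy <- xy[keep, setdiff(names(xy), c("from", "to")), drop = FALSE]
  rownames(xy) <- NULL
  xy
}

inner <- join_intervals(join_intervals(data, analyses), units)
print(inner)  # only DEPTH 151.0284 lies inside an analyses interval
```

Like the inner-join query, this drops every depth that is not inside an analyses interval, so on the sample data only one row survives.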
library(sqldf)
# If you know that each depth (in the first data.frame)
# is in exactly one interval (in the second and third data.frames)
sqldf( "
SELECT *
FROM data A, analyses B, units C
WHERE B.[from] <= A.DEPTH AND A.DEPTH < B.[to] -- Need to quote some of the column names
AND C.[from] <= A.DEPTH AND A.DEPTH < C.[to]
" )
# If each depth (in the first data.frame)
# is in at most one interval (in the second and third data.frames)
sqldf( "
SELECT *
FROM data A
LEFT JOIN analyses B ON B.[from] <= A.DEPTH AND A.DEPTH < B.[to]
LEFT JOIN units C ON C.[from] <= A.DEPTH AND A.DEPTH < C.[to]
ORDER BY DEPTH
" )
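If you would rather avoid both the sqldf dependency and a Cartesian product, base R's findInterval() can reproduce the LEFT JOIN version. This is a different technique from the answer's SQL, sketched under the assumption that each interval table's from column is sorted ascending and its intervals do not overlap (true for the sample data); lookup_interval is a hypothetical helper name. Depths outside every interval get NA, and existing NA values in analyses are copied through unchanged rather than interpolated.

```r
# Sample data from the question, rebuilt with data.frame() so this runs alone
data <- data.frame(
  DEPTH = c(150.876, 151.0284, 151.1808, 151.3332, 151.4856,
            151.638, 151.7904, 151.9428, 152.0952, 152.2476),
  DT = c(435.6977, 437.6732, 441.4934, 444.6542, 445.771,
         444.4603, 443.5679, 444.5042, 447.3567, 450.4373),
  GR = c(13.8393, 14.549, 15.7866, 16.9114, 18.4841,
         18.8695, 17.7494, 16.7178, 12.8839, 11.7309)
)
analyses <- data.frame(
  from = c(151, 198, 284, 480), to = c(151.1, 198.1, 284.1, 480.1),
  TC   = c(1.276476312, 1.383553608, 1.46771308, 1.125049954),
  DEN  = c(1.842555733, 1.911724824, 1.997592565, NA),
  PORO = c(50.21947697, 44.26392579, 39.31309757, NA)
)
units <- data.frame(
  from  = c(0, 100, 450, 535, 617.89),
  to    = c(100, 450, 535, 617.89, 619.25),
  strat = c("Unassigned", "Formation A", "Formation B", "Group C", "Group D")
)

# Hypothetical helper: for each depth, the row index of the interval
# [from, to) that contains it, or NA if none does. findInterval() returns
# the largest i with from[i] <= depth (0 if none), which requires `from`
# sorted ascending; the `to` check catches depths that fall in a gap.
lookup_interval <- function(depth, from, to) {
  i <- findInterval(depth, from)
  i[i == 0] <- NA
  i[!is.na(i) & depth >= to[i]] <- NA
  i
}

ia <- lookup_interval(data$DEPTH, analyses$from, analyses$to)
iu <- lookup_interval(data$DEPTH, units$from, units$to)

# Indexing with NA yields NA, so unmatched depths get NA values; NA values
# already present in analyses are copied as-is, never interpolated
result <- data.frame(
  data,
  TC   = analyses$TC[ia],
  DEN  = analyses$DEN[ia],
  PORO = analyses$PORO[ia],
  Unit = units$strat[iu]
)
print(result)
```

On the sample data this gives the same 10-row layout as the expected table in the question (without its rounding): only the row at DEPTH 151.0284 receives TC, DEN and PORO, and every row is assigned to Formation A.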