Unexpected behavior using foverlaps on floating-point intervals

Time: 2014-11-15 22:08:48

Tags: r data.table

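Below is an example where foverlaps(...) appears to find a match between intervals that do not overlap. Can anyone help me understand what I am doing wrong?

The question in this post seemed like a perfect opportunity to use foverlaps(...) from the data.table package. The dataset below is from that post.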

dinosaurs <- structure(list(GENUS = structure(1:3, .Label = c("Abydosaurus", "Achelousaurus", "Acheroraptor"), class = "factor"), ma_max = c(109, 84.9, 70.6), ma_min = c(94.3, 70.6, 66.043), ma_mid = c(101.65, 77.75, 68.3215)), .Names = c("GENUS", "ma_max", "ma_min", "ma_mid"), class = "data.frame", row.names = c(NA, -3L))
stages    <- structure(list(Stage = structure(c(13L, 19L, 17L, 21L, 1L, 4L, 6L, 8L, 16L, 14L, 20L, 7L, 23L, 12L, 5L, 3L, 2L, 10L, 22L, 11L, 18L, 9L, 15L), .Label = c("Aalenian", "Albian", "Aptian", "Bajocian", "Barremian", "Bathonian", "Berriasian", "Callovian", "Campanian", "Cenomanian", "Coniacian", "Hauterivian", "Hettangian", "Kimmeridgian", "Maastrichtian", "Oxfordian", "Pliensbachian", "Santonian", "Sinemurian", "Tithonian", "Toarcian", "Turonian", "Valanginian"), class = "factor"),ma_max = c(201.6, 197, 190, 183, 176, 172, 168, 165, 161, 156, 151, 145.5, 140, 136, 130, 125, 112, 99.6, 93.5, 89.3, 85.8, 83.5, 70.6), ma_min = c(197, 190, 183, 176, 172, 168, 165, 161, 156, 151, 145.5, 140, 136, 130, 125, 112, 99.6, 93.5, 89.3, 85.8, 83.5, 70.6, 66.5), ma_mid = c(199.3, 193.5, 186.5, 179.5, 174, 170, 166.5, 163, 158.5, 153.5, 148.25, 142.75, 138, 133, 127.5, 118.5, 105.8, 96.55, 91.4, 87.55, 84.65, 77.05, 68.05)), .Names = c("Stage", "ma_max", "ma_min", "ma_mid"), class = "data.frame", row.names = c(NA, -23L))
dinosaurs
#           GENUS ma_max ma_min   ma_mid
# 1   Abydosaurus  109.0 94.300 101.6500
# 2 Achelousaurus   84.9 70.600  77.7500
# 3  Acheroraptor   70.6 66.043  68.3215
head(stages)
#           Stage ma_max ma_min ma_mid
# 1    Hettangian  201.6    197  199.3
# 2    Sinemurian  197.0    190  193.5
# 3 Pliensbachian  190.0    183  186.5
# 4      Toarcian  183.0    176  179.5
# 5      Aalenian  176.0    172  174.0
# 6      Bajocian  172.0    168  170.0

The goal is to find the number of dinosaur genera present in each geological stage.

library(data.table)   # 1.9.4
setDT(dinosaurs)[,ma_mid:=NULL]
setDT(stages)[,ma_mid:=NULL]
setkey(dinosaurs,ma_min,ma_max)
foverlaps(stages,dinosaurs,type="any",nomatch=0)
#            GENUS ma_max ma_min         Stage i.ma_max i.ma_min
# 1:   Abydosaurus  109.0 94.300        Albian    112.0     99.6
# 2:   Abydosaurus  109.0 94.300    Cenomanian     99.6     93.5
# 3: Achelousaurus   84.9 70.600     Coniacian     89.3     85.8
# 4: Achelousaurus   84.9 70.600     Santonian     85.8     83.5
# 5:  Acheroraptor   70.6 66.043     Campanian     83.5     70.6
# 6: Achelousaurus   84.9 70.600     Campanian     83.5     70.6
# 7:  Acheroraptor   70.6 66.043 Maastrichtian     70.6     66.5
# 8: Achelousaurus   84.9 70.600 Maastrichtian     70.6     66.5

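This is mostly correct, but look at row 3. It seems to assert that the Coniacian stage, from 85.8 to 89.3 million years ago, overlaps with Achelousaurus, which lived from 70.6 to 84.9 million years ago. What am I missing?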
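For reference, the overlap condition for row 3 can be checked by hand. This is a minimal sketch that assumes the intervals are closed (touching endpoints count as overlapping, as the Campanian and Maastrichtian rows suggest); `intervals_overlap` is a hypothetical helper, not part of data.table.

```r
# Hypothetical helper: TRUE when the closed intervals [min1, max1] and
# [min2, max2] share at least one point.
intervals_overlap <- function(min1, max1, min2, max2) {
  min1 <= max2 && min2 <= max1
}

# Achelousaurus (70.6 to 84.9 Ma) vs. the stage in row 3 (85.8 to 89.3 Ma)
intervals_overlap(70.6, 84.9, 85.8, 89.3)
# [1] FALSE
```

The second comparison fails (85.8 <= 84.9 is FALSE), so by this definition row 3 should not be a match.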

1 answer:

Answer 0 (score: 2)

On 1.9.5, I get:

#            GENUS ma_max ma_min         Stage i.ma_max i.ma_min
# 1:   Abydosaurus  109.0 94.300        Albian    112.0     99.6
# 2:   Abydosaurus  109.0 94.300    Cenomanian     99.6     93.5
# 3: Achelousaurus   84.9 70.600     Santonian     85.8     83.5
# 4:  Acheroraptor   70.6 66.043     Campanian     83.5     70.6
# 5: Achelousaurus   84.9 70.600     Campanian     83.5     70.6
# 6:  Acheroraptor   70.6 66.043 Maastrichtian     70.6     66.5
# 7: Achelousaurus   84.9 70.600 Maastrichtian     70.6     66.5

This is most likely a floating-point bug that was fixed in 1.9.5 in this commit. It would be great if you could verify this as well.
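One way to cross-check the result without depending on the installed data.table version is a brute-force comparison in base R. The sketch below is an illustration only: `dinos` and `stg` are hypothetical trimmed copies of the question's data (only the stages near the three genera), and it assumes closed intervals, so touching endpoints count as overlapping.

```r
# Trimmed copies of the question's data (illustrative names).
dinos <- data.frame(
  GENUS  = c("Abydosaurus", "Achelousaurus", "Acheroraptor"),
  ma_max = c(109, 84.9, 70.6),
  ma_min = c(94.3, 70.6, 66.043),
  stringsAsFactors = FALSE
)
stg <- data.frame(
  Stage  = c("Albian", "Cenomanian", "Turonian", "Coniacian",
             "Santonian", "Campanian", "Maastrichtian"),
  ma_max = c(112, 99.6, 93.5, 89.3, 85.8, 83.5, 70.6),
  ma_min = c(99.6, 93.5, 89.3, 85.8, 83.5, 70.6, 66.5),
  stringsAsFactors = FALSE
)

# overlap[i, j] is TRUE when genus i and stage j share at least one point.
overlap <- outer(dinos$ma_min, stg$ma_max, "<=") &
           outer(dinos$ma_max, stg$ma_min, ">=")
dimnames(overlap) <- list(dinos$GENUS, stg$Stage)

# Number of genera present in each stage.
genera_per_stage <- colSums(overlap)
genera_per_stage
```

With this data the matrix has seven TRUE entries, none of them for Achelousaurus and Coniacian, which agrees with the 1.9.5 output above. The comparison is quadratic in the number of rows, so it is only suitable as a sanity check on small inputs.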