Get the number of occurrences within a range in R

Date: 2015-08-31 10:13:29

Tags: r count range

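I have looked through all the questions and answers I could find on counting occurrences in a column, but none solve my problem: for each start time of variable 1 (V1), I want to count how many start times of variable 2 (V2) and variable 3 (V3) fall between 25 seconds before and 25 seconds after it.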

For example:

var start

V1  268.523
V1  296.986
V1  306.701
V1  311.586
V1  342.755
V1  358.539
V2  337.968
V2  339.808
V2  340.948
V2  357.278
V2  358.718
V3  297.936
V3  300.156
V3  307.734
V3  311.378
V3  339.046

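For example, if the first V1 starts at 268.523 s, the window from -25 s to +25 s runs from 243.523 to 293.523. If the start of a V2 or V3 falls within this time window, it should be counted as 1.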
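A minimal sketch of the window check just described, using the V2/V3 start times from the example. It assumes the window is closed at both ends (a start exactly 25 s away still counts); the helper name `count_in_window` is hypothetical and only for illustration.

```r
# Start times of V2 and V3 from the example
others <- c(337.968, 339.808, 340.948, 357.278, 358.718,   # V2
            297.936, 300.156, 307.734, 311.378, 339.046)   # V3

# Hypothetical helper: how many values of x lie in [s - half_width, s + half_width]
count_in_window <- function(s, x, half_width = 25) {
  sum(x >= s - half_width & x <= s + half_width)
}

count_in_window(268.523, others)  # window 243.523 to 293.523: 0 matches
count_in_window(296.986, others)  # window 271.986 to 321.986: the four V3 starts below 321.986
```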

I would be grateful if someone could give me a hint on how to get this information out of my dataset.

1 answer:

Answer (score: 4)

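It seems to me that you want something like the following. I split your data into two datasets: df1, where var == "V1", and df2, where var != "V1". Then I set up the ±25 range within df1 and added a row index, so that we know afterwards which rows were matched against df2. Next I keyed both datasets by the columns to match on and ran foverlaps to find the overlapping ranges. Finally, you can summarise by index and matched variable name, dcast the result, and join it back to df1.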

You may need the development version from GitHub (v 1.9.5+), see here.

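library(data.table) # v 1.9.5+

# Convert df to a data.table by reference, then split it:
# df1 holds the V1 rows (the reference windows), df2 all other rows
df1 <- setDT(df)[var == "V1"]
df2 <- df[var != "V1"]

# Build the +/-25 s window around each V1 start and remember its row index
df1[, `:=`(from = start - 25L, to = start + 25L, indx = .I)]
setkey(df1, from, to)

# Treat each V2/V3 start as a zero-length interval [start, start]
df2[, end := start]
setkey(df2, start, end)

# Overlap join, then count and list the matched starts per window and variable
res <- foverlaps(df2, df1)[, .(start = toString(i.start), .N), by = .(indx, i.var)]
# Reshape to one row per window, with N and start columns for each variable
res <- dcast(res, indx ~ i.var, value.var = c("N", "start"))

# Join the results back onto all V1 rows (windows without matches get NA)
setkey(df1, indx)
setkey(res, indx)[df1]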

#    indx N_V2 N_V3                                    start_V2                           start_V3 var   start    from      to
# 1:    1   NA   NA                                          NA                                 NA  V1 268.523 243.523 293.523
# 2:    2   NA    4                                          NA 297.936, 300.156, 307.734, 311.378  V1 296.986 271.986 321.986
# 3:    3   NA    4                                          NA 297.936, 300.156, 307.734, 311.378  V1 306.701 281.701 331.701
# 4:    4   NA    4                                          NA 297.936, 300.156, 307.734, 311.378  V1 311.586 286.586 336.586
# 5:    5    5    1 337.968, 339.808, 340.948, 357.278, 358.718                            339.046  V1 342.755 317.755 367.755
# 6:    6    5    1 337.968, 339.808, 340.948, 357.278, 358.718                            339.046  V1 358.539 333.539 383.539
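To see what the overlap join computes, here is a base R sketch that mimics it on the same data without data.table: cross join every V1 window with every V2/V3 start, keep the pairs where the start lies inside the window (assumed closed at both ends), and count per window and variable. The object names (`win`, `pts`, `all_pairs`, `hits`, `counts`) are illustrative, and this is not the answer's method, only a way to check it.

```r
# Example data (same values as in the "Data" section)
df <- data.frame(
  var   = rep(c("V1", "V2", "V3"), times = c(6, 5, 5)),
  start = c(268.523, 296.986, 306.701, 311.586, 342.755, 358.539,
            337.968, 339.808, 340.948, 357.278, 358.718,
            297.936, 300.156, 307.734, 311.378, 339.046)
)

# Windows: one row per V1 start, with a row index like `indx` above
win <- df[df$var == "V1", ]
win$indx <- seq_len(nrow(win))
win$from <- win$start - 25
win$to   <- win$start + 25

# Points: the V2/V3 start times
pts <- df[df$var != "V1", ]

# by = NULL gives the Cartesian product of windows and points;
# then keep only the pairs where the point lies inside the window
all_pairs <- merge(win[, c("indx", "from", "to")], pts, by = NULL)
hits <- all_pairs[all_pairs$start >= all_pairs$from & all_pairs$start <= all_pairs$to, ]

# Number of matches per window and variable (long format, like `res` before dcast)
counts <- aggregate(start ~ indx + var, data = hits, FUN = length)
names(counts)[names(counts) == "start"] <- "N"
counts
```

The N column should agree with the non-missing N_V2/N_V3 entries in the output above. A cross join grows with (number of V1 rows) × (number of other rows), so this is only suitable for small data or as a check.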

Data

df <- structure(list(var = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("V1", "V2", "V3"
), class = "factor"), start = c(268.523, 296.986, 306.701, 311.586, 
342.755, 358.539, 337.968, 339.808, 340.948, 357.278, 358.718, 
297.936, 300.156, 307.734, 311.378, 339.046)), .Names = c("var", 
"start"), class = "data.frame", row.names = c(NA, -16L))
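If only the counts are needed (not the matched start times), a base R alternative that avoids any join is to sort each variable's start times once and use findInterval(): on a sorted vector it returns how many values are at or below the upper bound, and with left.open = TRUE how many are strictly below the lower bound, so the difference is the count inside the closed window. This is a sketch under that closed-window assumption, not the answer's method; the function name `count_in_windows` is hypothetical, and windows without matches get 0 instead of NA.

```r
# Start times from the example data, split by variable
v1 <- c(268.523, 296.986, 306.701, 311.586, 342.755, 358.539)
v2 <- c(337.968, 339.808, 340.948, 357.278, 358.718)
v3 <- c(297.936, 300.156, 307.734, 311.378, 339.046)

# Hypothetical helper: for every s, count values of x in [s - half_width, s + half_width]
count_in_windows <- function(s, x, half_width = 25) {
  x <- sort(x)                                        # findInterval() needs sorted breaks
  findInterval(s + half_width, x) -                   # values <= upper bound
    findInterval(s - half_width, x, left.open = TRUE) # minus values < lower bound
}

counts <- data.frame(
  indx  = seq_along(v1),
  start = v1,
  N_V2  = count_in_windows(v1, v2),
  N_V3  = count_in_windows(v1, v3)
)
counts
```

Reading NA as 0, the N_V2 and N_V3 columns should match those in the answer's output.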