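I have a column B_S with two values, "S2" and "S1". S2 marks the boundary of each group and is the original row of that group. S1 marks candidate rows that need to be checked against the "High" column.
For example, I need to check whether the S1 High is greater than the S2 High. I need to loop through all S1 entries, select the rows where the S1 High is higher than the S2 High, and drop the rows where the S1 High is not higher than the S2 High.
I don't have enough experience with data.table to produce this result.
Here is a sample of the data: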
structure(list(Time = c("16/10/2014 09:19", "16/10/2014 09:20",
"16/10/2014 09:21", "16/10/2014 09:22", "17/12/2014 12:59", "17/12/2014 13:00",
"17/12/2014 13:01", "17/12/2014 13:02"), High = c(1833.5, 1832.5,
1820.5, 1852.5, 1992, 1991.25, 2001.25, 2002.25), rn = c(77470L,
77469L, 77468L, 77467L, 17758L, 17757L, 17756L, 17755L), B_S = c("S2",
"S1", "S1", "S1", "S2", "S1", "S1", "S1")), row.names = c(NA,
-8L), class = c("data.table", "data.frame"))
Expected result: for the first group (rows 1-4), rows 1 and 4 are kept.
structure(list(Time = c("16/10/2014 09:19", "16/10/2014 09:22"
), High = c(1833.5, 1852.5), rn = c(77470L, 77467L), B_S = c("S2",
"S1")), class = c("data.table", "data.frame"), row.names = c(NA,
-2L))
For the second group (rows 5-8), rows 5 and 7 are kept.
structure(list(Time = c("17/12/2014 12:59", "17/12/2014 13:01"
), High = c(1992, 2001.25), rn = c(17758L, 17756L), B_S = c("S2",
"S1")), class = c("data.table", "data.frame"), row.names = c(NA,
-2L))
Answer 0 (score: 2)
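One option is to group by the cumulative sum of the logical condition where 'B_S' is "S2", then get the positions where 'High' is greater than or equal to the first value of 'High' in the group (first), select the first two of those positions, extract the corresponding row indices (.I), and subset the rows with them: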
i1 <- df1[, .I[which((High >= first(High)))[1:2]], .(grp = cumsum(B_S == "S2"))]$V1
df1[i1]
# Time High rn B_S
#1: 16/10/2014 09:19 1833.50 77470 S2
#2: 16/10/2014 09:22 1852.50 77467 S1
#3: 17/12/2014 12:59 1992.00 17758 S2
#4: 17/12/2014 13:01 2001.25 17756 S1
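To make the grouping step easier to follow, here is a minimal sketch that rebuilds the sample data and splits the approach into two steps. It is a variant, not the answer's exact code: it uses a strict `>` comparison (the question asks for "greater than") and `head()` instead of `[1:2]`, so a group with no qualifying S1 row keeps only its S2 row rather than producing an NA row. It assumes the data starts with an "S2" row and that each group's first row is its S2 row; the names `grp`, `hit`, `keep`, and `res` are illustrative only.

```r
library(data.table)

# Same sample data as in the question, rebuilt with data.table()
df1 <- data.table(
  Time = c("16/10/2014 09:19", "16/10/2014 09:20", "16/10/2014 09:21",
           "16/10/2014 09:22", "17/12/2014 12:59", "17/12/2014 13:00",
           "17/12/2014 13:01", "17/12/2014 13:02"),
  High = c(1833.5, 1832.5, 1820.5, 1852.5, 1992, 1991.25, 2001.25, 2002.25),
  rn   = c(77470L, 77469L, 77468L, 77467L, 17758L, 17757L, 17756L, 17755L),
  B_S  = c("S2", "S1", "S1", "S1", "S2", "S1", "S1", "S1")
)

# Step 1: every "S2" row starts a new group, so the running count of
# "S2" rows works as a group id (1 1 1 1 2 2 2 2 for the sample data).
grp <- cumsum(df1$B_S == "S2")

# Step 2: per group, keep the S2 row (position 1, by assumption) plus the
# first S1 row whose High is strictly greater than the S2 High, if any.
keep <- df1[, {
  hit <- which(B_S == "S1" & High > High[1L])  # positions of qualifying S1 rows
  .I[c(1L, head(hit, 1L))]                     # head() adds nothing when hit is empty
}, by = .(grp = cumsum(B_S == "S2"))]$V1

res <- df1[keep]
print(res)
```

For the sample data, `keep` contains the row numbers 1, 4, 5, and 7, which matches the expected result in the question. Whether a group without any qualifying S1 row should still keep its S2 row is a guess about the intended behavior; adjust the `j` expression if such groups should be dropped entirely.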