Find the longest run of repeated elements in a vector

Time: 2015-05-14 01:04:43

Tags: r vector

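I want to find the start and end indices of each run of consecutive elements equal to "1" in the vector below. Each element of the vector is either "1" or NA.

For example: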

out2
 [1] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1"
[21] "1" NA  NA  NA  NA  NA  "1" "1" "1" "1" "1" "1" NA  NA  NA  NA  NA  NA  NA  NA

The output should look like this:

     [,1] [,2]
[1,]    1   21
[2,]   27   32

3 answers:

Answer 0: (score: 5)

Try rle:

x <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA, 
NA, NA, NA)

with(rle(x), {
  ok <- !is.na(values)
  ends <- cumsum(lengths)[ok]
  starts <- ends - lengths[ok] + 1
  cbind(starts, ends)
})

which gives:

     starts ends
[1,]      1   21
[2,]     27   32
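
Since the title asks for the longest run, here is a minimal sketch that builds on the same rle idea. The helper name run_bounds and the variables bounds, run_len and longest are made up for illustration; the sketch assumes the vector holds only one distinct non-NA value, as in the question.

```r
x <- c(rep(1, 21), rep(NA, 5), rep(1, 6), rep(NA, 8))

# Hypothetical helper: start/end indices of every non-NA run in v
run_bounds <- function(v) {
  with(rle(v), {
    ok <- !is.na(values)              # keep only runs of real values
    ends <- cumsum(lengths)[ok]       # last index of each kept run
    starts <- ends - lengths[ok] + 1  # first index of each kept run
    cbind(starts, ends)
  })
}

bounds <- run_bounds(x)
run_len <- bounds[, "ends"] - bounds[, "starts"] + 1

# Row of the longest run (the first one, if several runs tie)
longest <- bounds[which.max(run_len), ]
longest
# starts   ends
#      1     21
```

which.max returns the first maximum, so ties resolve to the earliest run.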

Answer 1: (score: 2)

You can use rle to get the run-length encoding, which makes this kind of analysis of consecutive elements relatively simple:

r <- rle(out2)
cs <- cumsum(r$lengths)
na.omit(cbind(cs[r$values == "1"] - r$lengths[r$values == "1"] + 1, cs[r$values == "1"]))
#      [,1] [,2]
# [1,]    1   21
# [2,]   27   32

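rle does not handle NA values well (each NA is encoded as its own run of length 1). If the vector contains, for example, 1 and 2 instead of 1 and NA, you can drop the na.omit call: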

out2 <- rep(c(1,2,1,2),c(21,5,6,8))
r <- rle(out2)
cs <- cumsum(r$lengths)
cbind(cs[r$values == 1] - r$lengths[r$values == 1] + 1, cs[r$values == 1])
#      [,1] [,2]
# [1,]    1   21
# [2,]   27   32
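
To make the point about NA concrete, here is a small illustration on a toy vector (v is made up for this example). It shows that rle puts every NA in its own run, while running rle on is.na(v) collapses consecutive NAs into a single run.

```r
v <- c(1, 1, NA, NA, NA, 1)

# Each NA becomes a separate run of length 1
r <- rle(v)
r$lengths
# [1] 2 1 1 1 1
r$values
# [1]  1 NA NA NA  1

# Encoding the NA mask instead groups the three NAs together
m <- rle(is.na(v))
m$lengths
# [1] 2 3 1
m$values
# [1] FALSE  TRUE FALSE
```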

Answer 2: (score: 0)

Using split:

out2 <- rep(c(1,NA,1,NA),c(21,5,6,8))
spl <- split(seq_along(out2)[out2==1],cumsum(is.na(out2))[out2==1])
sapply(spl, function(x) c(x[1],tail(x,1)))
#      0  5
#[1,]  1 27
#[2,] 21 32

An rle alternative:

r <- rle(is.na(out2))
cbind(c(1,head(cumsum(r$lengths)[r$values],-1)+1),cumsum(r$lengths)[!r$values])
#     [,1] [,2]
#[1,]    1   21
#[2,]   27   32
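
The rle one-liner above appears to rely on the vector starting with a non-NA run and ending with an NA run, as out2 does. As a rough sketch under that reading, the hypothetical helper below (non_na_runs is a made-up name) computes the start and end of every run first and then keeps the non-NA ones, so it should not depend on where the NAs sit.

```r
# Hypothetical helper: start/end indices of every non-NA run,
# wherever the NA runs fall in the vector
non_na_runs <- function(v) {
  r <- rle(is.na(v))
  ends <- cumsum(r$lengths)        # last index of every run
  starts <- ends - r$lengths + 1   # first index of every run
  keep <- !r$values                # TRUE for runs of non-NA values
  cbind(start = starts[keep], end = ends[keep])
}

non_na_runs(rep(c(1, NA, 1, NA), c(21, 5, 6, 8)))
#      start end
# [1,]     1  21
# [2,]    27  32

# Leading NA and trailing non-NA run
non_na_runs(c(NA, 1, 1, NA, 1))
#      start end
# [1,]     2   3
# [2,]     5   5
```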