Question

I have a vector in R:

data <- c(1,4,6,7,8,9,20,30,31,32,33,34,35,60)

I want to find the start and end indices of every stretch of more than 3 consecutive values, i.e.:

start end
3  6  (stretch 6-9)
8 13 (stretch 30-35)

I don't know how to get there.

Answer 1

From @eddi's answer to my similar question...

runs = split(seq_along(data), cumsum(c(0, diff(data) > 1)))
lapply(runs[lengths(runs) > 1], range)

# $`2`
# [1] 3 6
# 
# $`4`
# [1]  8 13

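How it works:

seq_along(data) is the indices of data, from 1..length(data)
c(0, diff(data) > 1) has a 1 at every index where data "jumps"
cumsum(c(0, diff(data) > 1)) is an identifier for the consecutive runs between jumps

So runs is a partition of the indices of data into runs within which the values of data are consecutive.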
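To make the three steps concrete, here is a small sketch that shows the intermediate values for the vector from the question. The min_len threshold and the data-frame output are my own additions, on the assumption that "more than 3 consecutive values" means runs of at least 4 elements; they are not part of @eddi's answer.

```r
data <- c(1,4,6,7,8,9,20,30,31,32,33,34,35,60)

jumps <- c(0, diff(data) > 1)   # 1 wherever a new run starts (after the first element)
ids   <- cumsum(jumps)          # one identifier per run
# jumps: 0 1 1 0 0 0 1 1 0 0 0 0 0 1
# ids:   0 1 2 2 2 2 3 4 4 4 4 4 4 5

runs <- split(seq_along(data), ids)

min_len <- 4                    # assumption: "more than 3" means at least 4 values
long_runs <- runs[lengths(runs) >= min_len]

# Collapse each remaining run into its first and last index
result <- data.frame(
  start = unname(vapply(long_runs, min, integer(1))),
  end   = unname(vapply(long_runs, max, integer(1)))
)
result
#   start end
# 1     3   6
# 2     8  13
```

With lengths(runs) > 1, as in the answer, runs of two or three values would also be reported; the example data just happens not to contain any.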

Answer 2

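So, first take the diff of the vector and run-length encode it. Then the start points are the indices before the 2s, and the end points are the negative diffs of those... it's hard to explain, just step through the code and check it out. This doesn't find sequences of two, such as (3,4) in (1,3,4,7,9). I had to include the diff part to handle sequences whose values are two apart, such as (1,3,5,7). Those weren't being caught correctly. Anyhow, fun exercise. I hope someone can do better. This is a bit messy...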
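The answer's own code is not shown here, so the following is not a reconstruction of it. It is only a minimal sketch of the general idea of run-length encoding the differences, using base R's rle(); the threshold of 3 steps (4 values) is my assumption about what the question asks for.

```r
data <- c(1,4,6,7,8,9,20,30,31,32,33,34,35,60)

r <- rle(diff(data))                          # runs of equal successive differences
ends_in_diff   <- cumsum(r$lengths)           # last position of each run within diff(data)
starts_in_diff <- ends_in_diff - r$lengths + 1

# Keep runs of difference 1 that span at least 3 steps, i.e. at least 4 values
keep <- r$values == 1 & r$lengths >= 3

result <- data.frame(start = starts_in_diff[keep],
                     end   = ends_in_diff[keep] + 1)  # k differences cover k + 1 values
result
#   start end
# 1     3   6
# 2     8  13
```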


Answer 3

Here is a base R solution that relies heavily on ?diff:

data <- c(1,4,6,7,8,9,20,30,31,32,33,34,35,60)

diff1 <- diff(data[1:(length(data)-1)]) # lag 1 difference
diff2 <- diff(data, 2) # lag 2 difference

# indices of starting consecutive stretches -- these will overlap
start_index <- which(diff1==1 & diff2==2)
end_index <- start_index + 2

# notice that these overlap:
data.frame(start_index, end_index)

# To remove overlap:
# We can remove *subsequent* consecutive start indices
#           and *initial* consecutive end indices

start_index_new <- start_index[which(c(0, diff(start_index))!=1)]
end_index_new <- end_index[which(c(diff(end_index), 0) != 1)]
data.frame(start_index_new, end_index_new)

#   start_index_new end_index_new
# 1               3             6
# 2               8            13

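Cory's answer is great, but this one may be easier to follow: you are basically checking for positions i where the value at position i+1 is 1 greater and the value at position i+2 is 2 greater. You can build the ranges that way and then merge the overlapping ranges with another diff call. I think this is a bit simpler.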
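The same check can be written for an arbitrary window width. The helper below is hypothetical (the name window_is_consecutive and the parameter k are mine, not from the answer); it is a straightforward, unoptimized sketch that tests each window directly.

```r
# Hypothetical helper: TRUE at i when data[i], ..., data[i + k - 1] are consecutive
window_is_consecutive <- function(data, k = 3) {
  n <- length(data)
  starts <- seq_len(max(n - k + 1, 0))   # every position where a full window fits
  vapply(starts,
         function(i) all(diff(data[i:(i + k - 1)]) == 1),
         logical(1))
}

data <- c(1,4,6,7,8,9,20,30,31,32,33,34,35,60)

which(window_is_consecutive(data, 3))
# [1]  3  4  8  9 10 11
which(window_is_consecutive(data, 4))
# [1]  3  8  9 10
```

With k = 3 this gives the same overlapping start indices as start_index in the code above; k = 4 corresponds to the question's "more than 3" reading. The overlapping starts still need to be merged, for example with the diff-based step shown in the answer.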

You could also use a package such as zoo to help with the differences.

How to find ranges of consecutive numbers in a vector in R
