
Question

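How can I check whether an integer vector is "sequential", i.e. whether the difference between consecutive elements is exactly 1? I feel like I am missing something like "is.sequential".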

Here is my own function:

is.sequential <- function(x){
    all(diff(x) == rep(1,length(x)-1))
}

Answer 1

There is no need for rep, since 1 will be recycled. Edited to allow 5:2 to be TRUE:

is.sequential <- function(x){
  all(abs(diff(x)) == 1)
}
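
As a quick check of the recycling claim, the sketch below runs the rep() version and the shorter version side by side. The names is_seq_rep and is_seq_abs are made up for this illustration so that both variants can coexist; they are not part of the answer.

```r
# Hypothetical names, used only for this comparison.
is_seq_rep <- function(x) all(diff(x) == rep(1, length(x) - 1))  # asker's version
is_seq_abs <- function(x) all(abs(diff(x)) == 1)                 # recycled scalar, abs() added

# The scalar 1 is recycled to the length of diff(x), so rep() is not needed.
diff(2:5) == 1             # TRUE TRUE TRUE

is_seq_rep(2:5)            # TRUE
is_seq_abs(2:5)            # TRUE
is_seq_rep(5:2)            # FALSE: the differences are -1
is_seq_abs(5:2)            # TRUE: abs() accepts the descending run
is_seq_abs(c(1, 2, 1, 2))  # TRUE as well: abs() also accepts mixed directions
```

The last call is worth keeping in mind: taking abs() of the differences accepts any vector whose steps are +1 or -1, not only strictly ascending or descending runs.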

To allow sequences with other step sizes:

is.sequential <- function(x){
  all(diff(x) == diff(x)[1])
}
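
A short usage sketch for the constant-step version follows. Because it compares differences with ==, it is sensitive to floating-point representation: c(0.1, 0.2, 0.3) is rejected, since 0.3 - 0.2 is not exactly 0.1 in double precision. The name is_const_step is hypothetical and only used here.

```r
# Hypothetical name; same body as the constant-step version.
is_const_step <- function(x) all(diff(x) == diff(x)[1])

is_const_step(c(2, 4, 6))        # TRUE, step 2
is_const_step(c(10, 7, 4))       # TRUE, step -3
is_const_step(c(1, 2, 4))        # FALSE
is_const_step(c(0.1, 0.2, 0.3))  # FALSE: exact comparison of doubles
is_const_step(c(3, 3, 3))        # TRUE: a zero step also passes
```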

Answer 2

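So, @Iselzer has a fine answer. There are still some corner cases though: rounding errors and the starting value. Here is a version that tolerates rounding errors but checks that the first value is (almost) an integer.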

is.sequential <- function(x, eps=1e-8) {
  if (length(x) && isTRUE(abs(x[1] - floor(x[1])) < eps)) {
    all(abs(diff(x) - 1) < eps)
  } else {
    FALSE
  }
}

is.sequential(2:5) # TRUE

is.sequential(5:2) # FALSE

# Handle rounding errors?
x <- ((1:10)^0.5)^2
is.sequential(x) # TRUE

# Does the sequence need to start on an integer?
x <- c(1.5, 2.5, 3.5, 4.5)
is.sequential(x) # FALSE

# Is an empty vector a sequence?
is.sequential(numeric(0)) # FALSE

# What about NAs?
is.sequential(c(NA, 1)) # FALSE
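
One further edge case, offered as an observation rather than a correction: because the start check uses floor(), a first value that sits just below an integer (for example 3 - 1e-12) is rejected even though it is within eps of 3. Below is a minimal sketch of a variant that uses round() instead. The name is_sequential_round is hypothetical, and the original function is repeated so the block runs on its own.

```r
# Answer 2's version, repeated so the block is self-contained.
is.sequential <- function(x, eps = 1e-8) {
  if (length(x) && isTRUE(abs(x[1] - floor(x[1])) < eps)) {
    all(abs(diff(x) - 1) < eps)
  } else {
    FALSE
  }
}

# Hypothetical variant: measure the distance to the nearest integer.
is_sequential_round <- function(x, eps = 1e-8) {
  if (length(x) && isTRUE(abs(x[1] - round(x[1])) < eps)) {
    all(abs(diff(x) - 1) < eps)
  } else {
    FALSE
  }
}

x <- c(3 - 1e-12, 4, 5)
is.sequential(x)                       # FALSE: floor(x[1]) is 2
is_sequential_round(x)                 # TRUE
is_sequential_round(c(1.5, 2.5, 3.5))  # FALSE, same as the original
```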

Answer 3

This question is quite old by now, but in some circumstances it is genuinely useful to know whether a vector is a sequence.

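Both existing answers are quite good, but as Tommy mentioned, the accepted answer has some flaws. It seems natural to define a "sequence" as any "sequence of equally spaced numbers". This would include negative sequences, sequences with a starting value other than 0 or 1, and so on.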

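A versatile and safe implementation is given below. It accounts for:

negative values (-3 to 1) and negative directions (3 to 1)
sequences with non-integer steps (3.5, 3.6, 3.7, ...)
wrong input types such as infinite values, NA and NaN values, data frames, etc.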

is.sequence <- function(x, ...)
    UseMethod("is.sequence", x)
is.sequence.default <- function(x, ...){
    FALSE
}
is.sequence.numeric <- function(x, tol = sqrt(.Machine$double.eps), ...){
    if(anyNA(x) || any(is.infinite(x)) || length(x) <= 1 || diff(x[1:2]) == 0)
        return(FALSE)
    diff(range(diff(x))) <= tol
}
is.sequence.integer <- function(x, ...){
    is.sequence.numeric(x, ...)
}
n <- 1236
#Test:
is.sequence(seq(-3, 5, length.out = n))
# TRUE
is.sequence(seq(5, -3, length.out = n))
# TRUE
is.sequence(seq(3.5, 2.5 + n, length.out = n))
# TRUE
is.sequence(LETTERS[1:7])
# FALSE

Basically, the implementation checks whether the maximum and minimum of the differences are equal, up to the tolerance tol.
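
To make the check concrete, here are the intermediate values of diff(range(diff(x))) for two small vectors. This only illustrates the expression used in is.sequence.numeric; the vectors are arbitrary examples.

```r
x <- c(1, 3, 5, 7)
diff(x)                 # 2 2 2
range(diff(x))          # 2 2  (min and max of the steps)
diff(range(diff(x)))    # 0    (spread of the steps, within tol)

y <- c(1, 3, 6)
diff(y)                 # 2 3
diff(range(diff(y)))    # 1    (larger than tol, so not a sequence)
```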

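Although using S3 methods makes the implementation slightly more complicated, it simplifies checking for wrong input types and allows implementations for other classes. For example, it makes it easy to extend the method to, say, Date objects, where one would have to consider whether a sequence of only weekdays (or working days) also counts as a sequence.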
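
As a rough sketch of such an extension, a Date method could simply delegate to the numeric method on the underlying day counts. This is an assumption about how one might do it, not part of the original answer, and it deliberately sidesteps the weekday question: only equal spacing in calendar days counts. The generic and the numeric method are repeated so the block runs on its own.

```r
is.sequence <- function(x, ...) UseMethod("is.sequence", x)
is.sequence.default <- function(x, ...) FALSE
is.sequence.numeric <- function(x, tol = sqrt(.Machine$double.eps), ...) {
  if (anyNA(x) || any(is.infinite(x)) || length(x) <= 1 || diff(x[1:2]) == 0)
    return(FALSE)
  diff(range(diff(x))) <= tol
}

# Hypothetical Date method: Dates are stored as days since 1970-01-01,
# so the numeric check can be reused on as.numeric(x).
is.sequence.Date <- function(x, ...) {
  is.sequence.numeric(as.numeric(x), ...)
}

d <- as.Date("2024-01-01")
is.sequence(seq(d, by = "day", length.out = 5))    # TRUE
is.sequence(seq(d, by = "week", length.out = 5))   # TRUE, step of 7 days
is.sequence(d + c(0, 2, 3))                        # FALSE
is.sequence(seq(d, by = "month", length.out = 4))  # FALSE: months differ in length
```

Without the Date method, a Date vector would fall through to is.sequence.default and return FALSE, because dispatch uses the object's class attribute.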

Speed comparison

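This implementation is very safe, but S3 dispatch adds some overhead. For short vectors, what you gain is the versatility of the implementation, at a cost of roughly 15% in speed in the worst case. For larger vectors it is slightly faster, as the microbenchmark below shows.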

Note that the median times are the better basis for comparison, since the garbage collector may add noise to the benchmark timings.

ss <- seq(1, 1e6)
microbenchmark::microbenchmark(is.sequential(ss),
                               is.sequence(ss), #Integer calls numeric, adding a bit of overhead
                               is.sequence.numeric(ss))
# Unit: milliseconds
# expr                         min       lq     mean   median       uq      max neval
# is.sequential(ss)       19.47332 20.02534 21.58227 20.45541 21.23700 66.07200   100
# is.sequence(ss)         16.09662 16.65412 20.52511 17.05360 18.23958 61.23029   100
# is.sequence.numeric(ss) 16.00751 16.72907 19.08717 17.01962 17.66150 55.90792   100
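
The small-vector claim is not backed by numbers in the answer. A dependency-free way to check it on your own machine is sketched below. median_time is a hypothetical helper built on base R's system.time, and Answer 2's is.sequential is used as an assumption, since the answer does not say which version it benchmarked. Timings vary between machines, so no expected output is shown.

```r
# Answer 2's version (assumed to be the one benchmarked).
is.sequential <- function(x, eps = 1e-8) {
  if (length(x) && isTRUE(abs(x[1] - floor(x[1])) < eps)) {
    all(abs(diff(x) - 1) < eps)
  } else {
    FALSE
  }
}

is.sequence <- function(x, ...) UseMethod("is.sequence", x)
is.sequence.default <- function(x, ...) FALSE
is.sequence.numeric <- function(x, tol = sqrt(.Machine$double.eps), ...) {
  if (anyNA(x) || any(is.infinite(x)) || length(x) <= 1 || diff(x[1:2]) == 0)
    return(FALSE)
  diff(range(diff(x))) <= tol
}

# Hypothetical helper: median elapsed seconds over `times` runs,
# where each run calls f(x) `reps` times.
median_time <- function(f, x, times = 20, reps = 1000) {
  elapsed <- vapply(seq_len(times), function(i) {
    system.time(for (j in seq_len(reps)) f(x))[["elapsed"]]
  }, numeric(1))
  median(elapsed)
}

small <- 1:10
c(is.sequential = median_time(is.sequential, small),
  is.sequence   = median_time(is.sequence, small))
```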

Check whether a vector in R is sequential?
