我正在努力寻找解决以下问题的最有效方法。假设我们有一些看起来像这样的数据:
d1 <- seq(0, 3000, length.out = 1000)
d2 <- cos(seq(0, 6*pi, length.out = 1000))*rev(seq(0, 1, length.out = 1000))
dd <- as.data.frame(cbind(d1, d2))
我需要从d2
检测连续增加的数字的长度为20的第一个序列的第一个元素。在上图中,它将在d1 = 500
附近。我目前的方法是基于这个功能:
getFirstBeforeSequence <- function(x, y, len){
a1 <- cbind(lapply(split(y, cumsum(c(1, diff(y) < 0))), length))
a2 <- which(a1 > len)[1]-1
a3 <- sum(unlist(a1)[1:a2])+1
a3
}
此函数为我提供了所需的输出,元素位于位置164并且在d1 = 489.4895
时出现:
getFirstBeforeSequence(dd$d1, dd$d2, 20)
# 164
dd$d1[164]
# 489.4895
但是,我的印象是我的解决方案过于复杂,而且我很确定其他人会有更好的解决方案。任何帮助将非常感激。
答案 0 :(得分:2)
这是一个刺:
getFirstBefore<-function(x,len){
r<-rle(sign(diff(x)))
n<-which(r$lengths>=len & r$values==1)
if(length(n)==0)
return(-1)
1+sum(r$lengths[seq_len(n[1]-1)])
}
它比原版更有效,但仍有改进的余地:
microbenchmark(
getFirstBeforeSequence(dd$d1,dd$d2,20),
getFirstBefore(dd$d2,20))
# Unit: microseconds
# expr min lq median uq
# getFirstBeforeSequence(dd$d1, dd$d2, 20) 2433.174 2464.457 2486.186 2502.2005
# getFirstBefore(dd$d2, 20) 181.354 187.081 192.808 196.6805
# max neval
# 9932.534 100
# 239.700 100
答案 1 :(得分:1)
速度较慢,但提供了完全不同的方法:
firstOfSequence <- function(x, len){
v <- paste0(sign(diff(x))+1L, collapse="")
regexpr(paste0("([2])\\1{", len-1L, "}"), v)
}
> microbenchmark(
+ firstOfSequence(dd$d2, 20),
+ getFirstBefore(dd$d2, 20))
Unit: microseconds
expr min lq median uq max neval
firstOfSequence(dd$d1, 20) 978.181 981.3875 982.9910 998.7060 1111.597 100
getFirstBefore(dd$d1, 20) 191.147 196.5990 200.4475 205.0975 333.865 100
答案 2 :(得分:1)
y <- dd$d1
# indices of pits and peaks
pit <- which(diff(sign(diff(y))) == 2) + 1
peak <- which(diff(sign(diff(y))) == -2) + 1
# distance between peak and pit -> length of increase
len_incr <- peak - pit
# index of first pit from which a consecutive increase in 20 'steps' starts
idx <- pit[(len_incr > 20) == TRUE][1]
# corresponding x-value
dd$d2[idx]
# [1] 489.4895
# similar approach but let 'turnpoint' find pits and peaks.
library(pastecs)
tp <- turnpoints(y)
pit <- which(tp$pits == TRUE)
peak <- which(tp$peaks == TRUE)
len_incr <- peak - pit
idx <- pit[(len_incr > 20) == TRUE][1]
dd$d2[idx]
# [1] 489.4895