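Hi, I'm trying to build a function that finds events where a time series goes from below a lower threshold to above an upper threshold within a given number of time steps. I feel there should be a more elegant solution, and I'm not 100% sure I've caught all the cases.
Sample data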
df <- data.frame(DateTime = seq.POSIXt(from = as.POSIXct("2019-01-01"),to = as.POSIXct("2019-01-02"), by ="hour"),
Value = c(1,9,150,9,150,120,110,50,60,50,50,5,5,7,5,110,110,40,110,2,8,120,5,130,120))
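Here is the basic logic. Given minThresh and maxThresh (say 10 and 100 respectively) and a window size (4 here, in the slide call), a point is a peak (output = 1) if all of the following hold, as the comments in the function below spell them out:
1. the value at this point is above maxThresh;
2. at least one value in the window is below minThresh;
3. after the most recent value below minThresh, this is the first value in the window above maxThresh.
Here is what I have so far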
library(dplyr)
library(tsibble)
myfun <- function(dat, minThresh=10, maxThresh=100){
  thisVal <- dat[length(dat)]
  #Check this value > max threshold
  if(!thisVal > maxThresh) return(0)
  #Check there is a value less than min threshold
  belowThreshol <- which(dat<minThresh)
  if(length(belowThreshol)==0) return(0)
  #reset values after going above max and below min (so first peak doesn't stop 2nd peak counting)
  # eg for case (dat = c(1,500,2,500)) resets at 2
  aboveThreshol <- (dat>maxThresh)
  aboveThreshol[1:max(belowThreshol)] <- FALSE
  #check that thisValue is the first (after reset) > maxThresh
  if(min(which(aboveThreshol)) < length(dat)) return(0)
  return(1)
}
df %>% mutate(test = slide_dbl(Value, myfun, .size = 4))
If possible, I'd prefer a tidyverse solution.
Answer 0 (score: 0)
slide = 4
minThresh = 10
maxThresh = 100
My version uses rollapply from zoo.
myfun <- function(x) {
  min_ind <- which(x < minThresh)
  if ((x[length(x)] > maxThresh) && (length(min_ind) > 0)) {  #condition 1 & condition 2
    if (sum(x[max(min_ind):length(x)] > maxThresh) == 1) {    #condition 3
      return(1)
    }
  }
  return(0)
}
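As a quick sanity check, the window logic can be tried on a few individual windows taken from the sample data before applying it to the whole series. The sketch below restates the same three conditions in a helper with a hypothetical name, is_peak_window, and passes the thresholds as arguments (an assumption made here so the snippet runs on its own; the answer's myfun reads them from global variables).

```r
# Hypothetical helper: same three conditions as above, thresholds as arguments.
is_peak_window <- function(x, min_thresh = 10, max_thresh = 100) {
  low_idx <- which(x < min_thresh)
  last_val <- x[length(x)]
  # Conditions 1 and 2: last value above max, and some value below min
  if (last_val <= max_thresh || length(low_idx) == 0) return(0)
  # Condition 3: only one value above max since the most recent low value
  after_low <- x[max(low_idx):length(x)]
  as.numeric(sum(after_low > max_thresh) == 1)
}

is_peak_window(c(5, 7, 5, 110))       # 1: low values, then the first high value
is_peak_window(c(9, 150, 9, 150))     # 1: the second 9 resets the earlier peak
is_peak_window(c(150, 9, 150, 120))   # 0: 150 was already above max after the 9
is_peak_window(c(110, 110, 40, 110))  # 0: nothing below min in the window
is_peak_window(c(1, 9, 150, 9))       # 0: last value is not above max
```

The comparisons are strict, so a value exactly equal to a threshold counts as neither below minThresh nor above maxThresh.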
Now apply this function over a sliding window of size slide:
library(zoo)
library(dplyr)
df %>%
mutate(test = lag(rollapply(Value,slide,myfun,fill = NA, align = "left"),slide-1))
# DateTime Value test
#1 2019-01-01 00:00:00 1 NA
#2 2019-01-01 01:00:00 9 NA
#3 2019-01-01 02:00:00 150 NA
#4 2019-01-01 03:00:00 9 0
#5 2019-01-01 04:00:00 150 1
#6 2019-01-01 05:00:00 120 0
#7 2019-01-01 06:00:00 110 0
#8 2019-01-01 07:00:00 50 0
#9 2019-01-01 08:00:00 60 0
#10 2019-01-01 09:00:00 50 0
#11 2019-01-01 10:00:00 50 0
#12 2019-01-01 11:00:00 5 0
#13 2019-01-01 12:00:00 5 0
#14 2019-01-01 13:00:00 7 0
#15 2019-01-01 14:00:00 5 0
#16 2019-01-01 15:00:00 110 1
#17 2019-01-01 16:00:00 110 0
#18 2019-01-01 17:00:00 40 0
#19 2019-01-01 18:00:00 110 0
#20 2019-01-01 19:00:00 2 0
#21 2019-01-01 20:00:00 8 0
#22 2019-01-01 21:00:00 120 1
#23 2019-01-01 22:00:00 5 0
#24 2019-01-01 23:00:00 130 1
#25 2019-01-02 00:00:00 120 0
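We use lag here because, for the input window c(5, 7, 5, 110), the function returns 1, but with align = "left" that result is assigned to the first 5 in the window. We want the 1 to sit on the 110, so the output has to be shifted by slide - 1 positions (the window size minus 1).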
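The shift can also be avoided by right-aligning the window, so that each result is stored at the last element of its window. The sketch below is one possible variant, not part of the original answer: it uses rollapply with align = "right" on the same sample values and checks that it matches the left-aligned result shifted by the window size minus 1. The names peak_fun, window_size, min_thresh and max_thresh are chosen here for illustration.

```r
library(zoo)

min_thresh <- 10
max_thresh <- 100
window_size <- 4
value <- c(1, 9, 150, 9, 150, 120, 110, 50, 60, 50, 50, 5, 5, 7, 5,
           110, 110, 40, 110, 2, 8, 120, 5, 130, 120)

# Same three conditions as in the answer, returning 1 for a peak and 0 otherwise
peak_fun <- function(x) {
  low_idx <- which(x < min_thresh)
  if (x[length(x)] > max_thresh && length(low_idx) > 0) {
    return(as.numeric(sum(x[max(low_idx):length(x)] > max_thresh) == 1))
  }
  0
}

# Left-aligned result, then shifted manually by window_size - 1 positions
left <- rollapply(value, window_size, peak_fun, fill = NA, align = "left")
shifted <- c(rep(NA, window_size - 1), head(left, -(window_size - 1)))

# Right-aligned result: no shift needed
right <- rollapply(value, window_size, peak_fun, fill = NA, align = "right")

all.equal(shifted, right)  # TRUE
which(right == 1)          # 5 16 22 24, the rows flagged in the output above
```

Inside the mutate call this would read mutate(test = rollapply(Value, slide, myfun, fill = NA, align = "right")), which also avoids depending on which lag is in scope, since dplyr::lag masks stats::lag.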