How to find change points in a data set

Time: 2017-09-04 01:03:06

Tags: r

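I need to find the points where an increasing or decreasing trend starts and ends. In this data, a difference of about 10 between consecutive values is considered noise (i.e., neither an increase nor a decrease). For the sample data below, the first increasing trend starts at 317 and ends at 432, and another starts at 441 and ends at 983. These points should be recorded in separate vectors.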

sample <- c(312,317,380,432,438,441,509,641,779,919,
            983,980,978,983,986,885,767,758,755)

Below is an image of the main change points. Can anyone suggest an R approach?

[Image: plot of the sample data with the main change points marked]

1 answer:

Answer 0 (score: 1)

Here is how to build a vector of change points:

vec <- c(100312,100317,100380,100432,100438,100441,100509,100641,100779,100919,
         100983,100980,100978,100983,100986,100885,100767,100758,100755,100755)

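#this finds the indices where a trend starts or stops, i.e. where
#abs(diff(vec)) switches between noise (<= 10) and a real change (> 10)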
idx <- c(cumsum(rle(abs(diff(vec))>10)$lengths)+1)

#create new vector of change points:
newVec <- vec[idx]
print(newVec)
[1] 100317 100432 100441 100983 100986 100767 100755

#(opt.) to ignore the first and last observation as a change point:
idx <- idx[which(idx!=1 & idx!=length(vec))]

#update new vector if you want the "opt." restrictions applied:
newVec <- vec[idx]
print(newVec)
[1] 100317 100432 100441 100983 100986 100767

#you can split newVec by start/stop change points like this:
start_changepoints <- newVec[c(TRUE,FALSE)]
print(start_changepoints)
[1] 100317 100441 100986

end_changepoints <- newVec[c(FALSE,TRUE)]
print(end_changepoints)
[1] 100432 100983 100767

#to count the number of events, just measure the length of start_changepoints:
length(start_changepoints)
[1] 3
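
To see why the one-liner for `idx` works, it can help to unpack it into separate steps. The following is a sketch of the same computation on the same `vec`; the intermediate names (`steps`, `is_trend`, `runs`, `run_ends`) are introduced here for illustration only and do not appear in the answer.

```r
vec <- c(100312,100317,100380,100432,100438,100441,100509,100641,100779,100919,
         100983,100980,100978,100983,100986,100885,100767,100758,100755,100755)

# successive differences: steps[i] is vec[i + 1] - vec[i]
steps <- diff(vec)

# TRUE where a step is larger than the noise threshold of 10
is_trend <- abs(steps) > 10

# collapse consecutive equal values into runs (noise, trend, noise, ...)
runs <- rle(is_trend)

# position (in steps) of the last difference of each run
run_ends <- cumsum(runs$lengths)

# the observation right after a run's last difference is where the state flips
idx <- run_ends + 1

print(runs$lengths)  # 1 2 2 5 4 2 3
print(idx)           # 2 4 6 11 15 17 20
```

Because the first run here is noise (`FALSE`), the odd entries of `idx` are trend starts and the even entries are trend ends, which is what the `c(TRUE,FALSE)` / `c(FALSE,TRUE)` split relies on. If a series began inside a trend, that alternation would be shifted by one.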

If you want to plot it, you can use:

library(ggplot2)

#preps data for plot
df <- data.frame(vec,trends=NA,cols=NA)
df$trends[idx] <- idx
df$cols[idx] <- c("green","red")

#plot
ggplot(df, aes(x=1:NROW(df),y=vec)) +
  geom_line() +
  geom_point() +
  geom_vline(aes(xintercept=trends, col=cols), 
             lty=2, lwd=1) +
  scale_color_manual(values=na.omit(df$cols),
                     breaks=na.omit(unique(df$cols)),
                     labels=c("Start","End")) +
  xlab("Index") +
  ylab("Value") +
  guides(col=guide_legend("Trend State"))

Output:

[Image: ggplot2 line plot of vec with dashed vertical lines marking trend starts and ends]
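
The same run-length idea can be wrapped in a small helper that also reports the direction of each trend and does not assume the series starts with noise. This is only a sketch and not part of the original answer: `find_trends` and its `noise` argument are hypothetical names, and it is applied to the sample data from the question (stored as `x` here to avoid masking base R's `sample()`).

```r
# Hypothetical helper: returns one row per trend with its start value,
# end value and direction. Steps with abs(diff) <= noise count as noise.
find_trends <- function(x, noise = 10) {
  d <- diff(x)
  # -1 = decreasing step, 0 = noise, 1 = increasing step
  state <- ifelse(abs(d) > noise, sign(d), 0)
  runs <- rle(state)
  run_end <- cumsum(runs$lengths)          # last difference of each run
  run_start <- run_end - runs$lengths + 1  # first difference of each run
  keep <- runs$values != 0                 # drop the noise runs
  data.frame(
    start = x[run_start[keep]],            # value where the trend begins
    end = x[run_end[keep] + 1],            # value where the trend stops
    direction = ifelse(runs$values[keep] > 0, "increase", "decrease")
  )
}

x <- c(312,317,380,432,438,441,509,641,779,919,
       983,980,978,983,986,885,767,758,755)

res <- find_trends(x)
print(res)
# trends found: 317 -> 432 (increase), 441 -> 983 (increase),
#               986 -> 767 (decrease)
```

The first two rows agree with the trends described in the question (317 to 432, and 441 to 983). Splitting on `sign(diff(x))` rather than on `abs(diff(x))` alone means an increase that is immediately followed by a decrease would be reported as two trends instead of one.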