Hi all,
I'm looking for a reliable, unsupervised way to detect change points in relatively short vectors. Consider the following two examples:
v1 = c(0.299584,0.314446,0.357783,0.388896,0.410417,0.427182,0.450383,0.466671,0.474884,0.474749,0.493566,0.500374,0.522482,0.529851,0.538387,0.577901,0.610939,0.639383,0.662433,0.692656,0.720543,0.738255,0.748055,0.7591,0.770595,0.781811,0.794479,0.794588,0.789448,0.77667,0.765406,0.75152,0.740408,0.726898,0.720766,0.709445,0.69896,0.687508,0.673382,0.65795,0.639214,0.620445,0.590047,0.561773,0.526807,0.486848,0.439681,0.387545,0.313369,0.282872,0.279908,0.271836,0.269088,0.262727,0.259782)
v2 = c(0.081309,0.206263,0.429069,0.511859,0.565194,0.578792,0.56919,0.51985,0.432563,0.193907,0.0771,0.086603,0.18303,0.177608,0.169706,0.260917,0.292062,0.2979,0.263249,0.270576,0.250422,0.25219,0.182878,0.080623,0.079443,0.088944,0.087623,0.126403,0.155563,0.273942,0.312054,0.370195,0.357087,0.336452,0.300574,0.243105,0.243105,0.25593,0.227401,0.218047,0.15857,0.157727,0.139801,0.125742,0.129142,0.142166,0.142166,0.136748,0.107755,0.064377,0.072801,0.060093,0.103441,0.111704,0.124544)
If you look at
plot(v1,type='l')
and
plot(v2,type='l')
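you can see that for v1 I want to detect the change around index 28, and for v2 I want to detect changes at indices 8, 11, 18, 25, 32 and 51. So far I have tried the Bayesian change point algorithm, which does a good job of identifying likely locations of inflection points (regions of low posterior probability), but it still forces me to rely on visual inspection to make the final call: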
install.packages('bcp')
library(bcp)
test = bcp(v1,w0=0.2,p0=0.01)
plot(v1,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)
test = bcp(v2,w0=0.2,p0=0.01)
plot(v2,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)
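One way to remove the final visual step is to post-process the bcp output with a fixed rule. The sketch below assumes that `test$posterior.prob` holds the posterior probability of a change at each position (possibly with an `NA` at the end) and keeps positions whose probability is at least `threshold` and is the largest within `gap` positions on either side. The helper name `pick_change_points` and its default `threshold` and `gap` values are hypothetical choices that would need tuning; they are not part of the bcp package.

```r
# Hypothetical helper (not part of bcp): turn a vector of per-position change
# probabilities into a set of change-point indices without visual inspection.
pick_change_points <- function(prob, threshold = 0.5, gap = 3) {
  prob[is.na(prob)] <- 0  # treat missing values as "no evidence of a change"
  n <- length(prob)
  keep <- vapply(seq_len(n), function(i) {
    if (prob[i] < threshold) return(FALSE)
    lo <- max(1, i - gap)
    hi <- min(n, i + gap)
    left  <- prob[seq_len(i - lo) + lo - 1]  # prob[lo:(i - 1)], empty at the start
    right <- prob[seq_len(hi - i) + i]       # prob[(i + 1):hi], empty at the end
    # Strictly above everything on the left, at least as high as the right:
    # within a run of tied peaks only the leftmost index is kept.
    prob[i] > max(-Inf, left) && prob[i] >= max(-Inf, right)
  }, logical(1))
  which(keep)
}
```

For example, `pick_change_points(test$posterior.prob, threshold = 0.5, gap = 3)` could be called after either `bcp()` call above. Requiring a local peak, rather than just thresholding, keeps one index per high-probability region instead of returning every position in it.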
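Is there a way to select multiple change points in this kind of data automatically, without supervision? Maybe I'm just searching in vain for a substitute for human intuition :P I also looked at the changepoint package, but it doesn't seem to be designed for this kind of data.
Thanks,
Aaron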
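Because both series look like sequences of roughly straight stretches, one unsupervised option is to fit piecewise-linear segments and let a penalty decide how many there are. The sketch below is a swapped-in technique, not something from the question or the answer: plain optimal partitioning by dynamic programming, minimising the total residual sum of squares of a least-squares line per segment plus a fixed `penalty` per segment. This is the same kind of penalised objective that the changepoint package's PELT option minimises for mean/variance changes, but with a straight-line cost. The function names, the `min_len` default and the absence of an automatic penalty rule are all assumptions of this sketch.

```r
# Sketch: optimal partitioning of y into straight-line segments.
# Minimises sum over segments of (RSS of a least-squares line + penalty).

seg_rss <- function(y, from, to) {
  # Residual sum of squares of an ordinary least-squares line on y[from:to]
  x  <- from:to
  xc <- x - mean(x)
  yc <- y[from:to] - mean(y[from:to])
  max(0, sum(yc^2) - sum(xc * yc)^2 / sum(xc^2))  # clamp tiny negative rounding
}

linear_changepoints <- function(y, penalty, min_len = 3) {
  n <- length(y)
  stopifnot(min_len >= 2, n >= min_len)
  best <- c(0, rep(Inf, n))  # best[t + 1]: optimal penalised cost of y[1:t]
  prev <- integer(n)         # prev[t]: last index of the segment before the one ending at t
  for (t in min_len:n) {
    for (s in 0:(t - min_len)) {         # candidate final segment is y[(s + 1):t]
      if (!is.finite(best[s + 1])) next  # y[1:s] has no valid segmentation
      cost <- best[s + 1] + seg_rss(y, s + 1, t) + penalty
      if (cost < best[t + 1]) {
        best[t + 1] <- cost
        prev[t] <- s
      }
    }
  }
  # Walk back from the end to recover the segment boundaries
  cps <- integer(0)
  t <- n
  while (prev[t] > 0) {
    cps <- c(prev[t], cps)
    t <- prev[t]
  }
  cps  # last index of every segment except the final one
}
```

A call such as `linear_changepoints(v2, penalty = 0.01)` returns the estimated boundaries; the penalty is in the units of the squared data, so it has to be tuned per series (larger values give fewer change points). With about 55 points the O(n^3) cost of this naive version is negligible.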
Answer:
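So, here is a simple solution. You can adjust the parameters to return different inflection points (more or fewer, more or less sensitive), or regions of the data. For each index i it compares v2[i] with its k neighbours on each side: a neighbour below v2[i] scores +1 and any other neighbour scores -1. The absolute total reaches its maximum of 2k only when every neighbour lies below v2[i] (a local maximum) or none does (a local minimum), so lowering cutoff below 2k tolerates a few out-of-order neighbours and returns more points.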
plot(v2, type = "l", col = "darkblue", lwd = 2)
# v2 <- smooth(v2, kind = "3")  # optional: Tukey running-median smoothing
lines(v2, lwd = 1, col = "red")

k <- 5        # number of neighbours compared on each side of a point
cutoff <- 10  # maximum possible score is 2 * k; lower it to get more points

# Score each candidate index i: a neighbour below v2[i] counts +1, any other
# neighbour -1; neighbour indices are clamped to stay inside 1..length(v2)
idx <- k:(length(v2) - k)
scores <- sapply(idx, function(i) {
  left  <- (v2[pmax(i - 1:k, 1)] < v2[i]) * 2 - 1
  right <- (v2[pmin(i + 1:k, length(v2))] < v2[i]) * 2 - 1
  abs(sum(left) + sum(right))
})

inflections <- idx[scores >= cutoff]
plot(v2, type = "l")
abline(v = inflections, col = "red", lwd = 3)
print(inflections) # 6 11 18 25 32 (misses the change near 51: the scan stops at length(v2) - k = 50; with cutoff = 8, index 50 is flagged)
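The scan above only covers indices `k` to `length(v2) - k`, which is why the minimum near the end of v2 can never be reported. A possible variant, sketched here under my own assumptions (the name `local_extrema` and the `min_side` argument are hypothetical, not from the answer), truncates the window at the edges instead and flags a point when all available neighbours on both sides lie strictly below it or strictly above it. Unlike the scoring above, equal neighbours count against both a maximum and a minimum.

```r
# Hypothetical variant of the answer's test: a point is flagged when all
# neighbours within k positions (windows truncated at the ends) are strictly
# below it (local maximum) or strictly above it (local minimum).
local_extrema <- function(v, k = 5, min_side = 1) {
  stopifnot(min_side >= 1)  # each side needs at least one neighbour
  n <- length(v)
  is_ext <- vapply(seq_len(n), function(i) {
    lo <- max(1, i - k)
    hi <- min(n, i + k)
    left  <- if (i > lo) v[lo:(i - 1)] else numeric(0)
    right <- if (i < hi) v[(i + 1):hi] else numeric(0)
    if (length(left) < min_side || length(right) < min_side) return(FALSE)
    (all(left < v[i]) && all(right < v[i])) ||
      (all(left > v[i]) && all(right > v[i]))
  }, logical(1))
  which(is_ext)
}
```

Calling `local_extrema(v2, k = 5)` also considers the last few indices; raising `min_side` stops points with very few neighbours on one side from being flagged.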