Given the following vector:
vec01 <- c(1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 1, 2, 1,
2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 2, 2,
1, 2, 2, 1, 1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 1, 1, 1,
2, 1, 2, 3, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3,
1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1)
Question 1: How can I remove the anomalies highlighted below?
vec01 <- c(1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 1, 2, 1,
2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, *2*, *2*,
1, 2, *2*, 1, 1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 1, 1, 1,
2, 1, 2, 3, 4, *2*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3,
1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1)
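Definition of an anomaly: every element must be part of a 1, 2, ... series; the anomalies are the elements marked with asterisks above.
Question 2: After removing the anomalies, how can I identify the sequence groups, where each sequence belongs to one group? That is, the output should look like this (only the first 30 rows are shown):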
result <- structure(list(vec = c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 1L, 2L, 1L, 2L),
group = c(1L, 1L, 2L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L, 12L, 13L, 14L, 15L, 15L, 16L, 16L)),
.Names = c("vec", "group"),
row.names = c(NA, 30L), class = "data.frame")
Answer 0 (score: 7)
This addresses question 2 (and question 1 as well, if you drop all the rows flagged TRUE at the end).
library(data.table) # data.table is used for its concise syntax (a matter of personal taste)
DT = data.table(vec01)
DT[,counter:=ifelse(vec01==1,1,0)] # mark every element that starts a sequence (value 1)
DT[,counter:=cumsum(counter)] # running sum gives each sequence its own ID, usable in `by`
DT[,flag:=is.unsorted(vec01),by=counter] # TRUE for every row of a sequence that is not sorted
Edit: replace is.unsorted(vec01) with f(vec01), where
f = function(x){!(x==Reduce(max,x,accumulate=T))}
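To see how the flags behave, here is a minimal base R sketch (no data.table), run on a short slice of vec01. The names x, counter, flag_f, pos, and flag_pos are illustrative only. Note that f compares each element with the running maximum of its sequence, so a repeated value such as the second 2 in 1, 2, 2 is not flagged. Comparing each element with its position inside the sequence is one possible variation (an assumption, not part of the answer) that catches that case as well.

```r
# A slice of vec01 containing the anomalies 2, 2 and a repeated 2
x <- c(1, 2, 3, 2, 2, 1, 2, 2, 1, 1, 2, 3, 4)

# Sequence ID: increases by one at every element equal to 1
counter <- cumsum(x == 1)

# f from the edit above, written with cummax(): TRUE where an element
# is below the running maximum of its sequence
f <- function(v) v != cummax(v)
flag_f <- as.logical(ave(x, counter, FUN = f))

# Variation (an assumption, not the answer's code): TRUE where an element
# differs from its 1-based position within its sequence
pos <- ave(x, counter, FUN = seq_along)
flag_pos <- x != pos

x[!flag_f]    # the repeated 2 in 1, 2, 2 survives
x[!flag_pos]  # only 1, 2, 3, ... runs remain
```

Which rule is appropriate depends on how strictly "part of a 1, 2, ... series" is meant to be read.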
Answer 1 (score: 2)
Cleaning the sequence (question 1):
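m <- vec01[1]==1  # keep the first element only if it starts a sequence
# keep a 1, or a value that extends a kept predecessor by exactly one
for (i in seq(2,length(vec01)))
  m[i] <- vec01[i]==1 || (vec01[i]==vec01[i-1]+1 && m[i-1])
vec01 <- vec01[m]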
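The rule in the loop above can be traced on a short input. The following is a minimal sketch, assuming a helper named keep_mask (a hypothetical name, not from the answer) that wraps the same rule in a function. It iterates with seq_len() so that inputs of length 0 or 1 are handled, which seq(2, length(v)) would not do because it counts down from 2.

```r
# Hypothetical helper wrapping the loop above: TRUE marks elements to keep
keep_mask <- function(v) {
  if (length(v) == 0) return(logical(0))
  m <- logical(length(v))
  m[1] <- v[1] == 1
  for (i in seq_len(length(v))[-1]) {
    # keep a 1 (start of a sequence), or a value that extends
    # a kept predecessor by exactly one
    m[i] <- v[i] == 1 || (v[i] == v[i - 1] + 1 && m[i - 1])
  }
  m
}

x <- c(1, 2, 3, 2, 2, 1, 2, 2, 1)
keep_mask(x)     # TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE
x[keep_mask(x)]  # 1 2 3 1 2 1
```

Because each decision depends on whether the previous element was kept, one anomaly also invalidates any values that try to continue counting from it.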
Now build the structure you want (thanks to @statquant for the cumsum() idea):
data.frame(vec=vec01, group=cumsum(c(1,diff(vec01)!=1)))
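The grouping step is easier to follow with the intermediate values spelled out. This is a small sketch on an already cleaned vector; the names v, starts, and group are illustrative only.

```r
# A cleaned vector: only 1, 2, 3, ... runs remain
v <- c(1, 2, 1, 2, 3, 1, 1, 2)

diff(v)                          # 1 -1 1 1 -2 0 1
starts <- c(TRUE, diff(v) != 1)  # TRUE where a new sequence begins
group <- cumsum(starts)          # 1 1 2 2 2 3 4 4

data.frame(vec = v, group = group)
```

On a cleaned vector every sequence starts with 1, so cumsum(v == 1) yields the same grouping.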
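Answer 2 (score: 2)
Interesting problem; here is another solution. It looks at where the values increase and builds the corresponding ideal (anomaly-free) sequence vec02. Then it is just a matter of comparing vec01 and vec02.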
is.incr <- c(FALSE, diff(vec01) == 1)
vec02 <- rep(1, length(vec01)) + sequence(rle(is.incr)$lengths) * is.incr
vec <- vec01[vec01 == vec02]
result <- data.frame(vec = vec, group = cumsum(vec == 1))
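The construction of the ideal sequence can be followed step by step on a short input. This is a sketch of the same idea with the intermediate vectors named; x, run_pos, ideal, and kept are illustrative names, not from the answer.

```r
x <- c(1, 2, 3, 2, 2, 1, 2, 2, 1)

# TRUE where an element is exactly one more than its predecessor
is.incr <- c(FALSE, diff(x) == 1)

# Position inside each run of equal is.incr values
run_pos <- sequence(rle(is.incr)$lengths)

# Ideal sequence: 1 where the value did not increase,
# otherwise 1 + position inside the increasing run
ideal <- 1 + run_pos * is.incr   # 1 2 3 1 1 1 2 1 1

# Keep only the elements that agree with the ideal sequence
kept <- x[x == ideal]            # 1 2 3 1 2 1
data.frame(vec = kept, group = cumsum(kept == 1))
```

Multiplying by is.incr zeroes out the run positions wherever the value did not increase, so those positions fall back to 1 in the ideal sequence.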