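I am trying to extract the data between occurrences of a pattern: whenever the pattern occurs, subset all the data up to the point where the pattern occurs again. I then need to give each subset a number so that it can be identified.
I am using R.
Example data: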
DF<-(structure(list(date.time = structure(c(1374910680, 1374911040,
1374911160, 1374911580, 1374913380, 1374913500, 1374913620, 1374913740,
1374914160, 1374914400, 1374914520, 1374914940, 1374915000, 1374915120,
1374915240), class = c("POSIXct", "POSIXt"), tzone = ""), aerial = structure(c(2L,
2L, 8L, 8L, 2L, 2L, 2L, 8L, 8L, 8L, 2L, 2L, 8L, 2L, 2L), .Label = c("0",
"1", "10", "11", "2", "3", "4", "5", "6", "7", "8", "9", "m"), class = "factor")), .Names = c("date.time",
"aerial"), row.names = c(1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L), class = "data.frame") )
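Example pattern: DF$aerial repeating 1,1 (two consecutive 1s).
From the above I would like to subset/extract the data between occurrences of the pattern, and then assign an identifying number that says which occurrence of the pattern the data follows (i.e. this is the first occurrence, this is the second occurrence, and so on).
Desired output: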
             date.time aerial occurrence
3  2013-07-27 08:46:00      5          1
4  2013-07-27 08:53:00      5          1
8  2013-07-27 09:29:00      5          2
9  2013-07-27 09:36:00      5          2
10 2013-07-27 09:40:00      5          2
13 2013-07-27 09:50:00      5          3
I can identify the pattern:
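library(zoo)
pat <- c(1, 1)
# aerial is a factor, so convert it to its numeric values before comparing
a <- as.numeric(as.character(DF$aerial))
# TRUE wherever a window of length(pat) consecutive values equals pat
x <- rollapply(a, length(pat), FUN = function(w) all(w == pat))
DF[which(x), ]  # rows at which the pattern starts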
Obviously I could create an is.between function:
is.between <- function(x, a, b) {
x > a & x < b
}
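However, after that I am stuck.
Note: the data between patterns may not always be aerial 5; that value is only used to simplify the example.
Any help and pointers are much appreciated!
Answer 0 (score: 2)
It seems that it would be sufficient to exclude every run of 1s that is at least 2 long, so try this: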
library(zoo)
a <- as.numeric(as.character(DF$aerial))
r <- rle(a)
cond <- with(r, values != 1 | lengths < 2)
ok <- rep(cond, r$lengths)
occur <- rep(cumsum(cond), r$lengths)
cbind(DF, occur)[ok, ]
which gives:
             date.time aerial occur
3  2013-07-27 03:46:00      5     1
4  2013-07-27 03:53:00      5     1
8  2013-07-27 04:29:00      5     2
9  2013-07-27 04:36:00      5     2
10 2013-07-27 04:40:00      5     2
13 2013-07-27 04:50:00      5     3
Revised: added the occur column.
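The question notes that the values between patterns will not always be a single repeated value. In that case, as far as I can tell, the code above would start a new number for every run inside a gap, because cumsum(cond) increments on each kept run. A minimal sketch of one possible variation follows: it counts the pattern runs instead, so every row in a gap shares one number. The helper name label_between and its arguments value and min_len are hypothetical, introduced only for illustration; it uses base R's rle and does not need zoo.

```r
# Hypothetical helper (not part of the original answer): flag the rows that
# lie outside runs of `value` at least `min_len` long, and number each row
# by how many such pattern runs have been seen so far.
label_between <- function(a, value = 1, min_len = 2) {
  r <- rle(a)
  # TRUE for runs that count as an occurrence of the pattern
  is_pat <- r$values == value & r$lengths >= min_len
  data.frame(
    keep  = rep(!is_pat, r$lengths),        # rows outside any pattern run
    occur = rep(cumsum(is_pat), r$lengths)  # pattern runs seen so far
  )
}

# The aerial values from the example data, as a numeric vector
a <- c(1, 1, 5, 5, 1, 1, 1, 5, 5, 5, 1, 1, 5, 1, 1)
lab <- label_between(a)
which(lab$keep)       # 3 4 8 9 10 13
lab$occur[lab$keep]   # 1 1 2 2 2 3

# A gap that mixes several values (including a lone 1) keeps one number
b <- c(1, 1, 5, 3, 5, 1, 1, 7, 1, 7, 1, 1)
lab_b <- label_between(b)
lab_b$occur[lab_b$keep]   # 1 1 1 2 2 2
```

With the data frame from the question this could be applied as lab <- label_between(as.numeric(as.character(DF$aerial))) followed by cbind(DF, occur = lab$occur)[lab$keep, ]. Under this numbering, rows that come before the first pattern run would get 0, and rows after the last pattern run are numbered as well, so they may need to be dropped depending on what counts as "between".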