Extracting data between occurrences of a pattern

Time: 2014-02-14 11:24:57

Tags: r pattern-matching between

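I am trying to extract the data that lies between occurrences of a pattern. That is, once the pattern occurs, I want to subset all the data up to the point where the pattern occurs again. I then need to give each subset a number so that it can be identified.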

Using R.

Example data:

DF <- structure(list(
  date.time = structure(c(1374910680, 1374911040, 1374911160, 1374911580,
                          1374913380, 1374913500, 1374913620, 1374913740,
                          1374914160, 1374914400, 1374914520, 1374914940,
                          1374915000, 1374915120, 1374915240),
                        class = c("POSIXct", "POSIXt"), tzone = ""),
  aerial = structure(c(2L, 2L, 8L, 8L, 2L, 2L, 2L, 8L, 8L, 8L, 2L, 2L, 8L, 2L, 2L),
                     .Label = c("0", "1", "10", "11", "2", "3", "4", "5",
                                "6", "7", "8", "9", "m"),
                     class = "factor")),
  .Names = c("date.time", "aerial"),
  row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L),
  class = "data.frame")

Example pattern: DF$aerial repeats 1,1

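From the above I want to subset/extract the data between occurrences of the pattern, and then give each subset an identifying number that says which occurrence of the pattern it follows (i.e. this is the first occurrence, this is the second occurrence, and so on).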

Desired output:

         date.time       aerial    occurrence
3  2013-07-27 08:46:00      5          1
4  2013-07-27 08:53:00      5          1
8  2013-07-27 09:29:00      5          2
9  2013-07-27 09:36:00      5          2
10 2013-07-27 09:40:00      5          2
13 2013-07-27 09:50:00      5          3

I can identify the pattern:

library(zoo)

pat <- c(1,1)

x <- rollapply(DF$aerial, length(pat), FUN=function(x) all(x == pat))

DF[which(x),]
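
For reference, here is a minimal base-R sketch of the same window matching that the rollapply call above performs. It assumes the values of DF$aerial have been decoded into a plain numeric vector (called `aerial` here, a name introduced only for this illustration) and uses embed() to build the windows. It shows that a run of three 1s produces two overlapping matches, which is part of what makes the "between" step awkward.

```r
# Decoded values of DF$aerial from the example data (assumed: factor labels as numbers)
aerial <- c(1, 1, 5, 5, 1, 1, 1, 5, 5, 5, 1, 1, 5, 1, 1)
pat <- c(1, 1)

# embed() returns one row per window, with the most recent value first,
# so reverse the columns to put each window back in its original order
windows <- embed(aerial, length(pat))[, length(pat):1, drop = FALSE]

# A window matches when every element equals the corresponding pattern element
hits <- apply(windows, 1, function(w) all(w == pat))

# Row indices where a match starts; rows 5 and 6 are overlapping matches
starts <- which(hits)
starts  # 1 5 6 11 14
```

Because the matches overlap, treating each match start as a separate occurrence would count the run in rows 5 to 7 twice.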

Obviously I can create an is.between function:

is.between <- function(x, a, b) {
  x > a & x < b
}

However, after this I am stuck.

Note: the data between patterns may not always be aerial 5; that is only used here to simplify the example.

Help and pointers are much appreciated!

1 Answer:

Answer 0: (score: 2)

It seems that excluding every run of 1s that is at least 2 long would do it, so try this:

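# Only base R is needed here: rle() is part of base
a <- as.numeric(as.character(DF$aerial))  # factor labels -> numbers
r <- rle(a)                               # run-length encoding of the series
# Keep a run unless it is a run of 1s of length 2 or more (i.e. the pattern)
cond <- with(r, values != 1 | lengths < 2)
ok <- rep(cond, r$lengths)                # expand the per-run flag to one value per row
occur <- rep(cumsum(cond), r$lengths)     # running count of kept runs, one value per row
cbind(DF, occur)[ok, ]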

giving:

             date.time aerial occur
3  2013-07-27 03:46:00      5     1
4  2013-07-27 03:53:00      5     1
8  2013-07-27 04:29:00      5     2
9  2013-07-27 04:36:00      5     2
10 2013-07-27 04:40:00      5     2
13 2013-07-27 04:50:00      5     3
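
To see why this works, it may help to look at the intermediate values. The sketch below reruns the same steps on a plain numeric vector, `aerial`, which is assumed to hold the decoded values of DF$aerial from the question (the name is introduced only for this illustration). The comments show the value at each step.

```r
# Decoded values of DF$aerial from the example data (assumed: factor labels as numbers)
aerial <- c(1, 1, 5, 5, 1, 1, 1, 5, 5, 5, 1, 1, 5, 1, 1)

r <- rle(aerial)
r$values   # 1 5 1 5 1 5 1
r$lengths  # 2 2 3 3 2 1 2

# TRUE for runs to keep: anything that is not a run of 1s of length >= 2
cond <- r$values != 1 | r$lengths < 2
cond          # FALSE TRUE FALSE TRUE FALSE TRUE FALSE
cumsum(cond)  # 0 1 1 2 2 3 3

# Expand the per-run values back to one value per row
ok <- rep(cond, r$lengths)
occur <- rep(cumsum(cond), r$lengths)

which(ok)  # 3 4 8 9 10 13
occur[ok]  # 1 1 2 2 2 3
```

The kept row numbers and their labels match the desired output in the question.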

Revised: added occur
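
One caveat: `occur` counts the kept runs, not the pattern runs. When the data between two patterns contains more than one distinct value (which the question says can happen), each value forms its own run and gets its own number. A possible variant, sketched below, counts the pattern runs instead, so that everything between two patterns shares one label. The function name `label_between` and its arguments `value` and `min_len` are hypothetical, and the sketch assumes the input has no NA values.

```r
# Hypothetical helper, not part of the original answer.
# x:       numeric vector to scan (assumed to contain no NA)
# value:   the value whose repetition forms the pattern
# min_len: minimum run length that counts as the pattern
label_between <- function(x, value = 1, min_len = 2) {
  r <- rle(x)
  # TRUE for runs that are the pattern itself
  is_pat <- r$values == value & r$lengths >= min_len
  data.frame(
    keep = rep(!is_pat, r$lengths),               # rows that are not part of a pattern run
    occurrence = rep(cumsum(is_pat), r$lengths)   # number of pattern runs seen so far
  )
}

# Mixed values (5, 3, 3) and a lone 1 between patterns
x <- c(1, 1, 5, 3, 3, 1, 1, 7, 1, 7, 1, 1)
res <- label_between(x)
which(res$keep)           # 3 4 5 8 9 10
res$occurrence[res$keep]  # 1 1 1 2 2 2
```

With this variant, any rows before the first pattern run are labelled 0, and rows after the last pattern run are still kept. Whether those rows should be dropped depends on the data, so filter on `occurrence` as needed.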