Removing parts of a data frame when only zeros are recorded over a set time period

Time: 2013-02-10 18:41:47

Tags: r

I have a simple data frame.

a <- c("06/12/2012 06:00","06/12/2012 06:05","06/12/2012 06:10","06/12/2012 06:15","06/12/2012 06:20","06/12/2012 06:25",
   "06/12/2012 06:30","06/12/2012 06:35","06/12/2012 06:40","06/12/2012 06:45","06/12/2012 06:50","06/12/2012 06:55",
   "06/12/2012 07:00","06/12/2012 07:05","06/12/2012 07:10","06/12/2012 07:15","06/12/2012 07:20","06/12/2012 07:25",
   "06/12/2012 07:30","06/12/2012 07:35","06/12/2012 07:40","06/12/2012 07:45","06/12/2012 07:50","06/12/2012 07:55",
   "06/12/2012 08:00")
a <- strptime(a, "%d/%m/%Y %H:%M")

b <-c("1","0","0","0","2","0","0","0","3","0","0","0","0","0","1","2","5","6","0","0","0","0","6","10","2")
df1 <- data.frame(a,b)

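I would like to use R to remove parts of the data frame where there is not enough valid data. The data are recorded every 5 minutes. If column 'b' contains only zeros for 20 consecutive minutes or more (that is, 4 or more consecutive rows), those rows should be removed from my final data frame.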

If anyone has any ideas that could help, I would be very grateful.

2 answers:

Answer 0: (score: 3)

Another option, also using rle:

is.zero <- df1$b == 0
is.zero.rle <- rle(is.zero)
df1[rep(is.zero.rle$lengths, is.zero.rle$lengths) * is.zero < 4, ]

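It may help to show the intermediate result. Each zero row carries the length of the run of zeros it belongs to, and each non-zero row is 0: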

rep(is.zero.rle$lengths, is.zero.rle$lengths) * is.zero
# [1] 0 3 3 3 0 3 3 3 0 5 5 5 5 5 0 0 0 0 4 4 4 4 0 0 0
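The same idea can be wrapped in a small helper so that the threshold is not hard-coded. This is only a sketch: the function name `keep_mask` and its `min_run` argument are hypothetical, and it assumes a numeric vector without missing values.

```r
# Hypothetical helper (not part of the original answer): returns TRUE for rows
# to keep, i.e. rows that are NOT in a run of at least `min_run` zeros.
keep_mask <- function(x, min_run = 4) {
  is.zero <- x == 0
  r <- rle(is.zero)
  run.len <- rep(r$lengths, r$lengths)  # length of the run each row belongs to
  !(is.zero & run.len >= min_run)
}

# Same values as column b in the question, as numbers
b <- c(1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 2, 5, 6, 0, 0, 0, 0, 6, 10, 2)
keep <- keep_mask(b)
which(keep)
```

With the data frame from the question, something like `df1[keep_mask(as.numeric(as.character(df1$b))), ]` should give the filtered rows; `min_run = 4` corresponds to 20 minutes at one reading every 5 minutes.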

Answer 1: (score: 2)

One solution using rle (as Ben mentioned in the comments):

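# run-length encode b (converted to numeric first)
t <- rle(as.numeric(as.character(df1$b)))
# check for condition. NOTE: here I assume all are 5 minute intervals!!
# So, if a run has length >= 4, it covers >= 20 minutes
p <- which(t$values == 0 & t$lengths >= 4)
w <- cumsum(t$lengths)    # last row of each run
s <- w - t$lengths + 1    # first row of each run (also valid when the first run is zeros)
o <- unlist(lapply(p, function(x) s[x]:w[x]))
# df1[-o, ] does not work when o is empty, so only drop rows if a run qualified
if (length(o) > 0) df1[-o, ] else df1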

#                      a  b
# 1  2012-12-06 06:00:00  1
# 2  2012-12-06 06:05:00  0
# 3  2012-12-06 06:10:00  0
# 4  2012-12-06 06:15:00  0
# 5  2012-12-06 06:20:00  2
# 6  2012-12-06 06:25:00  0
# 7  2012-12-06 06:30:00  0
# 8  2012-12-06 06:35:00  0
# 9  2012-12-06 06:40:00  3
# 15 2012-12-06 07:10:00  1
# 16 2012-12-06 07:15:00  2
# 17 2012-12-06 07:20:00  5
# 18 2012-12-06 07:25:00  6
# 23 2012-12-06 07:50:00  6
# 24 2012-12-06 07:55:00 10
# 25 2012-12-06 08:00:00  2
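Since this answer relies on every row being exactly 5 minutes apart, a variant that measures each run from the timestamps may be more robust if a reading is ever missing. The following is a minimal sketch, not the answerer's method. It assumes the rows are sorted by time and that a run "covers" the time from its first to its last reading plus one nominal interval; the names `interval`, `min.span`, `span` and `drop.run` are made up for illustration.

```r
# Rebuild the example data with POSIXct timestamps and a numeric b
a <- seq(as.POSIXct("2012-12-06 06:00", tz = "UTC"), by = "5 min", length.out = 25)
b <- c(1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 2, 5, 6, 0, 0, 0, 0, 6, 10, 2)
df1 <- data.frame(a, b)

interval <- 5    # assumed nominal sampling interval, in minutes
min.span <- 20   # drop zero runs covering at least this many minutes

r <- rle(df1$b == 0)
ends <- cumsum(r$lengths)           # last row of each run
starts <- ends - r$lengths + 1      # first row of each run
# minutes covered by each run: last reading minus first, plus one interval
span <- as.numeric(difftime(df1$a[ends], df1$a[starts], units = "mins")) + interval
drop.run <- r$values & span >= min.span
run.id <- rep(seq_along(r$lengths), r$lengths)   # run index of every row
result <- df1[!drop.run[run.id], ]
```

For evenly spaced data this keeps the same rows as the row-count approach. How a gap inside a run of zeros should be treated is a design decision the question does not settle, so the definition of `span` may need adjusting.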