R xts object - subset data points in runs of 5 consecutive seconds

Time: 2015-04-20 11:47:42

Tags: r subset xts seconds

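I have a large xts object and want to subset it by the seconds in its time index, keeping only data points that fall within runs of at least 5 consecutive seconds. I have up to 8 data points per second (points measured within the same second should not count toward the 5 consecutive seconds).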

And_sub_xts is my xts object:

> str(And_sub_xts)
An ‘xts’ object on 2010-04-09 20:32:56/2010-04-26 06:56:57 containing:
 Data: chr [1:164421, 1:11] "0.255416" "0.168836" "0.212126" "0.229442" "0.238100" "0.212126" "0.168836" ...
- attr(*, "dimnames")=List of 2
 ..$ : NULL
 ..$ : chr [1:11] "CalSurge" "CalSway" "CalHeave" "Stat_Surge" ...
 Indexed by objects of class: [POSIXct,POSIXt] TZ: 
 xts Attributes:  
NULL

The first 100 values of

abs(diff(.indexsec(And_sub_xts)))

56 8 23 34 40 40 41 42 25 27 34 35 38 38 40 40 41 56 59 59 19 19 20 20 20 20 22 22 23 23 24 24 24 25 25 26 27 27 27 27 27 28 28 30 30 30 37 38 40 40 41 44 44 46 46 47 48 51 52 54 54 54 54 55 56 59 1 4 4 4 6 6 6 6 7 7 11 12 12 14 14 15 16 16 17 18 18 19 19 21 21 22 22 23 23 25 25 26 26 26

I marked the values to keep in bold (the marked run ends at 28 28), so the subset should contain only those data points.

I just realized that, in theory, the data points could also be distributed like this:

2010-04-09 20:32:20
2010-04-09 20:32:20
2010-04-09 20:32:21
2010-04-09 20:32:22
2010-04-09 20:32:22
2010-04-09 20:40:22
2010-04-09 22:52:23
2010-04-10 20:52:24

These are not 5 consecutive seconds, but the .indexsec command cannot account for that. Perhaps someone knows how to handle this case as well.
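To make the problem concrete, here is a minimal base R sketch built from the timestamps listed above. It assumes a UTC time zone, and `sec_of_minute` and `epoch` are illustrative names that mimic what `.indexsec()` and `.index()` report for an xts index.

```r
# Timestamps from the example above (time zone assumed to be UTC)
ts <- as.POSIXct(c("2010-04-09 20:32:20", "2010-04-09 20:32:20",
                   "2010-04-09 20:32:21", "2010-04-09 20:32:22",
                   "2010-04-09 20:32:22", "2010-04-09 20:40:22",
                   "2010-04-09 22:52:23", "2010-04-10 20:52:24"),
                 tz = "UTC")

# Seconds within the minute, comparable to .indexsec()
sec_of_minute <- as.integer(format(ts, "%S"))
# Seconds since the epoch, comparable to .index()
epoch <- as.numeric(ts)

# Looks like five consecutive seconds: 20, 21, 22, 23, 24
diff(unique(sec_of_minute))   # 1 1 1 1
# The full timestamps show that only 20, 21, 22 are really consecutive
diff(unique(epoch))           # 1 1 480 7921 79201
```

Working on the seconds since the epoch instead of the seconds within the minute is what removes the ambiguity.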

Thanks for your help!

1 Answer:

Answer 0 (score: 1)

Here is one way to do it. x is sample data whose index values have seconds equal to your first 100 values.

require(xts)
# sample data
s <- c(56, 8, 23, 34, 40, 40, 41, 42, 25, 27, 34, 35, 38, 38, 40, 
40, 41, 56, 59, 59, 19, 19, 20, 20, 20, 20, 22, 22, 23, 23, 24, 
24, 24, 25, 25, 26, 27, 27, 27, 27, 27, 28, 28, 30, 30, 30, 37, 
38, 40, 40, 41, 44, 44, 46, 46, 47, 48, 51, 52, 54, 54, 54, 54, 
55, 56, 59, 1, 4, 4, 4, 6, 6, 6, 6, 7, 7, 11, 12, 12, 14, 14, 
15, 16, 16, 17, 18, 18, 19, 19, 21, 21, 22, 22, 23, 23, 25, 25, 
26, 26, 26)
S <- cumsum(ifelse(c(0, diff(s)) < 0, 1, 0)) * 60 + s
x <- .xts(seq_along(S), S, tzone="UTC")

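The basic idea is to aggregate the data to 1-second resolution so that you can use rle (run-length encoding) to find runs of consecutive one-second observations that span 5 seconds or more. Then find the first and last timestamp of each run in the aggregated data, and look up the positions of those timestamps in the original data. Finally, use those positions to build the sequences of row indices that subset the original data to the qualifying runs.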
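The run-length step can be seen in isolation on a toy vector. This is only an illustration in base R; `secs` is made-up data standing in for the distinct one-second timestamps.

```r
# Distinct seconds: one run 10..15, then 20..21, then a lone 30
secs <- c(10, 11, 12, 13, 14, 15, 20, 21, 30)

# Encode the gaps between neighbouring seconds as runs
runs <- rle(diff(secs))
runs$values    # 1 5 1 9
runs$lengths   # 5 1 1 1

# A run qualifies when the gap is exactly 1 second and it repeats often enough
qualifies <- runs$values == 1 & runs$lengths >= 5
qualifies      # TRUE FALSE FALSE FALSE
```

Note that a run of n one-second gaps covers n + 1 distinct seconds, so the threshold depends on whether you count gaps or seconds.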

# aggregate data to 1-second resolution
oneSec <- period.apply(x, endpoints(x, 'seconds'), identity) 
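# find the runs of 5 or more consecutive one-second increments
# (5 increments span 6 distinct seconds; use >= 4 for runs of 5 seconds)
consec <- rle(diff(.index(oneSec)))
# require the increment to be exactly 1 second, not just any repeated gap
gte5s <- consec$values == 1 & consec$lengths >= 5
# get the location of the first and last obs of each run in the 1-second data
# (head(..., -1) keeps the vector the same length as 'gte5s', avoiding recycling)
begLoc <- head(cumsum(c(1, consec$lengths)), -1)[gte5s]
endLoc <- begLoc + consec$lengths[gte5s]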
# get the row position of the first and last obs of each run in the original data
beg <- lapply(index(oneSec)[begLoc], function(i) first(x[i, which.i=TRUE]))
end <- lapply(index(oneSec)[endLoc], function(i) last(x[i, which.i=TRUE]))
# create index vector between each value in 'beg' and 'end'
# (SIMPLIFY = FALSE always returns a list, even when all runs have equal length)
loc <- unlist(mapply(seq, beg, end, SIMPLIFY = FALSE))
# subset original object using index vector
X <- x[loc,]
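
As a cross-check, the same kind of selection can be sketched in base R without the aggregation step. This is a variation, not the approach above: it counts distinct seconds rather than increments, `keep_consecutive` and `min_secs` are hypothetical names, and it assumes the input is sorted seconds since the epoch (as an xts index is).

```r
# Return TRUE for every observation that falls in a run of at least
# `min_secs` consecutive distinct seconds. `epoch` must be sorted.
keep_consecutive <- function(epoch, min_secs = 5) {
  if (length(epoch) == 0) return(logical(0))
  sec <- floor(epoch)                      # whole-second bucket of each observation
  u <- unique(sec)                         # distinct seconds, still in order
  # start a new run whenever the gap to the previous distinct second is not 1
  run_id <- cumsum(c(TRUE, diff(u) != 1))
  run_len <- ave(u, run_id, FUN = length)  # distinct seconds in each run
  sec %in% u[run_len >= min_secs]
}

# Toy data: runs 100..104 (5 s), 200..201 (2 s), 300..304 (5 s)
epoch <- c(100, 100, 101, 102, 102, 103, 104,
           200, 201,
           300, 301, 302, 303, 304, 304)
keep <- keep_consecutive(epoch)
epoch[keep]
```

With the objects above, `x[keep_consecutive(.index(x)), ]` should give the corresponding subset of the xts object, here with runs of 5 or more distinct seconds.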