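I am trying to estimate, for each day, the mean number of seconds between consecutive observations of A and of B in the following sample data: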
dput(tt2)
structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95,
1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05,
1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA,
NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378,
1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9,
231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05,
232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231,
230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45,
231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85,
231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25,
231.15, 231, 231, 231, 231.05, NA), .Dim = c(60L, 2L), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459482300,
1459483766.38983, 1459485231.77966, 1459486697.16949, 1459488162.55932,
1459489627.94915, 1459491093.33898, 1459492558.72881, 1459494025.11864,
1459495490.50847, 1459496955.89831, 1459498421.28814, 1459499887.67797,
1459501353.0678, 1459502818.45763, 1459504283.84746, 1459505749.23729,
1459507214.62712, 1459508680.01695, 1459510145.40678, 1459511610.79661,
1459513076.18644, 1459514541.57627, 1459516007.9661, 1459517474.35593,
1459518939.74576, 1459520405.13559, 1459521870.52542, 1459523335.91525,
1459524804.30508, 1459526269.69492, 1459527735.08475, 1459529200.47458,
1459530667.86441, 1459532134.25424, 1459533600.64407, 1459535066.0339,
1459536531.42373, 1459537996.81356, 1459539702.20339, 1459541167.59322,
1459542634.98305, 1459544100.37288, 1459545565.76271, 1459547031.15254,
1459548496.54237, 1459549961.9322, 1459551429.32203, 1459552894.71186,
1459554360.10169, 1459555829.49153, 1459557294.88136, 1459558760.27119,
1459560225.66102, 1459561691.05085, 1459563160.44068, 1459564625.83051,
1459566091.22034, 1459567557.61017, 1459569028), tclass = c("POSIXct",
"POSIXt"), tzone = "Asia/Calcutta"), .Dimnames = list(NULL, c("A",
"B")), class = c("xts", "zoo"))
I can do this in two ways:
1
fun.time=function(x) mean(diff(as.numeric(time(na.omit(x)))))
my.df.time<-do.call(rbind, lapply(split(tt2, "days"), FUN=function (x) {do.call(cbind, lapply(x, fun.time))}))
my.df.time
A B
[1,] 3029.006 1648.939
[2,] 5416.096 1632.957
2
df.time<-do.call(cbind, lapply(as.list(tt2), function(x) {
times <- time(na.omit(x))
aggregate(zoo(as.numeric(times), times), as.Date, function(x) mean(diff(x)))
}))
df.time
A B
2016-04-01 4152.630 1637.730
2016-04-02 3299.627 1675.446
Could you explain why the values in columns A and B differ between the two methods?
Answer 0 (score: 2)
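The difference is that as.Date computes the date in UTC, whereas split(tt2, "days") splits at midnight in the index's local time zone (Asia/Calcutta, which is UTC+5:30). Observations made between 00:00 and 05:30 local time therefore fall on the previous day when grouped by UTC date: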
> tail(data.frame(tt2, utcDate=as.Date(index(tt2))), 12)
A B utcDate
2016-04-02 04:51:34 1376.25 NA 2016-04-01
2016-04-02 05:16:00 NA 231.50 2016-04-01
2016-04-02 05:40:29 1378.00 231.50 2016-04-02
2016-04-02 06:04:54 1376.50 NA 2016-04-02
2016-04-02 06:29:20 NA 231.50 2016-04-02
2016-04-02 06:53:45 NA 231.25 2016-04-02
2016-04-02 07:18:11 NA 231.15 2016-04-02
2016-04-02 07:42:40 1378.00 231.00 2016-04-02
2016-04-02 08:07:05 1378.00 231.00 2016-04-02
2016-04-02 08:31:31 NA 231.00 2016-04-02
2016-04-02 08:55:57 NA 231.05 2016-04-02
2016-04-02 09:20:28 1378.80 NA 2016-04-02
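To see the effect in isolation, here is a minimal base-R sketch that takes two timestamps from the index above, one on each side of UTC midnight, and asks for the date in each time zone. The `tz` argument is passed explicitly so the result does not depend on the default, and the variable names are illustrative only.

```r
# Two index values from the sample data, on either side of UTC midnight
t <- as.POSIXct(c(1459554360.10169, 1459555829.49153),
                origin = "1970-01-01", tz = "Asia/Calcutta")

# Local (Asia/Calcutta) wall-clock time of both instants
local_time <- format(t, "%Y-%m-%d %H:%M:%S")

# The same instants mapped to a calendar date in each time zone
utc_date   <- as.Date(t, tz = "UTC")
local_date <- as.Date(t, tz = "Asia/Calcutta")

print(data.frame(local_time, utc_date, local_date))
```

Both instants belong to 2 April in local time, but the first one is still 1 April in UTC.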
Which one is correct depends on what you want. A more concise approach, using the tools in xts, is apply.daily.
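# Mean gap, in seconds, between consecutive non-NA observations
meanTimeDiff <- function(x) {
  # .index() returns the index as raw numeric seconds since the epoch
  mean(diff(.index(na.omit(x))))
}
# Apply the function to each column, one day at a time
apply.daily(tt2, function(x) sapply(x, meanTimeDiff))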
# A B
# 2016-04-01 23:54:26 3029.006 1648.939
# 2016-04-02 09:20:28 5416.096 1632.957
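If you want a date-based grouping to follow the local calendar day instead, one option is to pass `tz` to `as.Date` explicitly. The sketch below uses base R only, with `tapply` standing in for `aggregate`, and four index values taken from the sample data; `secs`, `times` and `mean_gap` are illustrative names, not part of the original code.

```r
# Four consecutive index values (epoch seconds) that straddle UTC midnight
secs  <- c(1459552894.71186, 1459554360.10169,
           1459555829.49153, 1459557294.88136)
times <- as.POSIXct(secs, origin = "1970-01-01", tz = "Asia/Calcutta")

# Mean gap in seconds between consecutive observations in a group
mean_gap <- function(s) mean(diff(s))

# Group by UTC date: the observations are split into two days
by_utc <- tapply(secs, as.character(as.Date(times, tz = "UTC")), mean_gap)

# Group by local date: all four observations fall on the same day
by_local <- tapply(secs, as.character(as.Date(times, tz = "Asia/Calcutta")),
                   mean_gap)

print(by_utc)
print(by_local)
```

Grouping by UTC date splits these four observations into two groups and drops the gap that spans UTC midnight, while grouping by local date keeps all three gaps in a single day. That dropped gap is one reason the two methods in the question give different means.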