Using split on a data frame with a POSIXlt column as the criterion

Asked: 2016-02-09 03:51:34

Tags: r split dataframe posixlt

I am trying to split and aggregate some data based on time.

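There is some redundant information in the data below, but it should not get in the way of the question. I want to split the data by FiveMinBar and then, for each group, get the first Open, the maximum High, the minimum Low, the last Close, and the last FiveMinBar.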

            Date  Time  Open  High   Low Close            DateTime          FiveMinBar
10173 2000-01-03 09:31 70.00 70.00 69.88 70.00 2000-01-03 09:31:00 2000-01-03 09:35:00
10174 2000-01-03 09:32 70.00 70.00 69.50 70.00 2000-01-03 09:32:00 2000-01-03 09:35:00
10175 2000-01-03 09:33 69.94 70.00 69.50 70.00 2000-01-03 09:33:00 2000-01-03 09:35:00
10176 2000-01-03 09:34 70.00 70.00 69.38 70.00 2000-01-03 09:34:00 2000-01-03 09:35:00
10177 2000-01-03 09:35 70.00 70.00 69.50 69.81 2000-01-03 09:35:00 2000-01-03 09:35:00
10178 2000-01-03 09:36 69.81 69.88 68.75 68.75 2000-01-03 09:36:00 2000-01-03 09:40:00
10179 2000-01-03 09:37 68.75 69.06 68.75 68.75 2000-01-03 09:37:00 2000-01-03 09:40:00
10180 2000-01-03 09:38 68.81 69.06 68.56 68.63 2000-01-03 09:38:00 2000-01-03 09:40:00
10181 2000-01-03 09:39 68.56 69.00 68.50 68.56 2000-01-03 09:39:00 2000-01-03 09:40:00
10182 2000-01-03 09:40 68.56 69.00 68.13 68.13 2000-01-03 09:40:00 2000-01-03 09:40:00
10183 2000-01-03 09:41 68.63 68.63 67.75 67.88 2000-01-03 09:41:00 2000-01-03 09:45:00
10184 2000-01-03 09:42 68.00 68.06 67.25 67.38 2000-01-03 09:42:00 2000-01-03 09:45:00
10185 2000-01-03 09:43 67.38 67.38 67.00 67.19 2000-01-03 09:43:00 2000-01-03 09:45:00
10186 2000-01-03 09:44 67.13 67.25 66.75 66.81 2000-01-03 09:44:00 2000-01-03 09:45:00
10187 2000-01-03 09:45 66.88 67.25 66.00 66.31 2000-01-03 09:45:00 2000-01-03 09:45:00

My first attempt was to do this using sapply with:

split(data, data$FiveMinBar)

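However, split does not work on POSIXlt data. I did come up with the solution below, although it is far from "optimal R": it creates an empty data frame, has to coerce the date-time to numeric and then back to POSIXlt, and uses a for loop.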

My solution:

 results <- data.frame(Open=numeric(), High=numeric(), Low=numeric(), Close=numeric(),
                        DateTime=numeric())

  for (i in 1:length(unique(data$FiveMinBar))){
    temp <- data[data$FiveMinBar == unique(data$FiveMinBar)[i],]
    Open=temp$Open[1] 
    High=max(temp$High) 
    Low=min(temp$Low)
    Close=temp$Close[nrow(temp)]
    DateTime= as.numeric(temp$DateTime[nrow(temp)])
    results <- rbind(results, cbind(Open, High, Low, Close, DateTime))
  }

   results$DateTime <- as.POSIXlt(results$DateTime, origin="1970-01-01")

This gives the following result:

    Open  High   Low Close            DateTime
1  70.00 70.00 69.38 69.81 2000-01-03 09:35:00
2  69.81 69.88 68.13 68.13 2000-01-03 09:40:00
3  68.63 68.63 66.00 66.31 2000-01-03 09:45:00
4  66.25 66.50 65.63 65.81 2000-01-03 09:50:00
5  65.88 65.88 64.25 64.36 2000-01-03 09:55:00
6  64.31 64.38 63.25 63.44 2000-01-03 10:00:00
7  63.44 64.50 63.25 64.19 2000-01-03 10:05:00
8  64.25 64.63 63.75 64.44 2000-01-03 10:10:00
9  64.63 64.94 64.19 64.81 2000-01-03 10:15:00
10 64.88 65.25 64.56 65.13 2000-01-03 10:20:00

Is there a cleaner way to do this? I would prefer to keep the data as a data frame rather than convert it to xts.

Thanks.

Here is the code to recreate the initial data frame:

data <- structure(list(Date = structure(c(10959, 10959, 10959, 10959, 
10959, 10959, 10959, 10959, 10959, 10959, 10959, 10959, 10959, 
10959, 10959), class = "Date"), Time = c("09:31", "09:32", "09:33", 
"09:34", "09:35", "09:36", "09:37", "09:38", "09:39", "09:40", 
"09:41", "09:42", "09:43", "09:44", "09:45"), Open = c(70, 70, 
69.94, 70, 70, 69.81, 68.75, 68.81, 68.56, 68.56, 68.63, 68, 
67.38, 67.13, 66.88), High = c(70, 70, 70, 70, 70, 69.88, 69.06, 
69.06, 69, 69, 68.63, 68.06, 67.38, 67.25, 67.25), Low = c(69.88, 
69.5, 69.5, 69.38, 69.5, 68.75, 68.75, 68.56, 68.5, 68.13, 67.75, 
67.25, 67, 66.75, 66), Close = c(70, 70, 70, 70, 69.81, 68.75, 
68.75, 68.63, 68.56, 68.13, 67.88, 67.38, 67.19, 66.81, 66.31
), DateTime = structure(list(sec = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0), min = 31:45, hour = c(9L, 9L, 9L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), mday = c(3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = c(100L, 
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L), wday = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), yday = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), isdst = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec", "min", 
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), FiveMinBar = structure(list(sec = c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), min = c(35L, 35L, 35L, 35L, 35L, 
40L, 40L, 40L, 40L, 40L, 45L, 45L, 45L, 45L, 45L), hour = c(9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), mday = c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = c(100L, 
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L), wday = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), yday = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), isdst = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec", "min", 
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), tzone = c("", 
"EST", "EDT"), class = c("POSIXlt", "POSIXt"))), .Names = c("Date", 
"Time", "Open", "High", "Low", "Close", "DateTime", "FiveMinBar"
), row.names = 10173:10187, class = "data.frame")

1 Answer:

Answer 0: (score: 1)

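The real problem is that you have POSIXlt values in a data.frame. POSIXlt values are stored as a list, and since data.frames are themselves lists, you end up with a list inside a list, which is not easy to work with. If you want to store date/time values in a data.frame, it is better to use the sibling data type POSIXct, which is stored as a simple vector rather than a list.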
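To make the difference concrete, here is a minimal sketch that inspects the internal representation of both classes. The variable names `x_lt` and `x_ct` and the choice of `tz = "UTC"` are assumptions made only for illustration; they do not come from the question's data.

```r
# Build the same instant in both classes (tz = "UTC" is an illustrative assumption).
x_lt <- as.POSIXlt("2000-01-03 09:35:00", tz = "UTC")
x_ct <- as.POSIXct(x_lt)

# POSIXlt: underneath the class there is a list of components.
is.list(unclass(x_lt))
names(unclass(x_lt))       # includes "sec", "min", "hour", "mday", ...

# POSIXct: underneath the class there is a plain numeric vector
# (seconds since 1970-01-01 UTC), one number per timestamp.
is.numeric(unclass(x_ct))
length(unclass(x_ct))
```

Because a POSIXct column is an ordinary atomic vector with one element per row, grouping functions can treat it like any other column.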

In your example, you can convert the column with:
data$FiveMinBar <- as.POSIXct(data$FiveMinBar)

Then the split should work without any problems:

split(data, data$FiveMinBar)
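Once the split works, the per-bar summary asked for in the question can be built without the for loop. The following is only a sketch of one possible approach, not the only way to do it: it uses a small stand-in data frame called `bars` (a hypothetical name) holding the first two five-minute bars from the question, with FiveMinBar created directly as POSIXct and `tz = "UTC"` assumed for reproducibility.

```r
# Stand-in for the question's data: the first two five-minute bars only.
# FiveMinBar is POSIXct here; tz = "UTC" is an assumption for reproducibility.
bars <- data.frame(
  Open  = c(70.00, 70.00, 69.94, 70.00, 70.00, 69.81, 68.75, 68.81, 68.56, 68.56),
  High  = c(70.00, 70.00, 70.00, 70.00, 70.00, 69.88, 69.06, 69.06, 69.00, 69.00),
  Low   = c(69.88, 69.50, 69.50, 69.38, 69.50, 68.75, 68.75, 68.56, 68.50, 68.13),
  Close = c(70.00, 70.00, 70.00, 70.00, 69.81, 68.75, 68.75, 68.63, 68.56, 68.13),
  FiveMinBar = as.POSIXct(
    rep(c("2000-01-03 09:35:00", "2000-01-03 09:40:00"), each = 5),
    tz = "UTC"
  )
)

# One data frame per bar.
pieces <- split(bars, bars$FiveMinBar)

# Summarise each piece into a single row, then stack the rows.
ohlc <- do.call(rbind, lapply(pieces, function(p) {
  data.frame(
    Open     = p$Open[1],             # first open in the bar
    High     = max(p$High),           # highest high
    Low      = min(p$Low),            # lowest low
    Close    = p$Close[nrow(p)],      # last close
    DateTime = p$FiveMinBar[nrow(p)]  # bar timestamp, stays POSIXct
  )
}))
rownames(ohlc) <- NULL  # drop the group labels that rbind uses as row names
ohlc
```

The result stays an ordinary data frame, and the DateTime column keeps its date-time class, so no round trip through numeric is needed. On the real data you would apply the same steps to `data` after converting FiveMinBar as shown above.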