Period-wise aggregation of grouped data based on two columns

Time: 2013-04-11 13:49:01

Tags: frames

I have a file named example.csv that contains the following data:

day,number,price,pr
2010-01-01 00:01:00,1,0.4,2
2010-01-01 00:02:00,1,1.2,4
2010-01-01 00:03:00,1,2.5,6
2010-01-01 00:04:00,1,9.1,2
2010-01-01 00:05:00,2,3.4,7
2010-01-01 00:06:00,2,6.9,9
2010-01-01 00:07:00,2,8.9,2
2010-01-01 00:08:00,3,9.1,5
2010-01-01 00:09:00,3,4.2,9
2010-01-01 00:10:00,3,11.2,2
2010-01-01 00:11:00,4,53.12,4
2010-01-01 00:12:00,4,45.21,1
2010-01-01 00:12:00,4,1.1,5
2010-01-01 00:13:00,4,3.43,2
2010-01-01 00:14:00,4,21.42,4

Loading the data:

 example = read.csv(file="path/example.csv", header=TRUE, sep=",")

Building an xts object indexed by the day column:

library(xts)
ddx <- xts(x = example[, c("number", "price", "pr" )], order.by = as.POSIXct(example[, "day"], tz = "GMT", format = "%Y-%m-%d %H:%M:%S"))

Applying it gives output with the day and price columns:

period.apply(ddx$number, endpoints(ddx, on = "minutes", k = 3), sum)

1 answer:

Answer 0: (score: 1)

Your way of creating the xts object is rather convoluted. Try the following.

txt <- 'day,number,price
2010-01-01 00:01:00,1,0.4
2010-01-01 00:02:00,1,1.2
2010-01-01 00:03:00,1,2.5
2010-01-01 00:04:00,2,9.1
2010-01-01 00:05:00,2,3.4
2010-01-01 00:06:00,2,6.9
2010-01-01 00:07:00,3,8.9
2010-01-01 00:08:00,3,9.1
2010-01-01 00:09:00,3,4.2
2010-01-01 00:10:00,4,11.2
2010-01-01 00:11:00,4,53.12
2010-01-01 00:12:00,4,45.21
2010-01-01 00:12:00,4,1.1
2010-01-01 00:13:00,4,3.43
2010-01-01 00:14:00,4,21.42'

DD <- read.csv(text = txt, stringsAsFactors = FALSE)
# DD is already a data frame
DD
##                    day number price
## 1  2010-01-01 00:01:00      1  0.40
## 2  2010-01-01 00:02:00      1  1.20
## 3  2010-01-01 00:03:00      1  2.50
## 4  2010-01-01 00:04:00      2  9.10
## 5  2010-01-01 00:05:00      2  3.40
## 6  2010-01-01 00:06:00      2  6.90
## 7  2010-01-01 00:07:00      3  8.90
## 8  2010-01-01 00:08:00      3  9.10
## 9  2010-01-01 00:09:00      3  4.20
## 10 2010-01-01 00:10:00      4 11.20
## 11 2010-01-01 00:11:00      4 53.12
## 12 2010-01-01 00:12:00      4 45.21
## 13 2010-01-01 00:12:00      4  1.10
## 14 2010-01-01 00:13:00      4  3.43
## 15 2010-01-01 00:14:00      4 21.42

library(xts)
ddx <- xts(x = DD[, c("number", "price")], order.by = as.POSIXct(DD[, "day"], tz = "GMT", format = "%Y-%m-%d %H:%M:%S"))
ddx
##                     number price
## 2010-01-01 00:01:00      1  0.40
## 2010-01-01 00:02:00      1  1.20
## 2010-01-01 00:03:00      1  2.50
## 2010-01-01 00:04:00      2  9.10
## 2010-01-01 00:05:00      2  3.40
## 2010-01-01 00:06:00      2  6.90
## 2010-01-01 00:07:00      3  8.90
## 2010-01-01 00:08:00      3  9.10
## 2010-01-01 00:09:00      3  4.20
## 2010-01-01 00:10:00      4 11.20
## 2010-01-01 00:11:00      4 53.12
## 2010-01-01 00:12:00      4 45.21
## 2010-01-01 00:12:00      4  1.10
## 2010-01-01 00:13:00      4  3.43
## 2010-01-01 00:14:00      4 21.42
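
A possibly simpler variant of the construction above: the timestamps already use the `%Y-%m-%d %H:%M:%S` layout that `as.POSIXct` tries by default for character input, so the explicit `format` argument can usually be dropped. This is only a minimal sketch, assuming the xts package is installed. `DD_small`, `idx`, and `ddx_small` are illustrative names, and the data are just the first four rows of the sample.

```r
library(xts)

# First four rows of the sample data (DD_small / ddx_small are illustrative names).
DD_small <- data.frame(
  day    = c("2010-01-01 00:01:00", "2010-01-01 00:02:00",
             "2010-01-01 00:03:00", "2010-01-01 00:04:00"),
  number = c(1, 1, 1, 2),
  price  = c(0.4, 1.2, 2.5, 9.1),
  stringsAsFactors = FALSE
)

# For character input, as.POSIXct() tries the "%Y-%m-%d %H:%M:%S" layout by default,
# so no format argument is passed here.
idx <- as.POSIXct(DD_small$day, tz = "GMT")

# The numeric columns become the data part; the parsed timestamps become the index.
ddx_small <- xts(DD_small[, c("number", "price")], order.by = idx)
ddx_small
```

Keeping the explicit `format` as in the original call is still the safer choice when the input layout is not guaranteed.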

To use period.apply on the number column, simply pass ddx$number instead of ddx:

period.apply(ddx$number, endpoints(ddx, on = "minutes", k = 3), sum)
##                     number
## 2010-01-01 00:02:00      2
## 2010-01-01 00:05:00      5
## 2010-01-01 00:08:00      8
## 2010-01-01 00:11:00     11
## 2010-01-01 00:14:00     16
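
To see where the 3-minute windows fall, and to total several columns in one call, something like the following sketch may help. It assumes the xts package is installed and rebuilds the answer's sample data inline; `mins`, `ep`, and `sums` are illustrative names. `colSums` is used instead of `sum` because `sum` applied to a multi-column window would add all columns together into a single number.

```r
library(xts)

# Rebuild the answer's sample data (15 rows, with a duplicated 00:12:00 timestamp).
mins <- c(1:12, 12L, 13L, 14L)
idx  <- as.POSIXct(sprintf("2010-01-01 00:%02d:00", mins), tz = "GMT")
ddx  <- xts(
  cbind(number = rep(1:4, times = c(3, 3, 3, 6)),
        price  = c(0.4, 1.2, 2.5, 9.1, 3.4, 6.9, 8.9, 9.1, 4.2,
                   11.2, 53.12, 45.21, 1.1, 3.43, 21.42)),
  order.by = idx
)

# endpoints() returns 0 followed by the row number of the last
# observation in each 3-minute window.
ep <- endpoints(ddx, on = "minutes", k = 3)
ep

# The timestamps at those rows are the labels period.apply() uses for its output.
index(ddx)[ep[-1]]

# colSums gives one total per column within each window.
sums <- period.apply(ddx, ep, colSums)
sums
```

The number column of `sums` should agree with the single-column result shown above; the price column is the per-window total of price.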