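I have a large dataset with 203,614 rows and 3 columns: Price, Timestamp.Transaction and Energy. The timestamps contain repeated values, because several transactions can share the same timestamp.
Price is numeric
Timestamp.Transaction is POSIXct
Energy is numeric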
dput(head(dataset))
structure(list(Price = c(18, 20, 23, 15, 15, 15), Timestamp.Transaction = structure(c(1388500200, 1388500200, 1388502000, 1388502000, 1388502000, 1388502000), class = c("POSIXct", "POSIXt"), tzone = ""), Energy = c(414, 230, 3, 3, 3, 3)), .Names = c("Price", "Timestamp.Transaction", "Energy"), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
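I have to perform the following steps in a loop:
1) Subset the dataset by Timestamp.Transaction, keeping the rows that lie within 1.33 days of a given timestamp
2) Compute the minimum, maximum and mean of Price in the subset and assign them to a new data frame
3) Repeat the steps above every 15 minutes
Note: m1 is my dataset
t1 is the vector of timestamps; since it has repeated values, I take only the unique ones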
t1 <- unique(timestamp)
I tried the following, but it takes a very long time to run and the results are wrong:
for (i in 125:length(t1)) {
  for (j in 1:203614) {
    s1[j, ] <- subset(m1, (m1$Timestamp.Transaction <= t1[i] & m1$Timestamp.Transaction >= t1[i] - 115200))
  }
}
Answer 0 (score: 0)
# Set `timestamps` to the vector of all reference timestamps and
# `max.time.diff` to 1.33 days expressed in seconds (115200 in the question).
# Subtracting POSIXct values yields a difftime whose units are chosen
# automatically, so convert to numeric seconds before comparing.
# timestamps <- ...
# max.time.diff <- ...
len <- length(timestamps)
mins <- rep(NA_real_, len)
maxs <- mins
means <- mins
tx.secs <- as.numeric(m1$Timestamp.Transaction)  # seconds since the epoch
for (i in seq_len(len)) {
  timestamp <- as.numeric(timestamps[i])
  # abs() selects a window that is symmetric around the timestamp
  prices <- m1$Price[abs(tx.secs - timestamp) <= max.time.diff]
  mins[i] <- min(prices)
  maxs[i] <- max(prices)
  means[i] <- mean(prices)
}
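Step 2 of the question asks for the statistics in a new data frame, and step 3 for a 15-minute step. The following is a minimal sketch of one way to do both, not a definitive implementation. It assumes a small made-up `m1` built from the six rows in the question, a hypothetical `window.secs` of 115200 seconds, reference timestamps generated with `seq()`, and the trailing window from the question (`t - 115200` up to `t`) rather than a symmetric one.

```r
# Toy stand-in for m1: the six rows from the question's dput() output
m1 <- data.frame(
  Price = c(18, 20, 23, 15, 15, 15),
  Timestamp.Transaction = as.POSIXct(
    c(1388500200, 1388500200, 1388502000, 1388502000, 1388502000, 1388502000),
    origin = "1970-01-01", tz = "UTC"),
  Energy = c(414, 230, 3, 3, 3, 3)
)

window.secs <- 115200  # hypothetical name; the window length used in the question

# Reference timestamps every 15 minutes across the observed range (assumption)
timestamps <- seq(min(m1$Timestamp.Transaction),
                  max(m1$Timestamp.Transaction),
                  by = "15 min")

# Work in plain seconds to avoid difftime unit surprises
tx.secs <- as.numeric(m1$Timestamp.Transaction)

# One row of (min, max, mean) per reference timestamp
stats <- t(vapply(as.numeric(timestamps), function(ts) {
  prices <- m1$Price[tx.secs <= ts & tx.secs >= ts - window.secs]
  c(min = min(prices), max = max(prices), mean = mean(prices))
}, numeric(3)))

result <- data.frame(Timestamp = timestamps, stats)
print(result)
```

A window that contains no transactions would make `min()` and `max()` return `Inf` and `-Inf` with a warning, so real data may need a check for empty subsets.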
Answer 1 (score: 0)
You can collect the subsets in a list with lapply():
newdf <- lapply(t1, function(x)
  subset(dataset, dataset$Timestamp.Transaction <= x & dataset$Timestamp.Transaction >= x - 115200))
Then get the summary() of the Price column for every subset in the list:
summaries <- lapply(newdf, function(x) summary(x["Price"]))
Output:
[[1]]
Price
Min. :18.0
1st Qu.:18.5
Median :19.0
Mean :19.0
3rd Qu.:19.5
Max. :20.0
[[2]]
Price
Min. :15.00
1st Qu.:15.00
Median :16.50
Mean :17.67
3rd Qu.:19.50
Max. :23.00
To name the summary entries, just use
names(summaries) <- sapply(t1, function(x) paste(x-115200, x, sep = " - "))
New output:
$`2013-12-30 07:30:00 - 2013-12-31 15:30:00`
Price
Min. :18.0
1st Qu.:18.5
Median :19.0
Mean :19.0
3rd Qu.:19.5
Max. :20.0
$`2013-12-30 08:00:00 - 2013-12-31 16:00:00`
Price
Min. :15.00
1st Qu.:15.00
Median :16.50
Mean :17.67
3rd Qu.:19.50
Max. :23.00
This should be faster than using a for() loop.
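Each call to subset() still scans the whole dataset, so the cost grows with the number of unique timestamps times the number of rows. As a hedged alternative, the sketch below sorts the data once and uses findInterval() to find the first and last row of each window, so each window only touches its own slice. The toy `m1`, the names `window.secs`, `lo` and `hi`, and the UTC time zone are assumptions made for illustration.

```r
# Toy stand-in for m1: the six rows from the question's dput() output
m1 <- data.frame(
  Price = c(18, 20, 23, 15, 15, 15),
  Timestamp.Transaction = as.POSIXct(
    c(1388500200, 1388500200, 1388502000, 1388502000, 1388502000, 1388502000),
    origin = "1970-01-01", tz = "UTC"),
  Energy = c(414, 230, 3, 3, 3, 3)
)

window.secs <- 115200  # hypothetical name; the window length used in the question

# findInterval() requires a sorted vector, so order by time first
m1 <- m1[order(m1$Timestamp.Transaction), ]
tx.secs <- as.numeric(m1$Timestamp.Transaction)
t1 <- unique(tx.secs)

# hi: last row with time <= t
hi <- findInterval(t1, tx.secs)
# lo: first row with time >= t - window.secs
# (left.open = TRUE counts the rows strictly before the window start)
lo <- findInterval(t1 - window.secs, tx.secs, left.open = TRUE) + 1

# min, max and mean of Price over rows lo[i]..hi[i] for every timestamp
stats <- t(vapply(seq_along(t1), function(i) {
  p <- m1$Price[lo[i]:hi[i]]
  c(min = min(p), max = max(p), mean = mean(p))
}, numeric(3)))

result <- data.frame(
  from = as.POSIXct(t1 - window.secs, origin = "1970-01-01", tz = "UTC"),
  to   = as.POSIXct(t1, origin = "1970-01-01", tz = "UTC"),
  stats
)
print(result)
```

Because every value in `t1` comes from the data itself, each window contains at least one row, so `lo[i]:hi[i]` never runs backwards here. With arbitrary reference timestamps that guarantee no longer holds and empty windows would need handling.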