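I am trying to add a column to my data frame that contains the maximum of the next ten rows of another column (High). In the example below, the maximum for the first row is 92.83. I am new to R and have run into some problems.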
Date_Time High Max_Next10
2014-06-30 08:35:00 92.55 92.83
2014-06-30 08:40:00 92.69 92.83
2014-06-30 08:45:00 92.63 92.83
2014-06-30 08:50:00 92.83 92.80
2014-06-30 08:55:00 92.80 92.76
2014-06-30 09:00:00 92.71 92.76
2014-06-30 09:05:00 92.76 92.72
2014-06-30 09:10:00 92.72 92.75
2014-06-30 09:15:00 92.70 92.75
2014-06-30 09:20:00 92.70 92.75
2014-06-30 09:25:00 92.70 92.75
2014-06-30 09:30:00 92.63 92.76
2014-06-30 09:35:00 92.63 92.76
2014-06-30 09:40:00 92.57 N/A
2014-06-30 09:45:00 92.59 N/A
2014-06-30 09:50:00 92.58 N/A
2014-06-30 09:55:00 92.72 N/A
2014-06-30 10:00:00 92.75 N/A
2014-06-30 10:05:00 92.69 N/A
2014-06-30 10:10:00 92.66 N/A
2014-06-30 10:15:00 92.75 N/A
2014-06-30 10:20:00 92.76 N/A
2014-06-30 10:25:00 92.72 N/A
Answer 0 (score: 3)
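There is a package called zoo with a function called rollmax, and a single line gives you the result. Note that with align = "left" the window starts at the current row, so each value is the maximum of the current row and the nine rows after it.
df$Max_Next10 <- zoo::rollmax(df$High, 10, fill = NA, align = "left")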
> df
Date_Time High Max_Next10
1 6/30/2014 8:35 92.55 92.83
2 6/30/2014 8:40 92.69 92.83
3 6/30/2014 8:45 92.63 92.83
4 6/30/2014 8:50 92.83 92.83
5 6/30/2014 8:55 92.80 92.80
6 6/30/2014 9:00 92.71 92.76
7 6/30/2014 9:05 92.76 92.76
8 6/30/2014 9:10 92.72 92.72
9 6/30/2014 9:15 92.70 92.75
10 6/30/2014 9:20 92.70 92.75
11 6/30/2014 9:25 92.70 92.75
12 6/30/2014 9:30 92.63 92.75
13 6/30/2014 9:35 92.63 92.76
14 6/30/2014 9:40 92.57 92.76
15 6/30/2014 9:45 92.59 NA
16 6/30/2014 9:50 92.58 NA
17 6/30/2014 9:55 92.72 NA
18 6/30/2014 10:00 92.75 NA
19 6/30/2014 10:05 92.69 NA
20 6/30/2014 10:10 92.66 NA
21 6/30/2014 10:15 92.75 NA
22 6/30/2014 10:20 92.76 NA
23 6/30/2014 10:25 92.72 NA
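The output above differs from the Max_Next10 column in the question from row 4 onward. A minimal base R sketch of why, assuming the High values from the question; the names high, roll_incl, and roll_next are illustrative and do not come from the answer:

```r
# High values copied from the question
high <- c(92.55, 92.69, 92.63, 92.83, 92.80, 92.71, 92.76, 92.72,
          92.70, 92.70, 92.70, 92.63, 92.63, 92.57, 92.59, 92.58,
          92.72, 92.75, 92.69, 92.66, 92.75, 92.76, 92.72)
k <- 10

# Left-aligned rolling max over rows i..(i + k - 1): includes the current row.
# embed() puts one window per matrix row; pad the tail with NA.
roll_incl <- c(apply(embed(high, k), 1, max), rep(NA, k - 1))

# Shift by one position so the window is rows (i + 1)..(i + k):
# strictly the next k rows, as the question asks.
roll_next <- c(roll_incl[-1], NA)

roll_incl[4]  # window includes row 4 itself
roll_next[4]  # window starts at row 5
```

The same one-position shift, c(r[-1], NA), should also work on the rollmax result r when it is padded with NA.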
Answer 1 (score: 1)
A solution using sapply:
df$Max_Next10 <- sapply(seq_len(nrow(df)), function(i) {
  if (i + 10 > nrow(df)) {
    NA
  } else {
    max(df$High[(i + 1):(i + 10)])
  }
})
The data I started with:
# > dput(df)
structure(list(Date_Time = c("2014-06-30 08:35:00", "2014-06-30 08:40:00",
"2014-06-30 08:45:00", "2014-06-30 08:50:00", "2014-06-30 08:55:00",
"2014-06-30 09:00:00", "2014-06-30 09:05:00", "2014-06-30 09:10:00",
"2014-06-30 09:15:00", "2014-06-30 09:20:00", "2014-06-30 09:25:00",
"2014-06-30 09:30:00", "2014-06-30 09:35:00", "2014-06-30 09:40:00",
"2014-06-30 09:45:00", "2014-06-30 09:50:00", "2014-06-30 09:55:00",
"2014-06-30 10:00:00", "2014-06-30 10:05:00", "2014-06-30 10:10:00",
"2014-06-30 10:15:00", "2014-06-30 10:20:00", "2014-06-30 10:25:00"
), High = c(92.55, 92.69, 92.63, 92.83, 92.8, 92.71, 92.76, 92.72,
92.7, 92.7, 92.7, 92.63, 92.63, 92.57, 92.59, 92.58, 92.72, 92.75,
92.69, 92.66, 92.75, 92.76, 92.72)), .Names = c("Date_Time",
"High"), row.names = c(NA, -23L), class = "data.frame")
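The sapply solution above can be wrapped in a small reusable function. This is only a sketch: the name max_next_k and its arguments are hypothetical, and vapply is swapped in for sapply so that the result is always a numeric vector, even for empty input.

```r
# Hypothetical helper: max of the next k elements after each position.
# Positions with fewer than k elements after them get NA.
max_next_k <- function(x, k = 10) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    if (i + k > n) NA_real_ else max(x[(i + 1):(i + k)])
  }, numeric(1))
}

# Small check with k = 2
res <- max_next_k(c(1, 5, 2, 4, 3), k = 2)
res
```

With the data frame above, a call such as df$Max_Next10 <- max_next_k(df$High) would fill the column.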
Answer 2 (score: 1)
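You can create a function that takes a data frame and a column name as arguments and, for each row, computes the maximum of the next 10 rows of the referenced column: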
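mk.next10 <- function(data, col) {
  count <- 10
  n <- nrow(data)
  # Rows that have a full window of `count` rows after them
  full <- seq_len(max(n - count, 0))
  c(
    # Window is rows (i + 1)..(i + count): exactly the next `count` rows
    vapply(full, function(i) max(data[(i + 1):(i + count), col], na.rm = TRUE), numeric(1)),
    # The remaining rows do not have `count` rows after them
    rep(NA, n - length(full))
  )
}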
With this, you can create the column for your data frame:
data$Max_Next10 <- mk.next10(data, 'High')
Answer 3 (score: 0)
In the code below, the data frame is named test. Change it to suit your case.
# Load packages
library(data.table)
# Create example data
test <- data.frame(value = c(300, 100, 200, 50, 100, 80, 100, 700, 500, 300, 250, 510, 100, 620, 910))
# Add a row index
test$id <- seq_len(nrow(test))
# Convert to a data.table once so that := can update it by reference
setDT(test)
# Number of rows
n <- nrow(test)
# For each row, store the max of rows i..(i + 9): the current row and up to nine after it
for (i in 1:n) {
  test_i <- subset(test, id >= i & id < i + 10)
  max_test_i <- max(test_i$value)
  test[i, Max := max_test_i]
}
The output is:
value id Max
300 1 700
100 2 700
200 3 700
50 4 700
100 5 700
80 6 910
100 7 910
700 8 910
500 9 910
300 10 910
250 11 910
510 12 910
100 13 910
620 14 910
910 15 910
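The Max column above covers the current row plus up to nine rows after it, and the window is shortened near the end instead of returning NA. A base R sketch that should reproduce the same column without data.table, assuming the same example values; the variable names are illustrative:

```r
# Same example values as in the answer
value <- c(300, 100, 200, 50, 100, 80, 100, 700, 500, 300,
           250, 510, 100, 620, 910)
n <- length(value)

# For each row i, take the max over rows i..min(i + 9, n).
# The window is truncated at the last row, so no NA is produced.
Max <- vapply(seq_len(n), function(i) max(value[i:min(i + 9, n)]), numeric(1))

data.frame(value = value, id = seq_len(n), Max = Max)
```

To match the question instead (strictly the next ten rows, with NA where fewer than ten remain), the window would have to start at i + 1 and rows with i + 10 > n would have to return NA.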