Add a column with the maximum of the next 10 rows

Time: 2017-07-01 14:10:03

Tags: r

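I'm trying to add a column to my data frame that contains the maximum of another column (High) over the next ten rows. In the example below, the maximum for the first row is 92.83. I'm new to R and have run into some problems.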

Date_Time           High  Max_Next10
2014-06-30 08:35:00 92.55 92.83
2014-06-30 08:40:00 92.69 92.83
2014-06-30 08:45:00 92.63 92.83
2014-06-30 08:50:00 92.83 92.80
2014-06-30 08:55:00 92.80 92.76
2014-06-30 09:00:00 92.71 92.76
2014-06-30 09:05:00 92.76 92.72
2014-06-30 09:10:00 92.72 92.75
2014-06-30 09:15:00 92.70 92.75
2014-06-30 09:20:00 92.70 92.75
2014-06-30 09:25:00 92.70 92.75
2014-06-30 09:30:00 92.63 92.76
2014-06-30 09:35:00 92.63 92.76
2014-06-30 09:40:00 92.57 N/A
2014-06-30 09:45:00 92.59 N/A
2014-06-30 09:50:00 92.58 N/A
2014-06-30 09:55:00 92.72 N/A
2014-06-30 10:00:00 92.75 N/A
2014-06-30 10:05:00 92.69 N/A
2014-06-30 10:10:00 92.66 N/A
2014-06-30 10:15:00 92.75 N/A
2014-06-30 10:20:00 92.76 N/A
2014-06-30 10:25:00 92.72 N/A

4 answers:

Answer 0 (score: 3)

The zoo package has a function called rollmax.

A single line gets you the result.

df$Max_Next10 <- zoo::rollmax(df$High, 10, fill = NA, align = "left")

> df
         Date_Time  High Max_Next10
1   6/30/2014 8:35 92.55      92.83
2   6/30/2014 8:40 92.69      92.83
3   6/30/2014 8:45 92.63      92.83
4   6/30/2014 8:50 92.83      92.83
5   6/30/2014 8:55 92.80      92.80
6   6/30/2014 9:00 92.71      92.76
7   6/30/2014 9:05 92.76      92.76
8   6/30/2014 9:10 92.72      92.72
9   6/30/2014 9:15 92.70      92.75
10  6/30/2014 9:20 92.70      92.75
11  6/30/2014 9:25 92.70      92.75
12  6/30/2014 9:30 92.63      92.75
13  6/30/2014 9:35 92.63      92.76
14  6/30/2014 9:40 92.57      92.76
15  6/30/2014 9:45 92.59         NA
16  6/30/2014 9:50 92.58         NA
17  6/30/2014 9:55 92.72         NA
18 6/30/2014 10:00 92.75         NA
19 6/30/2014 10:05 92.69         NA
20 6/30/2014 10:10 92.66         NA
21 6/30/2014 10:15 92.75         NA
22 6/30/2014 10:20 92.76         NA
23 6/30/2014 10:25 92.72         NA
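Each window in the output above starts at the current row (row 4 shows 92.83, its own High), while the table in the question starts at the following row (row 4 shows 92.80). A minimal sketch of the shifted window, assuming the question's High values and using base R's embed() in place of zoo so it runs without extra packages; the variable names win_max, incl_current and next_k are illustrative only:

```r
# High values from the question's data frame
high <- c(92.55, 92.69, 92.63, 92.83, 92.80, 92.71, 92.76, 92.72,
          92.70, 92.70, 92.70, 92.63, 92.63, 92.57, 92.59, 92.58,
          92.72, 92.75, 92.69, 92.66, 92.75, 92.76, 92.72)
k <- 10

# embed() puts each run of k consecutive values in one matrix row
# (row j holds high[j], ..., high[j + k - 1], in reverse order)
win_max <- apply(embed(high, k), 1, max)

# Window that includes the current row: pad k - 1 NAs at the end
incl_current <- c(win_max, rep(NA, k - 1))

# Window that starts at the next row: drop the first window, pad k NAs
next_k <- c(win_max[-1], rep(NA, k))
```

Here incl_current reproduces the Max_Next10 column printed above, and next_k reproduces the column from the question. With zoo, the same shift should amount to dropping the first element of the unpadded rollmax result and appending ten NAs.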

Answer 1 (score: 1)

A solution with sapply:

df$Max_Next10 <- sapply(seq_len(nrow(df)), function(i){
    if(i + 10 > nrow(df))
        NA
    else
        max(df$High[(i + 1):(i + 10)])
})

The data I started with:

# > dput(df)
structure(list(Date_Time = c("2014-06-30 08:35:00", "2014-06-30 08:40:00", 
"2014-06-30 08:45:00", "2014-06-30 08:50:00", "2014-06-30 08:55:00", 
"2014-06-30 09:00:00", "2014-06-30 09:05:00", "2014-06-30 09:10:00", 
"2014-06-30 09:15:00", "2014-06-30 09:20:00", "2014-06-30 09:25:00", 
"2014-06-30 09:30:00", "2014-06-30 09:35:00", "2014-06-30 09:40:00", 
"2014-06-30 09:45:00", "2014-06-30 09:50:00", "2014-06-30 09:55:00", 
"2014-06-30 10:00:00", "2014-06-30 10:05:00", "2014-06-30 10:10:00", 
"2014-06-30 10:15:00", "2014-06-30 10:20:00", "2014-06-30 10:25:00"
), High = c(92.55, 92.69, 92.63, 92.83, 92.8, 92.71, 92.76, 92.72, 
92.7, 92.7, 92.7, 92.63, 92.63, 92.57, 92.59, 92.58, 92.72, 92.75, 
92.69, 92.66, 92.75, 92.76, 92.72)), .Names = c("Date_Time", 
"High"), row.names = c(NA, -23L), class = "data.frame")

Answer 2 (score: 1)

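You can create a function that takes the data frame and a column name as arguments and, for each row, computes the maximum of the next 10 rows of that column: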

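mk.next10 <- function (data, col) {
  count <- 10
  c(
    # rows i+1 .. i+count are the `count` rows that follow row i
    sapply(1:(nrow(data) - count), function(i) max(data[(i+1):(i+count), col], na.rm = TRUE)),
    # the last `count` rows have no full window after them
    rep(NA, count)
  )
}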

With this, you can create the column for your data frame:

data$Max_Next10 <- mk.next10(data, 'High') 
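The function above assumes the data frame has more than 10 rows. A minimal sketch of the same idea on a plain vector with the window size as a parameter, returning NA wherever fewer than n rows remain; the name max_next_n and its arguments are hypothetical, not part of the answer:

```r
# Maximum of the n values that follow each position; NA when fewer than n remain
max_next_n <- function(x, n = 10) {
  len <- length(x)
  vapply(seq_len(len), function(i) {
    if (i + n > len) NA_real_ else max(x[(i + 1):(i + n)])
  }, numeric(1))
}

# Small check with a window of 2
small <- max_next_n(c(1, 5, 2, 4, 3), n = 2)   # 5 4 4 NA NA

# A vector shorter than the window gives all NA
short <- max_next_n(c(1, 2), n = 10)           # NA NA
```

Usage on the data frame would then look like data$Max_Next10 <- max_next_n(data$High).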

Answer 3 (score: 0)

In the code below, the data frame is called test. Change it to suit your case.

# Initialise
rm(list = ls())
library(data.table)

# Load/Create data
test <- data.frame(value=c(300,100,200,50,100,80,100,700,500,300,250,510,100,620,910))

# Add index
test$id <- seq.int(nrow(test))

# Count number of rows
n <- nrow(test)

# Loop to create variable with Max
for(i in 1:n) {
  test_i <- subset(test,id>=i & id < i+10)
  max_test_i <- max(test_i$value)
  setDT(test)[i, Max:= max_test_i]
}

The output is:

value  id  Max
300    1   700
100    2   700
200    3   700
50     4   700
100    5   700
80     6   910
100    7   910
700    8   910
500    9   910
300    10  910
250    11  910
510    12  910
100    13  910
620    14  910
910    15  910
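The loop above counts the current row in each window and lets the window shrink near the end of the data, which is why the last rows show 910 rather than NA. A minimal base-R sketch that should give the same Max column without the row-by-row data.table update, assuming the same test data:

```r
# Same example data as in the answer
test <- data.frame(value = c(300, 100, 200, 50, 100, 80, 100, 700,
                             500, 300, 250, 510, 100, 620, 910))
n <- nrow(test)

# For row i, take rows i .. i+9, truncated at the last row
test$Max <- vapply(seq_len(n), function(i) {
  max(test$value[i:min(i + 9, n)])
}, numeric(1))
```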