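How to get the maximum value of the previous two years in R

Time: 2017-07-05 12:52:06

Tags: r

Hi, I have a data frame.

How can I create a max_value column that holds the maximum value over the previous two years?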

Name   year   value   *max_value*
A      2012    22        NA
A      2012    99        NA
A      2013    12        99
A      2014    01        99
A      2015    23        12
A      2016    40        23
A      2017    12        40
B      2012    12        NA
B      2013    33        12
B      2013    40        12
B      2014    NA        40
B      2015    20        40
B      2016    20        20

Thanks in advance

3 answers:

Answer 0: (score: 0)

Here is a data.table approach using aggregation, a two-column shift, apply, and a join.

library(data.table)
dt[dt[, .(mx=max(value)), by=c("Name", "year")
      ][, .(year,
            max_val=apply(matrix(unlist(shift(mx, 1:2)), ncol=2), 1, max, na.rm=TRUE)),
        by=Name],
    on=c("Name", "year")][is.infinite(max_val), max_val := NA][]

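The first line computes the maximum by name and year. The second line, for each name, returns the year and uses apply to take the maximum of the two lagged years (from shift(mx, 1:2)), dropping NA values. This raises a warning for every row where both lags are NA and returns -Inf at that position. I had to manually convert the output of shift into a matrix in order to feed it to apply, which is not ideal. The resulting data.table is joined to the original data using Name and year as IDs. Finally, the -Inf values are replaced with NA in the last line, and the result is printed with [].

This returns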

    Name year value max_value max_val
 1:    A 2012    22        NA      NA
 2:    A 2012    99        NA      NA
 3:    A 2013    12        99      99
 4:    A 2014     1        99      99
 5:    A 2015    23        12      12
 6:    A 2016    40        23      23
 7:    A 2017    12        40      40
 8:    B 2012    12        NA      NA
 9:    B 2013    33        12      12
10:    B 2013    40        12      12
11:    B 2014    NA        40      40
12:    B 2015    20        40      40
13:    B 2016    20        20      20
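
As a possible variation on the approach above (a sketch, not the answerer's code): pmax with na.rm = TRUE returns NA when every lag is NA, so the matrix conversion, the warnings, and the -Inf cleanup may be avoidable. The names yearly and max_val are illustrative. Like the original, this assumes the rows are ordered by year and that each Name has consecutive years, because shift works on row positions rather than on year values.

```r
library(data.table)

# Same example data as in the question (without the expected column).
dt <- data.table(
  Name  = c(rep("A", 7), rep("B", 6)),
  year  = c(2012, 2012, 2013, 2014, 2015, 2016, 2017,
            2012, 2013, 2013, 2014, 2015, 2016),
  value = c(22, 99, 12, 1, 23, 40, 12, 12, 33, 40, NA, 20, 20)
)

# One row per Name/year holding that year's maximum.
yearly <- dt[, .(mx = max(value)), by = .(Name, year)]

# Element-wise maximum of the two lags; pmax(na.rm = TRUE) yields NA
# (not -Inf) where both lags are missing.
yearly[, max_val := do.call(pmax, c(shift(mx, 1:2), list(na.rm = TRUE))),
       by = Name]

# Update join: copy max_val back onto the original rows by Name and year.
dt[yearly, on = .(Name, year), max_val := i.max_val]
dt[]
```

On the example data, max_val should match the max_value column requested in the question.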

Data

dt <- 
structure(list(Name = c("A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B"), year = c(2012L, 2012L, 2013L, 2014L, 
2015L, 2016L, 2017L, 2012L, 2013L, 2013L, 2014L, 2015L, 2016L
), value = c(22L, 99L, 12L, 1L, 23L, 40L, 12L, 12L, 33L, 40L, 
NA, 20L, 20L), max_value = c(NA, NA, 99L, 99L, 12L, 23L, 40L, 
NA, 12L, 12L, 40L, 40L, 20L)), .Names = c("Name", "year", "value", 
"max_value"), row.names = c(NA, -13L), class = c("data.table", 
"data.frame"))

Answer 1: (score: 0)

Here is a base R solution using mapply.

df <- data.frame(Name = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
                 year = c(2012, 2012, 2013, 2014, 2015, 2016, 2017, 2012, 2013, 2013, 2014, 2015, 2016),
                 value = c(22, 99, 12, 1, 23, 40, 12, 12, 33, 40, NA, 20, 20),
                 stringsAsFactors = FALSE)

max.vals <- mapply(function(x, y){
                     vals <- df[df$year %in% c(x-2,x-1) & df$Name == y,"value"]
                     max.val <- ifelse(length(vals) > 0, max(vals, na.rm = TRUE), NA)
                     max.val <- list(y,x,max.val)
                     names(max.val) <- c("Name","year","max_value")
                     return(max.val)
                   },
                  unique(df[,c("Name","year")])$year,
                  unique(df[,c("Name","year")])$Name
                 ) 

max.vals <- as.data.frame(t(max.vals),stringsAsFactors = FALSE)

df <- merge(df, max.vals)
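
A minimal sketch of the same lookup, written so that mapply returns a plain numeric vector that can be assigned straight to a column, without the transpose and merge. The helper name prev_two_max is hypothetical, and the window is assumed to be the two calendar years before each row's year, as in the code above.

```r
df <- data.frame(
  Name  = c(rep("A", 7), rep("B", 6)),
  year  = c(2012, 2012, 2013, 2014, 2015, 2016, 2017,
            2012, 2013, 2013, 2014, 2015, 2016),
  value = c(22, 99, 12, 1, 23, 40, 12, 12, 33, 40, NA, 20, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: maximum of `value` for one Name over the two
# years before `yr`; NA when no non-missing value exists in that window.
prev_two_max <- function(name, yr, data) {
  vals <- data$value[data$Name == name & data$year %in% c(yr - 2, yr - 1)]
  vals <- vals[!is.na(vals)]
  if (length(vals) == 0) NA_real_ else max(vals)
}

# One lookup per row; each call returns a single number, so mapply
# simplifies the result to a numeric vector.
df$max_value <- mapply(prev_two_max, df$Name, df$year,
                       MoreArgs = list(data = df), USE.NAMES = FALSE)
df
```

Because the window is checked explicitly with if/else, max is never called on an empty vector, so no -Inf values or warnings arise.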

Answer 2: (score: -1)

Using by

> by(dat$value, dat$year, function(x) max(x))
dat$year: 2012
[1] 99
------------------------------------------------------------ 
dat$year: 2013
[1] 40
------------------------------------------------------------ 
dat$year: 2014
[1] NA
------------------------------------------------------------ 
dat$year: 2015
[1] 23
------------------------------------------------------------ 
dat$year: 2016
[1] 40
------------------------------------------------------------ 
dat$year: 2017
[1] 12
Edit: I misunderstood the question at first. This should be what you want:

Assign the result to a data.frame:

> dat1=by(dat$value, dat$year, function(x) max(x))
> dat_max=data.frame("max"=dat1[1:length(dat1)])
> dat_max
     max
2012  99
2013  40
2014  NA
2015  23
2016  40
2017  12

Create a new data frame to hold the two-year maximum and loop over the years to compare each one with the year before it:

bi_max=data.frame("max"=nrow(dat_max))
for(i in 1:nrow(dat_max)){
  bi_max[i,]=max(dat_max$max[i], dat_max$max[i-1], na.rm=T)
}
rownames(bi_max)=rownames(dat_max)

Final result:

> bi_max
     max
2012  99
2013  99
2014  40
2015  23
2016  40
2017  40
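
For comparison, the loop above can probably be written without a loop using pmax. This is a sketch that takes the per-year maxima printed earlier as given; yearly_max and prev_max are illustrative names. Like the loop, it compares each year with the year before it and does not group by Name, so it is not the previous-two-years window from the question.

```r
# Per-year maxima as printed above (the names are the years).
yearly_max <- c("2012" = 99, "2013" = 40, "2014" = NA,
                "2015" = 23, "2016" = 40, "2017" = 12)

# Previous year's maximum, with NA for the first year.
prev_max <- c(NA, head(yearly_max, -1))

# Element-wise maximum of each year and the year before it, ignoring NA.
bi_max <- pmax(yearly_max, prev_max, na.rm = TRUE)
bi_max
```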