Hi, I have a data frame.
How can I create a max_value column, where the value is the maximum over the previous 2 years?
Name year value *max_value*
A 2012 22 NA
A 2012 99 NA
A 2013 12 99
A 2014 01 99
A 2015 23 12
A 2016 40 23
A 2017 12 40
B 2012 12 NA
B 2013 33 12
B 2013 40 12
B 2014 NA 40
B 2015 20 40
B 2016 20 20
Thanks in advance.
Answer 0 (score: 0)
Here is a data.table approach using aggregation, a two-lag shift, apply, and a join.
library(data.table)
dt[dt[, .(mx = max(value)), by = c("Name", "year")
      ][, .(year,
            max_val = apply(matrix(unlist(shift(mx, 1:2)), ncol = 2), 1, max, na.rm = TRUE)),
        by = Name],
   on = c("Name", "year")][is.infinite(max_val), max_val := NA][]
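The first line computes the maximum by year and name. The second line, for each name, keeps the year and uses apply to return the maximum over the two lagged years (obtained with shift(mx, 1:2)), dropping NA values. This produces a warning for every row where both lagged values are NA, and returns -Inf at that position. I had to convert the output of shift to a matrix by hand so that it could be passed to apply, which is not ideal. The resulting data.table is joined back to the original data using Name and year as keys. Finally, the -Inf values are replaced with NA in the last line, and the result is printed with [].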
This returns
Name year value max_value max_val
1: A 2012 22 NA NA
2: A 2012 99 NA NA
3: A 2013 12 99 99
4: A 2014 1 99 99
5: A 2015 23 12 12
6: A 2016 40 23 23
7: A 2017 12 40 40
8: B 2012 12 NA NA
9: B 2013 33 12 12
10: B 2013 40 12 12
11: B 2014 NA 40 40
12: B 2015 20 40 40
13: B 2016 20 20 20
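As a possible variation on the approach above, the apply/matrix step and the -Inf cleanup can be avoided with pmax, which takes the element-wise maximum of several vectors and, with na.rm = TRUE, returns NA only when every input is NA. This is a minimal sketch, not the answer's own code: the intermediate name yearly and the rebuilt dt are assumptions for illustration. Like the original, it lags rows rather than calendar years, so it assumes each Name has one aggregated row per consecutive year.

```r
library(data.table)

# Same data as in the question, rebuilt here so the sketch is self-contained.
dt <- data.table(
  Name  = rep(c("A", "B"), times = c(7, 6)),
  year  = c(2012L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L,
            2012L, 2013L, 2013L, 2014L, 2015L, 2016L),
  value = c(22L, 99L, 12L, 1L, 23L, 40L, 12L,
            12L, 33L, 40L, NA, 20L, 20L)
)

# max() over an all-NA input with na.rm = TRUE warns and yields -Inf,
# which is why the original needs the is.infinite() cleanup.
suppressWarnings(max(c(NA, NA), na.rm = TRUE))

# Step 1: maximum per (Name, year), as in the original answer.
yearly <- dt[, .(mx = max(value)), by = .(Name, year)]

# Step 2: shift(mx, 1:2) returns a list of the two lagged vectors;
# pmax() takes their element-wise maximum and leaves NA where both are NA.
yearly[, max_val := do.call(pmax, c(shift(mx, 1:2), na.rm = TRUE)), by = Name]

# Step 3: update join back onto the original rows.
dt[yearly, on = .(Name, year), max_val := i.max_val]
print(dt)
```

The update join in step 3 adds max_val to dt by reference, so rows that share a Name and year (such as the two A/2012 rows) receive the same value.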
Data
dt <-
structure(list(Name = c("A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), year = c(2012L, 2012L, 2013L, 2014L,
2015L, 2016L, 2017L, 2012L, 2013L, 2013L, 2014L, 2015L, 2016L
), value = c(22L, 99L, 12L, 1L, 23L, 40L, 12L, 12L, 33L, 40L,
NA, 20L, 20L), max_value = c(NA, NA, 99L, 99L, 12L, 23L, 40L,
NA, 12L, 12L, 40L, 40L, 20L)), .Names = c("Name", "year", "value",
"max_value"), row.names = c(NA, -13L), class = c("data.table",
"data.frame"))
Answer 1 (score: 0)
Here is a base R solution using mapply.
df <- data.frame(Name = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
                 year = c(2012, 2012, 2013, 2014, 2015, 2016, 2017, 2012, 2013, 2013, 2014, 2015, 2016),
                 value = c(22, 99, 12, 1, 23, 40, 12, 12, 33, 40, NA, 20, 20),
                 stringsAsFactors = FALSE)
# One row per distinct (Name, year) pair
keys <- unique(df[, c("Name", "year")])
keys$max_value <- mapply(function(x, y) {
  # values for the same Name in the two preceding years
  vals <- df[df$year %in% c(x - 2, x - 1) & df$Name == y, "value"]
  # drop NAs first so an all-NA window gives NA instead of -Inf with a warning
  vals <- vals[!is.na(vals)]
  if (length(vals) > 0) max(vals) else NA_real_
},
keys$year,
keys$Name
)
# Attach the per-pair maximum to every original row
df <- merge(df, keys)
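The same window lookup can also be applied to every row directly, which skips the merge step. The sketch below is one way to do that under stated assumptions: the helper name prev_two_max and its argument names are hypothetical, introduced only for illustration.

```r
df <- data.frame(
  Name  = rep(c("A", "B"), times = c(7, 6)),
  year  = c(2012, 2012, 2013, 2014, 2015, 2016, 2017,
            2012, 2013, 2013, 2014, 2015, 2016),
  value = c(22, 99, 12, 1, 23, 40, 12, 12, 33, 40, NA, 20, 20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: maximum value for `name` over the years yr-2 and yr-1.
# Returns NA when the window holds no non-missing values.
prev_two_max <- function(data, name, yr) {
  vals <- data$value[data$Name == name & data$year %in% c(yr - 2, yr - 1)]
  vals <- vals[!is.na(vals)]
  if (length(vals) == 0) NA_real_ else max(vals)
}

# mapply walks Name and year in parallel, one call per row of df;
# MoreArgs passes the full data frame unchanged to every call.
df$max_value <- mapply(prev_two_max, name = df$Name, yr = df$year,
                       MoreArgs = list(data = df), USE.NAMES = FALSE)
print(df)
```

Because the lookup filters on the actual year values, it does not depend on the rows being sorted or on every year being present for each Name.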
Answer 2 (score: -1)
Using by:
> by(dat$value, dat$year, function(x) max(x))
dat$year: 2012
[1] 99
------------------------------------------------------------
dat$year: 2013
[1] 40
------------------------------------------------------------
dat$year: 2014
[1] NA
------------------------------------------------------------
dat$year: 2015
[1] 23
------------------------------------------------------------
dat$year: 2016
[1] 40
------------------------------------------------------------
dat$year: 2017
[1] 12
Edit: I misunderstood the question at first. This should be what you want:
Assign the result to a data.frame:
> dat1 = by(dat$value, dat$year, function(x) max(x))
> dat_max = data.frame("max"=dat1[1:length(dat1)])
> dat_max
max
2012 99
2013 40
2014 NA
2015 23
2016 40
2017 12
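Create a new data frame to hold the two-year maximum, then loop over the years, comparing each year with the one before it:
bi_max = data.frame("max" = numeric(nrow(dat_max)))
for (i in 1:nrow(dat_max)) {
  # for i = 1 there is no previous year, so only the current year is used
  bi_max[i, ] = max(dat_max$max[i], dat_max$max[i-1], na.rm = TRUE)
}
rownames(bi_max) = rownames(dat_max)
Final result: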
> bi_max
max
2012 99
2013 99
2014 40
2015 23
2016 40
2017 40
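The result above pools both names and includes the current year. If the goal is the maximum over the two preceding years within each Name, as the question's max_value column suggests, one possible adaptation is sketched below. It swaps by for tapply so the yearly maxima form a Name-by-year matrix. The names yearly, padded and prev_max are assumptions for illustration, and the sketch assumes the years in the data are consecutive.

```r
dat <- data.frame(
  Name  = rep(c("A", "B"), times = c(7, 6)),
  year  = c(2012, 2012, 2013, 2014, 2015, 2016, 2017,
            2012, 2013, 2013, 2014, 2015, 2016),
  value = c(22, 99, 12, 1, 23, 40, 12, 12, 33, 40, NA, 20, 20),
  stringsAsFactors = FALSE
)

# Yearly maximum per Name as a Name x year matrix (NA where a cell has no data).
yearly <- tapply(dat$value, list(dat$Name, dat$year), max)

# Pad with two all-NA columns so that every year has two "previous" columns.
padded <- cbind(NA, NA, yearly)
n <- ncol(yearly)

# For year column j, padded[, j] is the value two years back and
# padded[, j + 1] the value one year back; pmax keeps NA only if both are NA.
prev_max <- yearly
prev_max[] <- pmax(padded[, 1:n], padded[, 2:(n + 1)], na.rm = TRUE)

# Look up each original row by (Name, year) using the matrix dimnames.
dat$max_value <- prev_max[cbind(dat$Name, as.character(dat$year))]
print(dat)
```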