Adding columns to a data.table in a loop in R

Time: 2015-02-03 22:00:32

Tags: r for-loop data.table

I have a data.table like this:

library(data.table)
DT <- data.table(ID=rep(c(1:2),each=6), year=rep(c(2003:2006),each=3), month=rep(c(5:8),3), day=rep(c(11:14),3), value=c(101:112))

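I want to add columns under the following conditions:

1. Add 5 columns named V100, V102, V105, V108, V112

2. For each column, group by ID and year and count the values that are less than or equal to the number in the column name. For example, for column V112, count the values in each group that are at most 112.

The result should look like this: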

DT1 <- data.table(ID=rep(c(1:2),each=2), year=c(2003:2006), "100"=rep(0,4), "102"=c(2,0,0,0),"105"=c(3,2,0,0),"108"=c(3,3,2,0),"112"=rep(3,4))

I tried writing the code, but could not get it to work:

degree <- c(100,102,105,108,112)    
 for (d in degree)
{  
   f_year <- function(d) {sum(DT$value <= d)}
   DT <- DT[,d:=f_year(),by=list(ID,year)]
}

Any help would be greatly appreciated!

2 answers:

Answer 0: (score: 3)

This can be done with lapply:

degree <- c(100, 102, 105, 108, 112)  
myfun <- function(x,y) sum(y <= x)
DT1 <- DT[, lapply(degree, myfun, value), by = .(ID, year)]
setnames(DT1, c("ID", "year", as.character(degree)))

Result:

> DT1
   ID year 100 102 105 108 112
1:  1 2003   0   2   3   3   3
2:  1 2004   0   0   2   3   3
3:  2 2005   0   0   0   2   3
4:  2 2006   0   0   0   0   3
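
To see why this works, it may help to look at how `lapply` forwards its extra argument: each element of the threshold vector is passed as `x`, and the remaining argument is passed on as `y`. The following is a minimal sketch on a single plain vector. The name `vals` is made up for illustration and stands in for the `value` column of one group.

```r
# Same helper as in the answer: counts the entries of y that are <= x
myfun <- function(x, y) sum(y <= x)

# Hypothetical stand-in for the `value` column of the group ID = 1, year = 2003
vals <- 101:103

# lapply() calls myfun(100, vals), myfun(102, vals), myfun(105, vals) in turn
counts <- lapply(c(100, 102, 105), myfun, vals)
print(unlist(counts))
# [1] 0 2 3
```

Inside `DT[, ..., by = .(ID, year)]` the same call is evaluated once per group, with `value` restricted to that group's rows.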

Answer 1: (score: 2)

Another way:

cols = c(100,102,105,108,112)
DT[, lapply(cols, function(x) sum(value <= x)), by=.(ID, year)]
#    ID year V1 V2 V3 V4 V5
# 1:  1 2003  0  2  3  3  3
# 2:  1 2004  0  0  2  3  3
# 3:  2 2005  0  0  0  2  3
# 4:  2 2006  0  0  0  0  3

Then you can set the names.
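
As a sketch of that renaming step, assuming the result is stored in a variable (called `res` here, a name not used in the answer) and that the `V` prefix from the question is wanted:

```r
library(data.table)

# Example data from the question
DT <- data.table(ID = rep(1:2, each = 6), year = rep(2003:2006, each = 3),
                 month = rep(5:8, 3), day = rep(11:14, 3), value = 101:112)

cols <- c(100, 102, 105, 108, 112)
res <- DT[, lapply(cols, function(x) sum(value <= x)), by = .(ID, year)]

# Rename the automatic V1..V5 to V100, V102, ... by reference (no copy of res)
setnames(res, paste0("V", seq_along(cols)), paste0("V", cols))
print(names(res))
```

Using the `old`/`new` form of `setnames` leaves the grouping columns `ID` and `year` untouched.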

Alternatively, if you want to set the names directly, you can create a named list first:

named_cols = setattr(as.list(cols), 'names', cols) 
DT[, lapply(named_cols, function(x) sum(value<=x)), by=.(ID, year)]
#    ID year 100 102 105 108 112
# 1:  1 2003   0   2   3   3   3
# 2:  1 2004   0   0   2   3   3
# 3:  2 2005   0   0   0   2   3
# 4:  2 2006   0   0   0   0   3
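
For completeness, a loop along the lines of the one in the question can also be made to work. In the original attempt, `d:=` creates a column literally named `d`, `f_year()` is called without its argument, and `DT$value` bypasses the grouping. The following is only a sketch of one possible fix, not something taken from either answer. The helper variable `col` and the `V` prefix are choices made here for illustration.

```r
library(data.table)

# Example data from the question
DT <- data.table(ID = rep(1:2, each = 6), year = rep(2003:2006, each = 3),
                 month = rep(5:8, 3), day = rep(11:14, 3), value = 101:112)

degree <- c(100, 102, 105, 108, 112)

for (d in degree) {
  col <- paste0("V", d)  # column name built from the threshold, e.g. "V100"
  # Wrapping col in parentheses makes := use the string stored in col as the
  # new column's name; `value` refers to the current group's rows only
  DT[, (col) := sum(value <= d), by = .(ID, year)]
}

# The loop keeps all 12 rows; reduce to one row per (ID, year) group
DT1 <- unique(DT[, c("ID", "year", paste0("V", degree)), with = FALSE])
print(DT1)
```

Unlike the `lapply` versions, this adds the columns to `DT` itself by reference, so each count is repeated on every row of its group until the final `unique` step.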