Question

I have a data table like this:

library(data.table)
DT <- data.table(ID=rep(c(1:2),each=6), year=rep(c(2003:2006),each=3), month=rep(c(5:8),3), day=rep(c(11:14),3),value=c(101:112))

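I want to do the following:

1. Add 5 columns named V100, V102, V105, V108, V112.

2. In each column, grouped by ID and year, count the values that are less than or equal to the number in the column name. For example, for column V112, count the values in each group that are <= 112.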

The result should look like this:

DT1 <- data.table(ID=rep(c(1:2),each=2), year=c(2003:2006), "100"=rep(0,4), "102"=c(2,0,0,0),"105"=c(3,2,0,0),"108"=c(3,3,2,0),"112"=rep(3,4))
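The expected result counts values less than or equal to the threshold: in the first group (values 101, 102, 103) the 102 column is 2, not 1. A quick check in plain R on that one group; `grp` and `thresholds` are illustrative names, not part of the question:

```r
# Values of the (ID = 1, year = 2003) group in the sample DT
grp <- c(101, 102, 103)
thresholds <- c(100, 102, 105, 108, 112)

# Counting values <= threshold reproduces the first row of the expected DT1
le <- sapply(thresholds, function(t) sum(grp <= t))
print(le)  # [1] 0 2 3 3 3

# A strict "<" would give 1 instead of 2 in the 102 column
lt <- sapply(thresholds, function(t) sum(grp < t))
print(lt)  # [1] 0 1 3 3 3
```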

I tried writing the code myself, but couldn't figure it out:

degree <- c(100,102,105,108,112)
for (d in degree)
{
  f_year <- function(d) {sum(DT$value <= d)}
  DT <- DT[,d:=f_year(),by=list(ID,year)]
}

Any help would be greatly appreciated!
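The loop in the question fails for three reasons: `d := ...` creates a column literally named `d` rather than one named after the loop value; `f_year()` is called without its argument, so `d` is missing inside it; and `DT$value` refers to the whole column, not the rows of the current group. A minimal sketch of the same loop idea with those points fixed, assuming the `V`-prefixed names from the question (`col` is an illustrative helper variable). Because `:=` adds the grouped count to every row, `unique()` collapses the table to one row per group:

```r
library(data.table)

DT <- data.table(ID = rep(1:2, each = 6), year = rep(2003:2006, each = 3),
                 month = rep(5:8, 3), day = rep(11:14, 3), value = 101:112)

degree <- c(100, 102, 105, 108, 112)
for (d in degree) {
  col <- paste0("V", d)  # e.g. "V100"
  # Parentheses around the LHS make data.table use the value of `col` as the
  # column name; inside j, `value` holds only the current group's rows
  DT[, (col) := sum(value <= d), by = .(ID, year)]
}

# Each group's count is repeated on all its rows; keep one row per group
DT1 <- unique(DT[, c("ID", "year", paste0("V", degree)), with = FALSE])
print(DT1)
```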

Answer 1

This can be done with lapply:

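degree <- c(100, 102, 105, 108, 112)
# count how many values y are <= the threshold x
myfun <- function(x,y) sum(y <= x)
# one column per threshold, computed within each ID/year group
DT1 <- DT[, lapply(degree, myfun, value), by = .(ID, year)]
# replace the default V1..V5 names with the thresholds
setnames(DT1, c("ID", "year", as.character(degree)))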

Result:

> DT1
   ID year 100 102 105 108 112
1:  1 2003   0   2   3   3   3
2:  1 2004   0   0   2   3   3
3:  2 2005   0   0   0   2   3
4:  2 2006   0   0   0   0   3
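To see what each group's `j` expression returns, here is the same `lapply()` call run outside data.table on one group's values. The vector `value` below is a hand-made stand-in for the `value` column of the (ID = 1, year = 2004) group:

```r
degree <- c(100, 102, 105, 108, 112)
myfun <- function(x, y) sum(y <= x)

# Stand-in for the `value` column of the (ID = 1, year = 2004) group
value <- c(104, 105, 106)

# lapply() calls myfun(degree[i], value) for each threshold:
# the extra argument after the function is passed on as y
res <- lapply(degree, myfun, value)
str(res)
```

Inside `DT[, ..., by = .(ID, year)]` this unnamed list of five length-1 results becomes one row with five columns per group, which is why `setnames()` is needed afterwards.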

Answer 2

Another way:

cols = c(100,102,105,108,112)
DT[, lapply(cols, function(x) sum(value <= x)), by=.(ID, year)]
#    ID year V1 V2 V3 V4 V5
# 1:  1 2003  0  2  3  3  3
# 2:  1 2004  0  0  2  3  3
# 3:  2 2005  0  0  0  2  3
# 4:  2 2006  0  0  0  0  3

You can then set the names.
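A minimal sketch of that renaming step, assuming the `V`-prefixed names from the question are wanted; `res` is an illustrative name for the result. data.table names the unnamed list elements `V1` to `V5`, and `setnames(x, old, new)` renames them in place:

```r
library(data.table)

DT <- data.table(ID = rep(1:2, each = 6), year = rep(2003:2006, each = 3),
                 month = rep(5:8, 3), day = rep(11:14, 3), value = 101:112)
cols <- c(100, 102, 105, 108, 112)

res <- DT[, lapply(cols, function(x) sum(value <= x)), by = .(ID, year)]

# Rename the auto-generated V1..V5 to V100..V112 by reference
setnames(res, paste0("V", seq_along(cols)), paste0("V", cols))
print(res)
```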

Alternatively, if you want the names set directly, first create a named list:

named_cols = setattr(as.list(cols), 'names', cols) 
DT[, lapply(named_cols, function(x) sum(value<=x)), by=.(ID, year)]
#    ID year 100 102 105 108 112
# 1:  1 2003   0   2   3   3   3
# 2:  1 2004   0   0   2   3   3
# 3:  2 2005   0   0   0   2   3
# 4:  2 2006   0   0   0   0   3
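The same named-list approach also works with base R's `setNames()`, which returns a named copy instead of modifying its argument by reference the way `data.table::setattr()` does. A sketch, assuming the question's `V`-prefixed column names are wanted:

```r
library(data.table)

DT <- data.table(ID = rep(1:2, each = 6), year = rep(2003:2006, each = 3),
                 month = rep(5:8, 3), day = rep(11:14, 3), value = 101:112)
cols <- c(100, 102, 105, 108, 112)

# Named list: the names become the output column names
named_cols <- setNames(as.list(cols), paste0("V", cols))

res <- DT[, lapply(named_cols, function(x) sum(value <= x)), by = .(ID, year)]
print(res)
```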

R: adding columns to a data.table in a loop
