Looking for a quick or automatic way to name *many* new data.table columns in R

Asked: 2014-03-12 07:50:58

Tags: r dataset data.table

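I have a large dataset, 3000x400. I need to create new columns that are the means of existing columns, computed within each group of the variable Constituency. I have a list of new column names, called newNames below, that I want to use to name the new columns. However, I can only figure out how to name the columns when I type the desired new names in directly.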

My current work:

library(data.table)

set.seed(1)
dataTest = data.table(turnout_avg = rnorm(20), urban_avg = rnorm(20,5,2), Constituency = c("A","B","C","D"), key = "Constituency")

oldColumnNames = c( "turnout_avg" , "urban_avg")

newNames = c( "turnout" ,   "urban")

# Here's my problem, naming these new columns
comm_means_by_district = cbind( 
dataTest[,list(Const_turnout = mean(na.omit(get(oldColumnNames[[1]])))), by= Constituency],
dataTest[,list(Const_urban = mean(na.omit(get(oldColumnNames[[2]])))),by= Constituency])

In reality, I want to create many more than two new columns, so I can't type out Const_turnout, Const_urban, etc. for all of them.

I have tried two ideas, but neither worked. 1.

dataTest[,list(paste("district", newNames[1], sep="_") = mean(na.omit(get(oldColumnNames[[1]])))), by= Constituency]

or 2.

dataTest[,list(paste(oldColumnNames[1], "constMean", sep="_") = mean(na.omit(get(oldColumnNames[[1]])))), by= Constituency]

2 Answers:

Answer 0 (score: 5)

First, get the mean of all columns:

DT <- dataTest[,lapply(.SD,function(x) mean(na.omit(x))), by= Constituency]
Then change the column names:

setnames(DT,colnames(DT),vector_of_newnames)
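Note that setnames(DT, colnames(DT), ...) renames every column, including Constituency, so the replacement vector needs an entry for it as well. As a minimal sketch of the same idea on the question's example data, the version below renames only the averaged columns. The .SDcols restriction, the old/new arguments, and the Const_ prefix are illustrative choices, not part of the answer above.

```r
library(data.table)

# Example data from the question (Constituency repeated explicitly to 20 rows)
set.seed(1)
dataTest <- data.table(
  turnout_avg  = rnorm(20),
  urban_avg    = rnorm(20, 5, 2),
  Constituency = rep(c("A", "B", "C", "D"), times = 5),
  key = "Constituency"
)
oldColumnNames <- c("turnout_avg", "urban_avg")
newNames       <- c("turnout", "urban")

# Step 1: group means of the selected columns only
DT <- dataTest[, lapply(.SD, function(x) mean(na.omit(x))),
               by = Constituency, .SDcols = oldColumnNames]

# Step 2: rename just those columns, leaving Constituency untouched
setnames(DT, old = oldColumnNames, new = paste("Const", newNames, sep = "_"))
names(DT)
```

Because the names are built with paste(), the same two lines work unchanged for any number of columns.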

Answer 1 (score: 4)

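Why is it important to change the names on the same line where you apply the function? I would simply compute the constituency means first and set the column names afterwards. Here is what that looks like: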

dt <- dataTest[, lapply(oldColumnNames, function(x) mean(na.omit(get(x)))), 
               by=Constituency]
setnames(dt, c("Constituency", paste("Const", newNames, sep="_")))
dt
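If the names really do have to be assigned in the same call, it helps to know why the two attempts in the question fail: inside list(...), R only accepts a literal name on the left of =, so paste(...) = ... is a syntax error. One possible workaround, offered as a sketch rather than as the approach of either answer, is to build the list first and name it with setNames(). The .SDcols argument and the Const_ prefix are illustrative choices.

```r
library(data.table)

# Example data from the question (Constituency repeated explicitly to 20 rows)
set.seed(1)
dataTest <- data.table(
  turnout_avg  = rnorm(20),
  urban_avg    = rnorm(20, 5, 2),
  Constituency = rep(c("A", "B", "C", "D"), times = 5),
  key = "Constituency"
)
oldColumnNames <- c("turnout_avg", "urban_avg")
newNames       <- c("turnout", "urban")

# setNames() attaches names computed at run time to the list returned in j,
# and data.table uses those names for the result columns.
dt2 <- dataTest[, setNames(lapply(.SD, function(x) mean(na.omit(x))),
                           paste("Const", newNames, sep = "_")),
                by = Constituency, .SDcols = oldColumnNames]
names(dt2)
```

This yields the same table as the two-step version, so the choice between them is mostly a matter of taste.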