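I have a large dataset, 3000x400. I need to create new columns that are the means of existing columns, subset by the variable constituency. I have a list of new column names, called newNames below, that I want to use to name the new columns. However, the only way I can figure out how to name the columns is by typing the desired new names in directly.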
What I have so far:
library(data.table)
set.seed(1)
dataTest = data.table(turnout_avg = rnorm(20), urban_avg = rnorm(20,5,2), Constituency = c("A","B","C","D"), key = "Constituency")
oldColumnNames = c( "turnout_avg" , "urban_avg")
newNames = c( "turnout" , "urban")
# Here's my problem, naming these new columns
comm_means_by_district = cbind(
dataTest[,list(Const_turnout = mean(na.omit(get(oldColumnNames[[1]])))), by= Constituency],
dataTest[,list(Const_urban = mean(na.omit(get(oldColumnNames[[2]])))),by= Constituency])
In reality I want to create more than two new columns, so I can't type out Const_turnout, Const_urban, etc. for every one of them.
I've tried two ideas, and neither works:
1.
dataTest[, list(paste("district", newNames[1], sep = "_") = mean(na.omit(get(oldColumnNames[[1]])))), by = Constituency]
or 2.
dataTest[, list(paste(oldColumnNames[1], "constMean", sep = "_") = mean(na.omit(get(oldColumnNames[[1]])))), by = Constituency]
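Both attempts fail for the same reason: inside a call such as list(), the left-hand side of `=` must be a literal name or string, so `paste(...) = ...` is a parse error rather than a computed name. A minimal sketch of one workaround, assuming one column at a time is enough: build the list first, then attach the computed name with base R's setNames(). The names newName and res are hypothetical, chosen for illustration.

```r
library(data.table)

set.seed(1)
dataTest <- data.table(turnout_avg = rnorm(20), urban_avg = rnorm(20, 5, 2),
                       Constituency = c("A", "B", "C", "D"), key = "Constituency")
oldColumnNames <- c("turnout_avg", "urban_avg")
newNames <- c("turnout", "urban")

# Compute the desired column name as an ordinary string (hypothetical helper variable)
newName <- paste("district", newNames[1], sep = "_")

# j returns a list whose name is attached by setNames(), so data.table
# uses that computed name for the result column
res <- dataTest[, setNames(list(mean(na.omit(get(oldColumnNames[[1]])))), newName),
                by = Constituency]
print(res)
```

The same setNames() trick works with a vector of names when the list holds several means.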
Answer 0 (score: 5)
First compute the mean of every column:
DT <- dataTest[,lapply(.SD,function(x) mean(na.omit(x))), by= Constituency]
Then change the column names:
setnames(DT, oldColumnNames, paste("Const", newNames, sep = "_"))
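A self-contained sketch of this two-step approach, using the question's objects. One assumption beyond the answer: .SDcols restricts the averaging to the columns listed in oldColumnNames, which matters when the table has 400 columns and only some should be summarised. setnames(x, old, new) then renames just those columns and leaves Constituency alone.

```r
library(data.table)

set.seed(1)
dataTest <- data.table(turnout_avg = rnorm(20), urban_avg = rnorm(20, 5, 2),
                       Constituency = c("A", "B", "C", "D"), key = "Constituency")
oldColumnNames <- c("turnout_avg", "urban_avg")
newNames <- c("turnout", "urban")

# Average each selected column within each constituency; result columns
# keep their original names (turnout_avg, urban_avg)
DT <- dataTest[, lapply(.SD, function(x) mean(na.omit(x))),
               by = Constituency, .SDcols = oldColumnNames]

# Rename only the value columns, by reference
setnames(DT, oldColumnNames, paste("Const", newNames, sep = "_"))
print(DT)
```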
Answer 1 (score: 4)
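Why does it matter that the names are changed in the same line where the function is applied? I would just compute the per-constituency means first and set the column names afterwards. Here is what that looks like: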
dt <- dataTest[, lapply(oldColumnNames, function(x) mean(na.omit(get(x)))),
by=Constituency]
setnames(dt, c("Constituency", paste("Const", newNames, sep="_")))
dt
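If the goal is instead to add the per-constituency means as new columns of dataTest itself (each row carrying its constituency's mean), rather than building a separate summary table, data.table's `:=` accepts a character vector of names when the left-hand side is wrapped in parentheses. A minimal sketch under that assumption; constCols is a hypothetical name used for illustration.

```r
library(data.table)

set.seed(1)
dataTest <- data.table(turnout_avg = rnorm(20), urban_avg = rnorm(20, 5, 2),
                       Constituency = c("A", "B", "C", "D"), key = "Constituency")
oldColumnNames <- c("turnout_avg", "urban_avg")
newNames <- c("turnout", "urban")

# Build the new column names programmatically
constCols <- paste("Const", newNames, sep = "_")

# (constCols) := ... adds one column per name, by reference; within each
# constituency the single mean is recycled over that group's rows
dataTest[, (constCols) := lapply(.SD, function(x) mean(na.omit(x))),
         by = Constituency, .SDcols = oldColumnNames]
print(dataTest)
```

Because `:=` modifies dataTest in place, no reassignment is needed, which can help on a 3000x400 table.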