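I often have data frames that come out of some computation and that I want to clean up, rename, and reorder by column before output. All of the versions below work; the plain data.frame approach comes closest to what I want.
Is there a way to combine the in-data-frame computation of within and mutate with the column-order preservation of data.frame(), without the extra redundant [, ...] at the end?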
library(plyr)
# Given this chaotically named data.frame
d = expand.grid(VISIT=as.factor(1:2),Biochem=letters[1:2],time=1:5,
subj=as.factor(1:3))
d$Value1 =round(rnorm(nrow(d)),2)
d$val2 = round(rnorm(nrow(d)),2)
# I would like to cleanup, compute and rearrange columns
# Simple and almost perfect
dDataframe = with(d, data.frame(
biochem = Biochem,
subj = subj,
visit = VISIT,
value1 = Value1*3
))
# This simple solution is almost perfect,
# but requires one more line
dDataframe$value2 = dDataframe$value1*d$val2
# For the following methods I have to reorder
# and select in a second step
# use mutate from plyr to allow computation on computed values,
# which transform cannot do.
dMutate = mutate(d,
biochem = Biochem,
subj = subj,
visit = VISIT,
value1 = Value1*3, #assume this is a time consuming function
value2 = value1*val2
# Could set fields = NULL here to remove,
# but this does not help getting column order
)[,c("biochem","subj","visit","value1","value2")]
# use within. Same problem, order not preserved
dWithin = within(d, {
biochem = Biochem
subj = subj
visit = VISIT
value1 = Value1*3
value2 = value1*val2
})[,c("biochem","subj","visit","value1","value2")]
all.equal(dDataframe,dWithin)
all.equal(dDataframe,dMutate)
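Answer 0 (score: 2)
You can use summarise (or summarize) from the plyr package. From the documentation:
Summarise works in an analogous way to transform, except instead of adding columns to an existing data frame, it creates a new data frame. [...]
For your example: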
library(plyr)
summarize(d,
biochem = Biochem,
subj = subj,
visit = VISIT,
value1 = Value1 * 3,
value2 = value1 * val2
)
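To illustrate why this gives the requested column order, here is a minimal base R sketch of the same idea: evaluate the named expressions one at a time, let later expressions see earlier results, and return only the newly named columns in the order they were written. The helper name build_df and the toy data are hypothetical and used only for illustration; this is not necessarily how plyr implements summarise.

```r
# Hypothetical helper (not part of plyr): builds a new data frame from
# named expressions, evaluated in order against the input columns.
build_df <- function(.data, ...) {
  caller <- parent.frame()
  exprs <- as.list(substitute(list(...)))[-1]
  env <- as.list(.data)
  for (nm in names(exprs)) {
    # Each result is stored so that later expressions can refer to it
    env[[nm]] <- eval(exprs[[nm]], env, caller)
  }
  # Keep only the requested columns, in the order they were given
  as.data.frame(env[names(exprs)], stringsAsFactors = FALSE)
}

# Toy data, assumed for illustration only
toy <- data.frame(Biochem = c("a", "b"), VISIT = factor(1:2),
                  Value1 = c(1, 2), val2 = c(10, 20))

res <- build_df(toy,
  biochem = Biochem,
  visit   = VISIT,
  value1  = Value1 * 3,
  value2  = value1 * val2   # uses the value1 computed one line earlier
)
```

The unused input columns are dropped and no trailing [, c(...)] selection is needed, which is the behaviour the question asks for.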
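Answer 1 (score: 2)
If you are willing to move to data.table, you can perform (most of) these operations by reference and avoid the copying associated with [<-.data.frame and $<-.data.frame.
setnames renames the columns of a data.table, setcolorder reorders them, and := assigns by reference.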
library(data.table)
DT <- data.table(d)
# rename to lowercase only
setnames(DT, old = names(DT), new = tolower(names(DT)))
# reassign using `:=`
# note the use of `value1<-value1` to allow later use.
# This will not be necessary once FR1492 has been implemented
# setting to NULL removes these columns
DT[, `:=`(value1 =value1<- value1*3,
value2 = value1 * val2,
val2 = NULL, time = NULL )]
setcolorder(DT, c("biochem","subj","visit","value1","value2"))
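If the inline value1 <- assignment above feels too obscure, one alternative is to chain several := calls so that each step sees the columns updated by the previous one. The following is only a sketch on toy data assumed for illustration; it requires the data.table package and still modifies DT2 by reference.

```r
library(data.table)

# Toy data, assumed for illustration only
d2 <- data.frame(VISIT = factor(1:2), Biochem = c("a", "b"), time = 1:2,
                 subj = factor(1:2), Value1 = c(1, 2), val2 = c(10, 20))

DT2 <- data.table(d2)
setnames(DT2, tolower(names(DT2)))

# Each [ ... ] step runs after the previous one has updated DT2,
# so value2 can use the already tripled value1
DT2[, value1 := value1 * 3][, value2 := value1 * val2][, c("val2", "time") := NULL]

setcolorder(DT2, c("biochem", "subj", "visit", "value1", "value2"))
```

This trades one compound call for three short ones, which may be easier to read when the intermediate value is expensive to compute and should only be calculated once.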
If you are less concerned about memory efficiency and just want to use data.table syntax, then
DT <- data.table(d)
DT[,list( biochem = Biochem,
subj = subj,
visit = VISIT,
value1 = value1 <- Value1 * 3,
value2 = value1 * val2
)]
will work.