Is there a quick and clever way to take a data frame like this one,
vec <- data.frame(Names = c("var1","var2","var3","var4","var5","var6","var7",
"var8","var9","var10","var11","var12","var13",
"var14") ,
phase1= runif(14),
phase1.away= runif(14),
phase1_in= runif(14),
phase1_out= runif(14),
phase1.1= runif(14),
phase1.away.1= runif(14),
phase1_in.1= runif(14),
phase1_out.1= runif(14),
phase1.2= runif(14),
phase1.away.2= runif(14),
phase1_in.2= runif(14),
phase1_out.2= runif(14))
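and produce a new data frame that:
- is sorted by phase1.x and gives the variable names that correspond to the values, along with the phase1_in and phase1_out values, but leaves out phase1.away?
What I am doing at the moment is just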
vec.o<-vec[with(vec, order(-phase1)),]
d1<-vec.o[c("Names","phase1","phase1_in","phase1_out")]
vec.o<-vec[with(vec, order(-phase1.1)),]
d2<-vec.o[c("Names","phase1.1","phase1_in.1","phase1_out.1")]
cbind(d1,d2)
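This is very tedious and, I am sure, un-R-ish. Any clever ideas? I work with large data frames all the time, and R seems a bit cumbersome for this. Is there any good literature anyone would recommend for these purposes (loading many variables, creating names for them, operating on those variables, and so on)?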
Answer 0 (score: 3)
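Edit: corrected for the case where the x in phase1.x reaches 10 or higher.
I assume you have more than just phase1.1 and phase1.2, so a general solution using regular expressions would look something like this: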
# Make an id vector for the phase1.x, and give Names id -1
# gives a warning as character is transformed to NA
id <- as.numeric(gsub(".*\\.(\\d+$)","\\1",names(vec)))
id[1] <- -1
id[is.na(id)] <- 0 # first occurrence, no .x
veclist <- lapply(unique(id)[-1],function(i){
#select all variables necessary, exclude the away
out <- vec[id %in% c(i,-1) & !grepl("away",names(vec))]
# find the phase1.x for ordering
ovec <- grepl("phase1(\\.\\d+)?$",names(out))
# order and produce
out[order(out[,ovec], decreasing = TRUE),] # descending, to match order(-phase1) in the question
})
do.call(cbind,veclist)
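This works by recognising a trailing number preceded by a dot and extracting it. If a name has no trailing dot-plus-number, it is either the Names variable (which I mark with -1) or the first phase (which I mark with 0).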
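Now you have an id vector that makes it easy to select the variables that belong together, so you can loop over the unique values of id, skipping the first one (which is -1). Using regular expressions again, you can pick whichever variables you need to build the new data frame. The final do.call then combines all of these data frames again.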
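To make the id vector concrete, here is a minimal sketch that runs the same regular-expression idea on a handful of column names. It is a variation rather than the code above: it only converts the names that actually carry a suffix, so no coercion warning is raised, and the name phase1.10 is added purely as an illustration of the two-digit case.

```r
# Column names modelled on the question; "phase1.10" is a made-up extra
# to show that suffixes of 10 and above are handled.
nms <- c("Names", "phase1", "phase1.away", "phase1_in", "phase1_out",
         "phase1.1", "phase1.away.1", "phase1_in.1", "phase1_out.1",
         "phase1.10")

# Which names end in ".<digits>"?
has_suffix <- grepl("\\.\\d+$", nms)

# Default id is 0 (first phase, no suffix); fill in the numeric suffixes.
id <- numeric(length(nms))
id[has_suffix] <- as.numeric(sub(".*\\.(\\d+)$", "\\1", nms[has_suffix]))

# The Names column belongs to every group, so flag it with -1.
id[nms == "Names"] <- -1

# Inspect which names fall into which group.
split(nms, id)
```

Each group returned by split() is one set of columns that belong together, which is what the lapply loop iterates over.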
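By the way, ordering the sub-data-frame is considerably faster than first ordering the whole original data frame and then selecting the variables. That is where the gain in nullglob's solution comes from.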
Answer 1 (score: 1)
This is not especially clever, but it is about twice as fast (according to my simple benchmark):
o1 <- order(-vec$phase1)
o2 <- order(-vec$phase1.1)
cbind(vec[o1,c("Names","phase1","phase1_in","phase1_out")],
vec[o2,c("Names","phase1.1","phase1_in.1","phase1_out.1")])
Here is the benchmark:
> n <- 2e5
> vec<-data.frame(Names = as.character(runif(n)) ,
+ phase1= runif(n),
+ phase1.away= runif(n),
+ phase1_in= runif(n),
+ phase1_out= runif(n),
+ phase1.1= runif(n),
+ phase1.away.1= runif(n),
+ phase1_in.1= runif(n),
+ phase1_out.1= runif(n),
+ phase1.2= runif(n),
+ phase1.away.2= runif(n),
+ phase1_in.2= runif(n),
+ phase1_out.2= runif(n))
>
>
> test1 <- function(){
+ vec.o<-vec[with(vec, order(-phase1)),]
+ d1<-vec.o[c("Names","phase1","phase1_in","phase1_out")]
+ vec.o<-vec[with(vec, order(-phase1.1)),]
+ d2<-vec.o[c("Names","phase1.1","phase1_in.1","phase1_out.1")]
+ d3 <- cbind(d1,d2)
+ }
> system.time(test1())
user system elapsed
1.764 0.048 1.811
>
>
> test2 <- function(){
+ o1 <- order(-vec$phase1)
+ o2 <- order(-vec$phase1.1)
+ d4 <- cbind(vec[o1,c("Names","phase1","phase1_in","phase1_out")],
+ vec[o2,c("Names","phase1.1","phase1_in.1","phase1_out.1")])
+ }
> system.time(test2())
user system elapsed
0.736 0.056 0.791
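If there are more than two phases, the same "compute the order once, then subset rows and columns in a single step" idea could be wrapped in a small function. The sketch below is only one possible way to do that: phase_block is a hypothetical helper name, and it assumes the column naming scheme from the question (phase1, phase1_in, phase1_out, each with an optional .x suffix).

```r
# Hypothetical helper: build one sorted block for a given suffix ("" or ".1", ...).
phase_block <- function(df, suffix = "") {
  cols <- paste0(c("phase1", "phase1_in", "phase1_out"), suffix)
  # Compute the row order from the phase1 column only, largest value first.
  o <- order(df[[cols[1]]], decreasing = TRUE)
  # Subset rows and columns in one step; the away columns are never touched.
  df[o, c("Names", cols)]
}

# Small example data with the same layout as in the question.
set.seed(42)
n <- 6
vec <- data.frame(Names = paste0("var", seq_len(n)),
                  phase1 = runif(n), phase1.away = runif(n),
                  phase1_in = runif(n), phase1_out = runif(n),
                  phase1.1 = runif(n), phase1.away.1 = runif(n),
                  phase1_in.1 = runif(n), phase1_out.1 = runif(n))

# One block per suffix, bound side by side as in the original cbind(d1, d2).
res <- do.call(cbind, lapply(c("", ".1"), phase_block, df = vec))
res
```

As with cbind(d1, d2), the result contains one Names column per block, because each block is sorted independently.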