Can any R expert suggest a faster way to do the following? My code works, but it takes 1 minute on a data frame with 30,000 columns and 12 rows. Thanks!
sync.columns = function(old.data, new.colnames)
{
  # Given a data frame and a vector of column names,
  # makes a new data frame containing exactly the named
  # columns in the specified order; any that were not
  # present are filled in as columns of zeroes.
  if (length(new.colnames) == ncol(old.data) &&
      all(new.colnames == colnames(old.data)))
  {
    old.data # nothing to do
  }
  else
  {
    m = matrix(nrow=nrow(old.data),ncol=length(new.colnames))
    for (t in 1:length(new.colnames))
    {
      if (new.colnames[t] %in% colnames(old.data))
      {
        m[,t] = old.data[,new.colnames[t]] # copy column
      }
      else
      {
        m[,t] = rep(0,nrow(m)) # fill with zeroes
      }
    }
    result = as.data.frame(m)
    rownames(result) = rownames(old.data)
    colnames(result) = new.colnames
    result
  }
}
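Maybe something with cbind?
Answer 0 (score: 3)
This seems fairly fast. It first creates a data.frame filled with zeroes, then overwrites only the columns that can be found in the old data: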
sync.columns <- function(old.data, new.colnames) {
  M <- nrow(old.data)
  N <- length(new.colnames)
  rn <- rownames(old.data)
  cn <- new.colnames
  new.data <- as.data.frame(matrix(0, M, N, dimnames = list(rn, cn)))
  keep.col <- intersect(cn, colnames(old.data))
  new.data[keep.col] <- old.data[keep.col]
  new.data
}
M <- 30000
x <- data.frame(b = runif(M), i = runif(M), z = runif(M))
rownames(x) <- paste0("z", 1:M)
system.time(y <- sync.columns(x, letters[1:12]))
# user system elapsed
# 0.031 0.010 0.043
head(y)
#    a          b c d e f g h         i j k l
# z1 0 0.27994248 0 0 0 0 0 0 0.3785181 0 0 0
# z2 0 0.75291520 0 0 0 0 0 0 0.7414294 0 0 0
# z3 0 0.07036461 0 0 0 0 0 0 0.1543653 0 0 0
# z4 0 0.40748957 0 0 0 0 0 0 0.5564374 0 0 0
# z5 0 0.98769595 0 0 0 0 0 0 0.4277466 0 0 0
# z6 0 0.82117781 0 0 0 0 0 0 0.2034743 0 0 0
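To make the zero-fill-then-overwrite idea easier to check by eye, here is a minimal sketch on a tiny data frame. The function name `sync_columns_sketch` and the toy data are assumptions made for illustration; the body follows the same steps as the data.frame version above.

```r
# Hypothetical helper name; same idea as the answer's data.frame version:
# start from an all-zero data frame, then overwrite the columns that exist.
sync_columns_sketch <- function(old.data, new.colnames) {
  new.data <- as.data.frame(
    matrix(0, nrow(old.data), length(new.colnames),
           dimnames = list(rownames(old.data), new.colnames))
  )
  # Only names present in both places are copied over.
  keep.col <- intersect(new.colnames, colnames(old.data))
  new.data[keep.col] <- old.data[keep.col]
  new.data
}

# Toy input (illustrative only): column "z" is not requested, so it is dropped;
# columns "a" and "c" are requested but absent, so they stay zero.
small <- data.frame(b = c(1, 2), z = c(3, 4), row.names = c("r1", "r2"))
out <- sync_columns_sketch(small, c("a", "b", "c"))
print(out)
```

The result keeps the requested column order and the original row names, which is the contract described in the question's comment block.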
Edit: following the OP's comment below, here is a matrix version:
sync.columns <- function(old.data, new.colnames) {
  M <- nrow(old.data)
  N <- length(new.colnames)
  rn <- rownames(old.data)
  cn <- new.colnames
  new.data <- matrix(0, M, N, dimnames = list(rn, cn))
  keep.col <- intersect(cn, colnames(old.data))
  new.data[, keep.col] <- old.data[, keep.col]
  new.data
}
x <- t(as.matrix(x)) # a wide matrix
system.time(y <- sync.columns(x, paste0("z", sample(1:50000, 30000))))
# user system elapsed
# 0.049 0.002 0.051
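A possible variation on the matrix version, not part of the answer above, is to look up column positions once with `match()` and copy by position rather than by name. The name `sync_columns_match` and the small test matrix are assumptions for illustration; this is a sketch, not a benchmarked replacement.

```r
# Hypothetical name; positional variant of the matrix version using match().
sync_columns_match <- function(old.data, new.colnames) {
  # Position of each requested name in the old matrix; NA where it is absent.
  idx <- match(new.colnames, colnames(old.data))
  found <- !is.na(idx)
  new.data <- matrix(0, nrow(old.data), length(new.colnames),
                     dimnames = list(rownames(old.data), new.colnames))
  # Copy the columns that exist; drop = FALSE keeps the matrix shape
  # even when only one column (or one row) is involved.
  new.data[, found] <- old.data[, idx[found], drop = FALSE]
  new.data
}

# Toy input (illustrative only): "q" is absent and stays zero.
m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("x", "y", "z")))
res <- sync_columns_match(m, c("z", "q", "x"))
print(res)
```

Because the copy is positional, it does not depend on name lookups during assignment. Whether it is any faster than the `intersect()` version on a 30,000-column input would need to be measured with `system.time()`.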